PulseAugur

New ODE framework boosts multimodal AI agents with reusable visuals

Researchers have developed On-policy Data Evolution (ODE), a framework for improving multimodal deep search agents. ODE targets two limitations of current systems: agents cannot reuse the intermediate images returned in search results, and their training data is static. To address the first, it introduces an image bank reference protocol that keeps visual context available for reuse. To address the second, it adds a closed-loop data generator that refines the training data according to the agent's current capabilities. An ODE-enhanced Qwen3-VL-8B model achieves a 39.0% average score across benchmarks, surpassing Gemini-2.5 Pro.
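The idea of referring to previously seen images by ID, rather than fetching them again, can be illustrated with a minimal sketch. The summary does not describe the paper's actual protocol, so everything here is an assumption for illustration only: the `ImageBank` class, its `add` and `get` methods, and the `img_N` ID format are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ImageBank:
    """Hypothetical store mapping short reference IDs to images seen during search."""

    _images: dict[str, bytes] = field(default_factory=dict)

    def add(self, image: bytes) -> str:
        # Assign a stable ID so later steps can cite the image by reference.
        ref = f"img_{len(self._images) + 1}"
        self._images[ref] = image
        return ref

    def get(self, ref: str) -> bytes:
        # Resolve a reference back to the stored image; raises KeyError if unknown.
        return self._images[ref]


bank = ImageBank()
ref = bank.add(b"placeholder bytes for an image returned by a search tool")
# A later step can pass `ref` to another tool instead of fetching the image again.
reused = bank.get(ref)
```

In this sketch the agent would carry only the short reference string in its context and resolve it when a tool needs the pixels; whether ODE works this way is not stated in the summary.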

IMPACT Enhances multimodal agent capabilities by enabling reusable visual context and adaptive training data, potentially improving performance on complex search and reasoning tasks.

RANK_REASON The cluster contains a research paper detailing a new framework and its performance improvements on benchmarks.


AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. arXiv cs.CL TIER_1 English(EN) · Shijue Huang, Hangyu Guo, Guanting Dong, Chenxin Li, Junting Lu, Xinyu Geng, Zhaochen Su, Zhenyu Li, Shuang Chen, Hongru Wang, Yi R. Fung

    Towards On-Policy Data Evolution for Visual-Native Multimodal Deep Search Agents

    arXiv:2605.10832v2. Abstract: Multimodal deep search requires an agent to solve open-world problems by chaining search, tool use, and visual reasoning over evolving textual and visual context. Two bottlenecks limit current systems. First, existing tool-use h…