New ODE framework boosts multimodal search agents, beats Gemini-2.5 Pro

By PulseAugur Editorial · [1 source] · 2026-05-11 16:49

Researchers have developed a framework called On-policy Data Evolution (ODE) to improve multimodal deep search agents. It lets agents reuse intermediate visual information from search results and dynamically refines training data based on the agent's current learning progress. ODE improves agent performance across several benchmarks, with significant gains for Qwen3-VL models, surpassing Gemini-2.5 Pro in complex agent-workflow settings.

IMPACT Enhances multimodal search agent capabilities by enabling better data evolution and visual context reuse, potentially improving performance on complex tasks.

RANK_REASON The cluster contains a research paper detailing a new framework and its performance on benchmarks.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.CL TIER_1 English(EN) · Yi R. Fung · 2026-05-11 16:49

Towards On-Policy Data Evolution for Visual-Native Multimodal Deep Search Agents

Multimodal deep search requires an agent to solve open-world problems by chaining search, tool use, and visual reasoning over evolving textual and visual context. Two bottlenecks limit current systems. First, existing tool-use harnesses treat images returned by search, browsing, …
