Researchers have developed a new framework called On-policy Data Evolution (ODE) to improve multimodal deep search agents. ODE addresses two key limitations: the inability to reuse intermediate visual information from search results, and the static nature of training data. The system introduces an image bank reference protocol for reusable visual context and a closed-loop data generator that refines training data based on the agent's current capabilities. An ODE-enhanced Qwen3-VL-8B model achieves a 39.0% average score across benchmarks, surpassing Gemini-2.5 Pro.
AI IMPACT Enhances multimodal agent capabilities by enabling reusable visual context and adaptive training data, potentially improving performance on complex search and reasoning tasks.
RANK_REASON The cluster contains a research paper detailing a new framework and its performance improvements on benchmarks.
AI-generated summary · Google Gemini · from 1 source.