Towards On-Policy Data Evolution for Visual-Native Multimodal Deep Search Agents
Researchers have developed On-policy Data Evolution (ODE), a framework for training multimodal deep search agents. ODE targets two limitations of existing agents: they cannot reuse intermediate visual information returned in search results, and their training data is static. The framework introduces an image bank reference protocol that makes visual context reusable, and a closed-loop data generator that refines training data to match the agent's current capabilities. An ODE-enhanced Qwen3-VL-8B model achieves a 39.0% average score across benchmarks, surpassing Gemini-2.5 Pro.
AI IMPACT: Enhances multimodal agent capabilities through reusable visual context and adaptive training data, potentially improving performance on complex search and reasoning tasks.
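
To make the idea of reusable visual context concrete, here is a minimal sketch of what an image bank with stable references could look like. This summary does not describe ODE's actual protocol, so the `ImageBank` class, its `add` and `resolve` methods, and the `img_N` reference format are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ImageBank:
    """Hypothetical store giving each image seen during search a stable ID.

    Illustrative assumption only; not the protocol from the ODE paper.
    """

    _ids_by_url: Dict[str, str] = field(default_factory=dict)
    _urls_by_id: Dict[str, str] = field(default_factory=dict)

    def add(self, url: str) -> str:
        """Register an image and return its reference ID."""
        # A known image keeps its existing ID, so later steps can point
        # back to it instead of fetching or embedding it again.
        if url in self._ids_by_url:
            return self._ids_by_url[url]
        ref = f"img_{len(self._ids_by_url) + 1}"
        self._ids_by_url[url] = ref
        self._urls_by_id[ref] = url
        return ref

    def resolve(self, ref: str) -> str:
        """Map a reference ID back to the stored image URL."""
        return self._urls_by_id[ref]


bank = ImageBank()
first = bank.add("https://example.com/a.png")
second = bank.add("https://example.com/b.png")
again = bank.add("https://example.com/a.png")  # same image, same reference
```

In this sketch an agent would cite `first` or `second` in a later tool call rather than resending the image, which is one plausible reading of "reusable visual context".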