Researchers have developed DocArena, a novel pipeline that automatically transforms raw document collections into training environments for search agents. This system leverages multimodal large language models (MLLMs) for visual perception and question-answering pair generation, eliminating the need for human annotation. The resulting DocArena-79K dataset, comprising over 8,000 documents across various domains and languages, has been used to train a Doc-Search agent. Experiments demonstrate that agents trained on DocArena data achieve superior performance in both retrieval accuracy and question-answering quality compared to agents trained on text-based environments. AI
IMPACT Enables more efficient and scalable training of multimodal document search agents, potentially improving information retrieval in complex document sets.
RANK_REASON This is a research paper detailing a new method and dataset for training AI agents. [lever_c_demoted from research: ic=1 ai=1.0]
AI-generated summary · Google Gemini · from 1 sources. How we write summaries →