Researchers have developed MSEarth, a multimodal benchmark for evaluating the Earth science reasoning capabilities of multimodal large language models (MLLMs). The dataset comprises over 289,000 figures with detailed captions and contextual discussions, drawn from open-access scientific publications spanning the five major Earth science spheres. MSEarth supports figure captioning, multiple-choice question answering, and open-ended reasoning tasks, aiming to provide a high-fidelity resource for advancing MLLMs in scientific discovery.
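The summary names three task types built around figure-caption pairs. As a rough illustration only, the sketch below shows how a single benchmark record and a multiple-choice scoring helper might be structured; the field and function names are assumptions for this sketch, not MSEarth's actual schema or evaluation code.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class FigureItem:
    """One hypothetical benchmark record: a figure plus its caption,
    surrounding discussion, and optional question annotations."""
    figure_path: str                  # path to the figure image
    caption: str                      # caption from the source publication
    context: str                      # contextual discussion around the figure
    sphere: str                       # e.g. "atmosphere", "hydrosphere"
    question: Optional[str] = None    # multiple-choice or open-ended prompt
    choices: List[str] = field(default_factory=list)
    answer: Optional[str] = None      # gold answer (choice label or free text)

def multiple_choice_accuracy(items: List[FigureItem],
                             predict: Callable[[FigureItem], str]) -> float:
    """Score predicted choice labels against gold answers.
    `predict` is any callable mapping a record to a choice label."""
    scored = [it for it in items if it.choices and it.answer is not None]
    if not scored:
        return 0.0
    correct = sum(1 for it in scored if predict(it) == it.answer)
    return correct / len(scored)
```

A captioning or open-ended task would score the same records differently (e.g. text-similarity metrics against the caption or gold answer) rather than exact label matching.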
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT Establishes a new benchmark for MLLMs in scientific reasoning, potentially accelerating AI applications in Earth science research.
RANK_REASON This is a research paper introducing a new benchmark dataset for evaluating multimodal large language models in Earth science.