PulseAugur

New MSEarth benchmark uses MLLMs for Earth science discovery

Researchers have developed MSEarth, a new multimodal benchmark designed to evaluate how well multimodal large language models (MLLMs) reason about Earth science. The dataset comprises over 289,000 figures with detailed captions and contextual discussion, drawn from open-access scientific publications across the five major Earth science spheres. MSEarth supports tasks such as figure captioning, multiple-choice question answering, and open-ended reasoning, aiming to provide a high-fidelity resource for advancing MLLMs in scientific discovery.
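To make the multiple-choice task concrete, here is a minimal sketch of how an evaluation loop over such a benchmark might look. The record schema (`figure_caption`, `question`, `choices`, `answer`) and the toy data are assumptions for illustration only; the actual MSEarth data layout may differ.

```python
def score_multiple_choice(records, predict):
    """Return accuracy of a `predict` function over multiple-choice records.

    `predict` takes (figure_caption, question, choices) and returns a
    choice key such as "A". Schema is hypothetical, not MSEarth's actual one.
    """
    if not records:
        return 0.0
    correct = 0
    for rec in records:
        guess = predict(rec["figure_caption"], rec["question"], rec["choices"])
        if guess == rec["answer"]:
            correct += 1
    return correct / len(records)


# Toy record and a stand-in "model" that always answers "A".
sample = [
    {
        "figure_caption": "Sea surface temperature anomaly map, 1998.",
        "question": "Which phenomenon does the anomaly pattern indicate?",
        "choices": {"A": "La Niña", "B": "El Niño", "C": "Monsoon onset"},
        "answer": "A",
    },
]
always_a = lambda caption, question, choices: "A"
print(score_multiple_choice(sample, always_a))  # → 1.0 on this toy record
```

In practice the `predict` function would wrap an MLLM call that receives the figure image alongside the caption and question; the scoring logic itself stays the same.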

Summary written by gemini-2.5-flash-lite from 1 source. How we write summaries →

IMPACT Establishes a new benchmark for MLLMs in scientific reasoning, potentially accelerating AI applications in Earth science research.

RANK_REASON This is a research paper introducing a new benchmark dataset for evaluating multimodal large language models in Earth science.

Read on arXiv cs.AI →

COVERAGE [1]

  1. arXiv cs.AI TIER_1 · Xiangyu Zhao, Wanghan Xu, Bo Liu, Yuhao Zhou, Fenghua Ling, Ben Fei, Xiaoyu Yue, Lei Bai, Wenlong Zhang, Xiao-Ming Wu

    MSEarth: A Multimodal Benchmark for Earth Science Phenomenon Discovery with MLLMs

    arXiv:2505.20740v3 Announce Type: replace Abstract: The rapid advancement of multimodal large language models (MLLMs) offers new opportunities for complex scientific challenges, yet their application in earth science, especially at the graduate level, remains underexplored due to a…