Brief · PulseAugur

TOOL · arXiv cs.AI · 2w

LiveK12Bench: Have Large Multimodal Models Truly Conquered High School-level Examinations?

A new benchmark, LiveK12Bench, assesses how well Large Multimodal Models (LMMs) handle high school-level examinations. The dynamic, multi-disciplinary benchmark includes over 2,000 questions drawn from recent real-world exam papers in Mathematics, Physics, Chemistry, and Biology. Experiments on LiveK12Bench revealed significant performance drops for advanced models such as GPT-5, highlighting a gap between their idealized reasoning and their readiness for educational applications.

IMPACT Highlights critical limitations in LMMs' ability to handle complex, real-world educational assessments, indicating a need for further development beyond current reasoning benchmarks.

GPT-5
Large Multimodal Models
LiveK12Bench