PulseAugur / Brief

last 24h
[1/1] 224 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.
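As a rough illustration of how a 0–100 score of this kind could be assembled, the sketch below combines the three named signals as a weighted average and multiplies the result by an exponential time-decay factor. The weights, the 12-hour half-life, the multiplicative form, and the function names are all assumptions made for illustration; the brief does not state how the components are actually combined.

```python
# Hypothetical weights: the brief names the components but not their weighting.
WEIGHTS = {"authority": 0.4, "cluster_strength": 0.3, "headline_signal": 0.3}
HALF_LIFE_HOURS = 12.0  # assumed half-life for the time-decay term


def clamp01(x: float) -> float:
    """Restrict a component value to the range [0, 1]."""
    return max(0.0, min(1.0, x))


def time_decay(age_hours: float, half_life: float = HALF_LIFE_HOURS) -> float:
    """Exponential decay factor: 1.0 for a fresh item, 0.5 after one half-life."""
    return 0.5 ** (max(age_hours, 0.0) / half_life)


def score(authority: float, cluster_strength: float,
          headline_signal: float, age_hours: float) -> float:
    """Combine components in [0, 1] into a 0-100 score (illustrative only)."""
    components = {
        "authority": clamp01(authority),
        "cluster_strength": clamp01(cluster_strength),
        "headline_signal": clamp01(headline_signal),
    }
    base = sum(WEIGHTS[name] * value for name, value in components.items())
    return round(100.0 * base * time_decay(age_hours), 1)
```

Under these assumptions, an item with maximal signals scores 100 when fresh and half of that after one half-life; a real scoring system may combine the components differently.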

  1. LiveK12Bench: Have Large Multimodal Models Truly Conquered High School-level Examinations?

    A new benchmark, LiveK12Bench, assesses how Large Multimodal Models (LMMs) perform on high school-level examinations. The dynamic, multidisciplinary benchmark contains over 2,000 questions drawn from recent real-world exam papers in Mathematics, Physics, Chemistry, and Biology. In experiments on LiveK12Bench, advanced models such as GPT-5 showed significant performance drops, pointing to a gap between their idealized reasoning and their readiness for educational applications.

    IMPACT: Highlights critical limitations in LMMs' ability to handle complex, real-world educational assessments, indicating that further development is needed beyond what current reasoning benchmarks measure.