PulseAugur / Brief

last 24h
[1/1] 223 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

  1. Sci-Rho: A Multilingual Visually-Grounded Symbolic Benchmark for STEM Problems

    Researchers have introduced Sci-Rho, a multilingual benchmark that tests the robustness of vision-language models (VLMs) on STEM problems. It comprises over 4,200 problem templates across five subjects and seven languages, which generate more than 42,000 unique instances. Evaluations of 17 state-of-the-art VLMs revealed a significant gap between average and worst-case accuracy, and smaller models degraded more across languages than larger, proprietary models.

    IMPACT: Highlights the need for more robust evaluation methods for VLMs, particularly across different languages and visual contexts.
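
    To make the gap between average and worst-case accuracy concrete, here is a minimal sketch. It assumes one common definition: average accuracy is the fraction of all generated instances answered correctly, and worst-case accuracy is the fraction of templates for which every generated instance is answered correctly. The brief does not state how Sci-Rho defines either metric, so the function name, the data layout, and the toy results below are hypothetical.

```python
from collections import defaultdict


def average_and_worst_case(results):
    """Compute (average, worst-case) accuracy.

    results: iterable of (template_id, is_correct) pairs, one pair per
    generated instance. This layout is an assumption for illustration,
    not the benchmark's actual format.
    """
    by_template = defaultdict(list)
    for template_id, is_correct in results:
        by_template[template_id].append(bool(is_correct))
    if not by_template:
        raise ValueError("no results given")

    # Average accuracy: correct instances over all instances.
    total = sum(len(outcomes) for outcomes in by_template.values())
    correct = sum(sum(outcomes) for outcomes in by_template.values())
    average = correct / total

    # Worst-case accuracy (assumed definition): a template counts as
    # solved only if every one of its instances was answered correctly.
    solved = sum(all(outcomes) for outcomes in by_template.values())
    worst_case = solved / len(by_template)
    return average, worst_case


# Hypothetical toy results: three templates, two instances each.
toy_results = [
    ("t1", True), ("t1", True),
    ("t2", True), ("t2", False),
    ("t3", False), ("t3", True),
]
avg, worst = average_and_worst_case(toy_results)
print(f"average={avg:.3f} worst-case={worst:.3f}")
```

    In the toy data, four of six instances are correct, but only one of the three templates is solved on every instance, so the worst-case figure is well below the average. That is the kind of gap the summary describes.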