PulseAugur / Brief

last 24h
224 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

  1. DSAEval: Evaluating Data Science Agents on a Wide Range of Real-World Data Science Problems

    A new benchmark, DSAEval, evaluates data science agents on real-world problems. It covers multimodal perception and multi-query interactions, with multi-dimensional evaluation across reasoning, code, and results. In the reported evaluations, Claude Sonnet 4.5 performed best overall, while MiMo-V2-Pro and GPT-5.2 led on duration and step efficiency, respectively. The study also found that multimodal perception significantly improves performance on vision tasks, though challenges persist in unstructured data domains.

    IMPACT: Establishes a new standard for evaluating AI data science agents, highlighting current limitations and future research directions.