PulseAugur / Brief

last 24h
222 sources

Multi-source AI news, clustered, deduplicated, and scored from 0 to 100 on four signals: authority, cluster strength, headline signal, and time decay.
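The brief names the four scoring signals but does not say how they are combined. The following is one plausible sketch, assuming a weighted sum of three normalized signals (each in [0, 1]) multiplied by an exponential time decay. The weights, the 12-hour half-life, and the names `Signals`, `time_decay`, and `score` are all hypothetical and are not PulseAugur's actual method.

```python
from dataclasses import dataclass

# Hypothetical weights: the brief lists the signals but not their weighting.
W_AUTHORITY = 0.35
W_CLUSTER = 0.30
W_HEADLINE = 0.35
HALF_LIFE_HOURS = 12.0  # hypothetical decay half-life


@dataclass
class Signals:
    authority: float         # normalized to [0, 1] (assumption)
    cluster_strength: float  # normalized to [0, 1] (assumption)
    headline_signal: float   # normalized to [0, 1] (assumption)
    age_hours: float         # hours since the story first appeared


def time_decay(age_hours: float, half_life: float = HALF_LIFE_HOURS) -> float:
    """Exponential decay: 1.0 at age 0, 0.5 after one half-life."""
    return 0.5 ** (max(age_hours, 0.0) / half_life)


def score(s: Signals) -> int:
    """Combine the signals into an integer score clamped to 0-100."""
    base = (W_AUTHORITY * s.authority
            + W_CLUSTER * s.cluster_strength
            + W_HEADLINE * s.headline_signal)
    raw = 100.0 * base * time_decay(s.age_hours)
    return round(min(max(raw, 0.0), 100.0))
```

Treating time decay as a multiplier rather than a fourth additive term means every score shrinks toward 0 as a story ages, however strong its other signals are; whether PulseAugur does this is not stated.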

  1. TS-Skill: A Benchmark for Evaluating Analytical Skills in Time-Series Question Answering

    Researchers have introduced TS-Skill, a benchmark that evaluates the analytical capabilities of large language models (LLMs) and time-series language models (TSLMs) on time-series question answering (TSQA). It targets three skills central to interpreting temporal patterns: temporal scale selection, temporal localization, and cross-interval integration. Experiments on TS-Skill revealed significant performance gaps across these skills; non-agentic models struggled most with integrating information across separate time intervals.

    IMPACT: Provides a granular evaluation framework for identifying and addressing specific temporal reasoning weaknesses in LLMs and TSLMs.
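To make the three skill names concrete, here is a toy illustration on a synthetic daily series. It reflects only a plain reading of the skill names, not TS-Skill's actual task format or data; the series and the helpers `downsample`, `localize_peak`, `interval_mean`, and `compare_intervals` are hypothetical.

```python
# Hypothetical 90-day series: flat baseline of 10, a one-day spike on
# day 40, and a +5 level shift from day 60 onward. Not TS-Skill data.
series = [10.0] * 90
series[40] = 30.0
for t in range(60, 90):
    series[t] += 5.0


def downsample(xs, window):
    """Temporal scale selection: average non-overlapping windows (7 = weekly)."""
    return [sum(xs[i:i + window]) / len(xs[i:i + window])
            for i in range(0, len(xs), window)]


def localize_peak(xs):
    """Temporal localization: index of the first maximum value."""
    return max(range(len(xs)), key=lambda i: xs[i])


def interval_mean(xs, start, end):
    """Mean over the half-open interval [start, end)."""
    return sum(xs[start:end]) / (end - start)


def compare_intervals(xs, a, b):
    """Cross-interval integration: difference of means of two disjoint intervals."""
    return interval_mean(xs, *b) - interval_mean(xs, *a)
```

At daily resolution the day-40 spike is the maximum, but after weekly averaging it is diluted to about 12.9 and the maximum moves to the post-shift weeks, which is why choosing the right scale matters before localizing anything.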