PulseAugur / Brief

last 24h
[1/1] 222 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

  1. Knowledge Index of Noah's Ark

    KINA is a new benchmark for evaluating large language models across 261 fine-grained disciplines, built to address issues of scaling-driven design and annotation quality. Its 899 items aim for disciplinary representativeness, and a novel tournament system is used to improve review quality. Among the 42 models evaluated, Gemini-3.1-Pro-Preview led with 53.17%, followed by Claude-Opus-4.6 and GPT-5.4, which indicates significant room for improvement.

    IMPACT: Establishes a new, more rigorous benchmark for LLM evaluation, potentially driving improvements in model capabilities and disciplinary understanding.