Brief

last 24h

[3/3] 221 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.
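
The feed does not publish its scoring formula, so the following is only a minimal sketch of how four such signals could be combined into a 0–100 score. The weights, the 24-hour half-life, the multiplicative use of time decay, and all names (`Cluster`, `score`, `WEIGHTS`, `HALF_LIFE_HOURS`) are assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical weights: the feed's real weighting is not stated anywhere.
WEIGHTS = {"authority": 0.4, "cluster": 0.3, "headline": 0.3}
HALF_LIFE_HOURS = 24.0  # assumed half-life for the time-decay factor


@dataclass
class Cluster:
    authority: float         # 0..1, assumed source-quality signal
    cluster_strength: float  # 0..1, assumed measure of corroborating sources
    headline_signal: float   # 0..1, assumed headline relevance signal
    age_hours: float         # hours since the story first appeared


def score(c: Cluster) -> float:
    """Weighted sum of the three signals, scaled by exponential time decay."""
    base = (WEIGHTS["authority"] * c.authority
            + WEIGHTS["cluster"] * c.cluster_strength
            + WEIGHTS["headline"] * c.headline_signal)
    base = max(0.0, min(1.0, base))  # clamp so the result stays in 0..100
    decay = 0.5 ** (c.age_hours / HALF_LIFE_HOURS)
    return round(100.0 * base * decay, 1)


fresh = Cluster(authority=1.0, cluster_strength=1.0, headline_signal=1.0, age_hours=0.0)
day_old = Cluster(authority=1.0, cluster_strength=1.0, headline_signal=1.0, age_hours=24.0)
print(score(fresh), score(day_old))
```

Under these assumptions a story with maximal signals loses half its score after one half-life; an additive decay term would be an equally plausible reading of the description.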

RESEARCH · arXiv cs.AI English(EN) · 1w · [2 sources]

GIM: Evaluating models via tasks that integrate multiple cognitive domains

Researchers have introduced the Grounded Integration Measure (GIM), a benchmark that evaluates large language models on tasks integrating multiple cognitive domains. GIM comprises 820 original problems that require coordinating several cognitive operations over accessible knowledge, so that it assesses reasoning grounded in realistic tasks rather than pure memorization or abstract reasoning. The benchmark includes a public-private split for contamination diagnostics and uses an item response theory (IRT) model calibrated on over 200,000 prompt-response pairs from 28 models to produce robust ability estimates and a comprehensive leaderboard.
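
The summary does not say which IRT variant GIM uses. As a rough illustration of how an IRT model turns right/wrong responses into an ability estimate, here is a minimal sketch assuming the standard two-parameter logistic (2PL) form, with item parameters already calibrated and ability found by a simple grid search. The function names and the toy item parameters are hypothetical, not taken from the paper.

```python
import math


def p_correct(theta: float, a: float, b: float) -> float:
    """2PL IRT: probability that a model with ability theta solves an item
    with discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def estimate_ability(responses, items):
    """Maximum-likelihood ability estimate over a fixed grid in [-4, 4].

    responses: list of 0/1 outcomes, one per item
    items: list of (a, b) pairs, assumed to be calibrated beforehand
    """
    grid = [i / 100.0 for i in range(-400, 401)]

    def log_lik(theta: float) -> float:
        total = 0.0
        for y, (a, b) in zip(responses, items):
            p = p_correct(theta, a, b)
            total += math.log(p) if y else math.log(1.0 - p)
        return total

    return max(grid, key=log_lik)


# Toy items: easy, medium, hard (illustrative parameters only).
items = [(1.0, -1.0), (1.0, 0.0), (1.0, 1.0)]
stronger = estimate_ability([1, 1, 0], items)
weaker = estimate_ability([1, 0, 0], items)
print(stronger, weaker)
```

Because the estimate depends on which items were solved and not only on how many, two models with the same raw accuracy can receive different ability estimates when item difficulties or discriminations differ.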

IMPACT Introduces a new evaluation framework that moves beyond knowledge recall and abstract reasoning to test integrated cognitive abilities in LLMs.

RESEARCH · arXiv cs.AI English(EN) · 6d · [2 sources]

Interaction Locality in Hierarchical Recursive Reasoning

Researchers have introduced a framework called "interaction locality" to measure how information flows within AI models during spatial reasoning tasks. The framework analyzes whether computations stay localized or cross semantic boundaries, and the authors apply it to hierarchical and recursive reasoning models such as HRM and TRM. The study found that high-level states in these models tend to write information locally; recursive updates then accumulate it into broader structures, a pattern also observed at module boundaries in embodied 3D models.

IMPACT Provides a new measurement framework for understanding spatial reasoning in AI, potentially leading to more efficient and interpretable models.

RESEARCH · arXiv cs.CL English(EN) · 6d · [2 sources]

optimize_anything: A Universal API for Optimizing any Text Parameter

Researchers have developed "optimize_anything," a universal API that uses LLMs to solve a wide range of optimization problems by treating them as text-based improvements. The system reports state-of-the-art results across diverse tasks, including enhancing AI agent architectures, optimizing cloud scheduling algorithms, and generating efficient CUDA kernels. The research finds that providing actionable side information and employing multi-task learning significantly improve convergence and final scores compared with score-only feedback or independent optimization.

IMPACT This new optimization paradigm could unify diverse problem-solving tasks under a single LLM-based framework, potentially streamlining development and improving performance across various domains.
