frontier models
PulseAugur coverage of frontier models: every cluster mentioning frontier models across labs, papers, and developer communities, ranked by signal.
4 days with sentiment data
-
New metric 'intelligence per watt' measures local AI efficiency
A new research paper introduces "intelligence per watt" (IPW) as a metric to evaluate the efficiency of local AI models. The study found that local models can accurately answer 88.7% of real-world queries and have shown…
-
AI models fail to reliably forecast scientific progress, study finds
A new benchmark called CUSP has been developed to evaluate AI's ability to forecast scientific progress. The study found that current frontier AI models struggle with predicting the realization and timing of scientific …
-
New benchmarks tackle AI agent safety in complex environments
Researchers are developing new benchmarks to address the safety risks of AI agents, particularly in multi-agent and interactive environments. GT-HarmBench evaluates frontier models in game-theoretic scenarios, revealing…
-
Microsoft: Frontier AI models falter on long, complex tasks
Microsoft researchers found that advanced AI models struggle with long, multi-step tasks, introducing errors in complex workflows. This suggests that current frontier models are not yet reliable for intricate,…
-
AI agent clarification timing is task-dependent, study finds
A new study on long-horizon AI agents reveals that the optimal timing for seeking clarification is not always early in the execution process. Researchers found that the value of clarification varies significantly depend…