Brief

[2/2] 221 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

COMMENTARY · AI Snake Oil English(EN) · 3d

Did Google’s AI agents really build an operating system for $916?

Researchers are questioning Google's claim that its AI agents built an operating system for under $1,000. They argue that the "single prompt" description is misleading: the prompt was thousands of lines long, and the process relied on a specialized scaffold and agent oversight. Google has also not shown that the agents wrote the code from scratch rather than copying existing material, and it has not released the prompt, code, or logs for independent verification.

IMPACT Raises questions about the reliability of AI agent capabilities in complex software development and the transparency of company demonstrations.

TOOL · arXiv cs.AI English(EN) · 4d

Open-World Evaluations for Measuring Frontier AI Capabilities

Researchers have introduced open-world evaluations, a method that complements traditional benchmark-based assessments of frontier AI capabilities. These evaluations focus on long-horizon, complex real-world tasks that are assessed qualitatively rather than through automated scoring. As a demonstration, an AI agent developed an iOS application and published it to the Apple App Store with minimal human intervention, suggesting that agents may be capable of similarly complex end-to-end tasks.

IMPACT Introduces a new evaluation framework that may offer a more realistic assessment of AI capabilities beyond current benchmarks.
