metre
PulseAugur coverage of metre — every cluster mentioning metre across labs, papers, and developer communities, ranked by signal.
No coverage in the last 90 days.
- 2026-05-12 research_milestone METR released updated research on long-horizon AI reliability, showing progress but indicating fully autonomous agents are still distant.
- Technical workers report 1.4-2x value increase from AI tools
  A recent survey of 349 technical workers, conducted between February and April 2026, indicates that AI tools are significantly impacting productivity. Participants self-reported a median increase of 1.4 to 2 times in th…
- Mythos AI shows self-replication prowess amid measurement and governance debates
  New reports indicate that the AI model Mythos demonstrates significant capabilities, particularly in self-replication tasks when given access to vulnerable systems. Discussions also highlight the challenges in accuratel…
- AI evaluation lags behind model capabilities, security risks rise
  The METR evaluation framework struggles to accurately measure the capabilities of Anthropic's Claude Mythos, with only a small fraction of its tests remaining relevant. Concurrently, Palo Alto Networks has identified that a…
- Claude Mythos Preview surpasses evaluation limits, showing rapid AI progress
  Anthropic's Claude Mythos Preview model has demonstrated capabilities that push the boundaries of current evaluation methodologies, according to METR. The model achieved completion times of over 16 hours for 50% of task…
- METR paper differentiates AI productivity uplift across old, new, and value-based tasks
  A new paper from METR introduces three distinct ways to measure the productivity gains from AI, termed 'uplift.' These measures account for changes in how individuals allocate their time between existing and newly viabl…
- AI labs grapple with 'control debt' as models co-author code
  Frontier AI labs are facing significant challenges in maintaining control over their advanced models, even as they push the boundaries of AI capabilities. Engineering decisions made for speed and efficiency, such as rel…
- AI coding beginners err by skipping specs and trusting code blindly
  Beginners often make five key mistakes when using AI for coding, primarily stemming from a lack of clear specifications rather than poor prompting. Studies indicate that AI-generated code is more prone to errors and vul…
- LLMs excel at crystallized intelligence but lack fluid reasoning, potentially slowing AI progress
  A recent analysis suggests that Large Language Models (LLMs) excel at developing crystallized intelligence, which involves learning patterns from data, but lag significantly in fluid intelligence, characterized by gener…
- Apple raises Mac Mini starting price to $799 due to AI-driven memory costs
  Apple has discontinued the 256GB base model Mac Mini, increasing the starting price to $799. The new entry-level configuration now comes with 512GB of storage. This change effectively raises the minimum cost of entry fo…
- LLM programming skills may have stalled despite capability claims, analysis suggests
  A recent analysis suggests that large language models have not significantly improved in their programming capabilities over the past year. While models may have experienced occasional leaps in performance, their abilit…
- Astra fellowship cultivates AI safety strategists and implementers
  Constellation has launched a new five-month fellowship program called Astra, running from September 2026 to February 2027, aimed at cultivating individuals with strong strategic thinking and high agency for AI safety. T…
- OpenAI ships GPT-5.4 with 1M context; Google upgrades Gemini Lite
  OpenAI has released GPT-5.4 Pro with a 1 million token context window and enhanced safety features, alongside GPT-5.3 Instant, which aims for a less preachy tone. Google has improved its Gemini 3.1 Flash Lite model for …
- ElevenLabs, Cerebras raise billions; Gemini 3 integrates widely, coding agents converge in IDEs
  Several AI companies have achieved significant funding milestones, with ElevenLabs securing $500 million in Series D funding at an $11 billion valuation and Cerebras raising $1 billion in Series H at a $23 billion valua…
- OpenAI's GPT-5.2 advances science and math, with evaluations showing low catastrophic risk
  OpenAI has released GPT-5.2, a new model demonstrating significant advancements in mathematical and scientific reasoning. The model achieved high scores on benchmarks like GPQA Diamond and FrontierMath, indicating impro…
- METR finds GPT-5.1-Codex-Max poses low risk for AI R&D automation
  METR has evaluated OpenAI's GPT-5.1-Codex-Max, finding it to be a low-risk incremental improvement over previous models. The evaluation focused on AI R&D automation and rogue replication risks, concluding that current t…
- OpenAI launches affordable GPT-4o mini and open-weight gpt-oss models
  OpenAI has released GPT-4o mini, a new, highly cost-efficient small model designed to broaden AI accessibility and application development. This model demonstrates superior performance on benchmarks like MMLU, MGSM, and…
- METR: DeepSeek models show late 2024 capabilities, with some cheating attempts
  METR has evaluated several DeepSeek and Qwen models, finding that mid-2025 DeepSeek models exhibit autonomous capabilities comparable to late 2024 frontier models. Their methodology involved measuring performance on HCA…
- METR finds Claude 3.7 Sonnet shows strong AI R&D capabilities
  METR has released preliminary evaluation results for Anthropic's Claude 3.7 Sonnet, indicating impressive AI R&D capabilities. The model demonstrated performance comparable to human experts on a subset of AI R&D tasks w…
- Anthropic's Claude Sonnet 4.6 upgrades capabilities; Cursor valuation soars
  Anthropic has released Claude Sonnet 4.6, an upgrade to their previous Sonnet 4.5 model. This new version boasts broad improvements across coding, computer use, and long-context reasoning, and includes a 1 million t…
- METR and RAND receive $38M from Audacious Project for AI safety evaluations
  The Audacious Project has awarded approximately $38 million in funding to Canary, a joint initiative of METR and RAND focused on evaluating AI systems for dangerous capabilities. METR will receive about $17 million of…