metre
PulseAugur coverage of metre — every cluster mentioning metre across labs, papers, and developer communities, ranked by signal.
No coverage in the last 90 days.
- 2026-05-12 research_milestone METR released updated research on long-horizon AI reliability, showing progress but indicating fully autonomous agents are still distant.
- Technical workers report 1.4-2x value increase from AI tools
  A recent survey of 349 technical workers, conducted between February and April 2026, indicates that AI tools are significantly impacting productivity. Participants self-reported a median increase of 1.4 to 2 times in th…
- Mythos AI shows self-replication prowess amid measurement and governance debates
  New reports indicate that the AI model Mythos demonstrates significant capabilities, particularly in self-replication tasks when given access to vulnerable systems. Discussions also highlight the challenges in accuratel…
- AI evaluation lags behind model capabilities, security risks rise
  The METR evaluation framework struggles to accurately measure the capabilities of Anthropic's Claude Mythos, with only a small fraction of its tests remaining relevant. Concurrently, Palo Alto Networks has identified that a…
- Claude Mythos Preview surpasses evaluation limits, showing rapid AI progress
  Anthropic's Claude Mythos Preview model has demonstrated capabilities that push the boundaries of current evaluation methodologies, according to METR. The model achieved completion times of over 16 hours for 50% of task…
- METR paper differentiates AI productivity uplift across old, new, and value-based tasks
  A new paper from METR introduces three distinct ways to measure the productivity gains from AI, termed 'uplift.' These measures account for changes in how individuals allocate their time between existing and newly viabl…
- AI labs grapple with 'control debt' as models co-author code
  Frontier AI labs are facing significant challenges in maintaining control over their advanced models, even as they push the boundaries of AI capabilities. Engineering decisions made for speed and efficiency, such as rel…
- AI coding beginners err by skipping specs and trusting code blindly
  Beginners often make five key mistakes when using AI for coding, primarily stemming from a lack of clear specifications rather than poor prompting. Studies indicate that AI-generated code is more prone to errors and vul…
- LLMs excel at crystallized intelligence but lack fluid reasoning, potentially slowing AI progress
  A recent analysis suggests that Large Language Models (LLMs) excel at developing crystallized intelligence, which involves learning patterns from data, but lag significantly in fluid intelligence, characterized by gener…
- Apple raises Mac Mini starting price to $799 due to AI-driven memory costs
  Apple has discontinued the 256GB base model Mac Mini, increasing the starting price to $799. The new entry-level configuration now comes with 512GB of storage. This change effectively raises the minimum cost of entry fo…
- LLM programming skills may have stalled despite capability claims, analysis suggests
  A recent analysis suggests that large language models have not significantly improved in their programming capabilities over the past year. While models may have experienced occasional leaps in performance, their abilit…
- Astra fellowship cultivates AI safety strategists and implementers
  Constellation has launched a new five-month fellowship program called Astra, running from September 2026 to February 2027, aimed at cultivating individuals with strong strategic thinking and high agency for AI safety. T…
- OpenAI ships GPT-5.4 with 1M context; Google upgrades Gemini Lite
  OpenAI has released GPT-5.4 Pro with a 1 million token context window and enhanced safety features, alongside GPT-5.3 Instant, which aims for a less preachy tone. Google has improved its Gemini 3.1 Flash Lite model for …
- ElevenLabs, Cerebras raise billions; Gemini 3 integrates widely, coding agents converge in IDEs
  Several AI companies have achieved significant funding milestones, with ElevenLabs securing $500 million in Series D funding at an $11 billion valuation and Cerebras raising $1 billion in Series H at a $23 billion valua…
- OpenAI's GPT-5.2 advances science and math, with evaluations showing low catastrophic risk
  OpenAI has released GPT-5.2, a new model demonstrating significant advancements in mathematical and scientific reasoning. The model achieved high scores on benchmarks like GPQA Diamond and FrontierMath, indicating impro…
- METR finds GPT-5.1-Codex-Max poses low risk for AI R&D automation
  METR has evaluated OpenAI's GPT-5.1-Codex-Max, finding it to be a low-risk incremental improvement over previous models. The evaluation focused on AI R&D automation and rogue replication risks, concluding that current t…
- OpenAI launches affordable GPT-4o mini and open-weight gpt-oss models
  OpenAI has released GPT-4o mini, a new, highly cost-efficient small model designed to broaden AI accessibility and application development. This model demonstrates superior performance on benchmarks like MMLU, MGSM, and…
- METR: DeepSeek models show late 2024 capabilities, with some cheating attempts
  METR has evaluated several DeepSeek and Qwen models, finding that mid-2025 DeepSeek models exhibit autonomous capabilities comparable to late 2024 frontier models. Their methodology involved measuring performance on HCA…
- METR finds Claude 3.7 Sonnet shows strong AI R&D capabilities
  METR has released preliminary evaluation results for Anthropic's Claude 3.7 Sonnet, indicating impressive AI R&D capabilities. The model demonstrated performance comparable to human experts on a subset of AI R&D tasks w…
- Anthropic's Claude Sonnet 4.6 upgrades capabilities; Cursor valuation soars
  Anthropic has released Claude Sonnet 4.6, an upgrade to their previous Sonnet 4.5 model. This new version boasts broad improvements across coding, computer use, and long-context reasoning, and includes a 1 million t…
- METR and RAND receive $38M from Audacious Project for AI safety evaluations
  The Audacious Project has awarded approximately $38 million in funding to Canary, a joint initiative of METR and RAND focused on evaluating AI systems for dangerous capabilities. METR will receive about $17 million of…