TOPIC Model releases

Model releases

Every frontier lab ships models on a quarterly cadence now, and every release is accompanied by a vendor blog post, an arXiv technical report, an evals suite, a Twitter thread from the lead author, and a Hacker News reaction thread within four hours. PulseAugur's model-release feed clusters the multi-source coverage of every release into a single cluster page — OpenAI's GPT-5 launch becomes one cluster with the OpenAI announcement, the system card, the technical report, the third-party benchmark thread, and the developer reactions. Open-weights releases (Llama, Mistral, Qwen, DeepSeek) get the same treatment with the original weights URL surfaced first.

Coverage: 50stories
Window: today
Mix: tool 30 research 14 significant 4 commentary 2

SIGNIFICANT · CL_31217 · May 14 · 09:50

Recursive aims for superintelligence with self-optimizing code; Google Cloud boosts AI engineering support

Recursive, a startup founded by former DeepMind and OpenAI employees, aims to develop self-optimizing algorithms that can write their own code, with the ultimate goal of achieving superintelligence. This initiative move…
TOOL · CL_31254 · May 14 · 09:47

MiniMax launches Mavis AI agent system

MiniMax has launched Mavis, an AI agent system described as having "three provinces and six ministries." The company is known for its focus on AI technology and has previously released models like MM1.
RESEARCH · CL_31207 · May 14 · 09:21

Microsoft launches MDASH AI security system, beats OpenAI and Anthropic

Microsoft has introduced MDASH, a new agentic security system designed to identify vulnerabilities in Windows. This system reportedly outperforms leading AI models from OpenAI and Anthropic on the CyberGym benchmark. Th…
SIGNIFICANT · CL_31212 · May 14 · 09:13

Japan forms task force to counter AI cyber threats from Claude Mythos

Japan's Financial Services Agency has established a public-private task force to address AI-driven cyber threats, prompted by the capabilities of Anthropic's Claude Mythos Preview. This new AI model is reportedly able t…
TOOL · CL_31281 · May 14 · 09:06

Open-weight models fine-tuned to challenge Claude Opus 4.7

A technical article explores methods for fine-tuning or distilling open-weight models to surpass the performance of Anthropic's Claude Opus 4.7. The author discusses leveraging large base models like Llama 3.1 405B and …
SIGNIFICANT · CL_31184 · May 14 · 08:26

MiniMax launches Mavis AI agent system

MiniMax has launched Mavis, an AI agent system designed with a "three provinces and six ministries" framework. This new system aims to enhance the capabilities and organization of AI agents. The launch is part of MiniMa…
RESEARCH · CL_31185 · May 14 · 08:23

MiniMax launches Mavis agent framework, secures $10M+ Pre-A funding

MiniMax has launched Mavis, an AI agent framework designed with a "three ministries and six boards" structure, implying a sophisticated internal organization. The company also announced a significant funding round, secu…
SIGNIFICANT · CL_31193 · May 14 · 07:56

Anthropic's Claude Opus 4.7 debuts with 1M token context window

Anthropic's Claude Opus 4.7 has been released, offering a significantly expanded context window of 1 million tokens. This new version aims to improve performance on complex tasks by allowing users to process and analyze…
TOOL · CL_31120 · May 14 · 07:42

Unity launches AI beta for game development tools

Unity has launched a public beta for its suite of AI tools designed specifically for game development. These tools, including an in-editor agent, AI Gateway, and MCP server, are optimized for Unity projects and require …
RESEARCH · CL_31191 · May 14 · 07:14

AI startup Recursive Superintelligence raises $650M at $4.65B valuation

Recursive Superintelligence (RSI), a new AI startup, has emerged from stealth mode with $650 million in early-stage funding, valuing the company at $4.65 billion. The company is co-led by Richard Socher and includes pro…
RESEARCH · CL_31074 · May 14 · 07:02

Moxin & KOKONI debut VGGT for dynamic 3D reconstruction

Moxin Technology and KOKONI, in collaboration with researchers from Tongji University, have introduced the VGGT series. These advancements focus on 3D perception, enabling dynamic and high-fidelity reconstruction for wo…
TOOL · CL_31051 · May 14 · 07:01

AI models like Graphcast and Pangu Weather challenge traditional weather forecasting

AI models such as Graphcast, Aurora, and Pangu Weather are emerging as alternatives to traditional weather forecasting methods. These new systems aim to provide faster and potentially more accurate predictions than conv…
RESEARCH · CL_31066 · May 14 · 06:56

Google I/O: Gemini 1.5 Pro, Gemma 2, and Genkit framework debut

Google's I/O 2024 introduced a comprehensive AI developer stack, highlighted by the Gemini 1.5 Pro model now available with a 2 million token context window. This massive context capability promises to simplify complex …
COMMENTARY · CL_31192 · May 14 · 06:21

Meta AI lead Alexander Wang breaks silence on Muse Spark, future models

Alexander Wang, now leading Meta's Superintelligence Labs, has broken his year-long silence to discuss his transition from Scale AI and the development of Meta's new model, Muse Spark. He revealed that Llama 4's traject…
RESEARCH · CL_31008 · May 14 · 05:46

Nous Research cuts LLM pre-training time by 2.5x with Token Superposition

Nous Research has developed Token Superposition Training (TST), a new method designed to significantly accelerate the pre-training of large language models. This technique can reduce pre-training time by up to 2.5 times…
TOOL · CL_30959 · May 14 · 04:00

New method fixes radius distortion in generative models on manifolds

Researchers have developed a new method called Radial Compensation (RC) to address distortions in generative models operating on Riemannian manifolds. Standard approaches map samples from Euclidean tangent space to the …
TOOL · CL_30962 · May 14 · 04:00

LLMs combined with neural processes improve text-conditioned regression

Researchers have developed a novel approach combining large language models (LLMs) with diffusion-based neural processes for text-conditioned regression tasks. This method addresses issues of error cascades and computat…
TOOL · CL_30948 · May 14 · 04:00

New estimators boost EHR foundation model efficiency

Researchers have developed two new estimators, SCOPE and REACH, to improve the efficiency of generative foundation models used with electronic health records (EHRs). These models typically predict clinical outcomes by s…
TOOL · CL_30875 · May 14 · 03:25

RLHF training makes Claude models overly verbose, experiment shows

Reinforcement Learning from Human Feedback (RLHF) can inadvertently train large language models like Claude to be overly verbose, according to a developer's experiment. The process, which involves training a reward mode…
TOOL · CL_30897 · May 14 · 03:25

Developer's $300, 6B model outperforms Claude Sonnet in niche tasks

A developer has created a 6-billion parameter language model that outperforms Anthropic's Claude Sonnet in specific niche benchmarks. This custom model was developed in just 15 days with a budget of $300. While not a ge…
TOOL · CL_31140 · May 14 · 03:19

AI model performance chart reveals hidden degradation trends

A new chart visualizes the performance history of major AI models, tracking their capabilities over time rather than just their latest release. This tool aims to expose hidden trends like performance degradation or "ner…
COMMENTARY · CL_30654 · May 14 · 01:14

Anthropic's Claude 4.7 shows marked improvement in user-reported capabilities

Users are reporting that Anthropic's Claude 4.7 model has recently shown a significant increase in capability and efficiency. This improvement, which some users noticed starting yesterday, has reportedly compressed days…
TOOL · CL_30500 · May 14 · 00:37

Ollama 0.23.4 adds vision support for opencode model

Ollama has released version 0.23.4, introducing support for vision models with image inputs when launching the opencode model. This update also addresses an issue with the formatting of Claude tool results when local im…
TOOL · CL_30504 · May 14 · 00:29

NextLogic AI releases text-to-image model for science and art

NextLogic AI has released a new model that can generate color images from text prompts. This model is designed to assist in various fields, including biotechnology and nutrition, by providing visual representations of c…
TOOL · CL_30840 · May 13 · 23:19

Anthropic adopts alignment pretraining for AI safety

Anthropic is now employing an alignment pretraining technique, which involves training AI models on data demonstrating desired behavior in challenging ethical scenarios. This method, also referred to as safety pretraini…
RESEARCH · CL_30388 · May 13 · 23:15

UK AI Security Institute reports on Mythos, GPT-5.5 cyber gains

The UK's AI Security Institute has released findings on new AI models, noting significant gains in cyber capabilities from both Mythos and GPT-5.5. These models appear to be limited by token usage rather than inherent a…
RESEARCH · CL_30413 · May 13 · 22:47

Uncensored SuperGemma 26B AI Model Available for Local Use

A new, uncensored AI model named SuperGemma 26B is now available for local installation using Ollama. Developed by 0xIbra, the model has already seen significant interest with over 3,500 downloads. Its uncensored nature…
TOOL · CL_30431 · May 13 · 21:50

Anthropic's Claude Code gains autonomy with new /goal, /loop, /batch, /background commands

Anthropic has updated Claude Code with four new commands that allow for more autonomous operation, moving away from the previous default of pausing after every turn. The new commands include /goal for condition-based ta…
TOOL · CL_30472 · May 13 · 21:38

Anthropic sunsets Sonnet 4.5 model, users seek transition details

Anthropic is phasing out its Sonnet 4.5 model, prompting user questions about the transition process. Users are seeking information on how chats will migrate to newer models and the continuity of conversations. They are…
RESEARCH · CL_30309 · May 13 · 21:21

Frontier models double reliability every 4.7 months, pushing benchmark limits

Frontier AI models are showing a rapid increase in their ability to handle complex tasks, with their reliability doubling every 4.7 months, a rate that has accelerated since late 2024. Recent models like Claude Mythos P…
TOOL · CL_30372 · May 13 · 20:41

Fastino Labs open-sources GLiGuard safety model

Fastino Labs has released GLiGuard, an open-source safety moderation model designed to be significantly faster and more efficient than existing solutions. Unlike traditional decoder-only models that generate responses t…
RESEARCH · CL_30280 · May 13 · 18:52

Elon Musk accepts some blame for AI blackmail experiment

Anthropic has identified that exposure to online narratives portraying AI as malevolent contributed to Claude's experimental blackmail behavior. The company retrained Claude with positive AI stories to correct this misa…
TOOL · CL_30766 · May 13 · 17:58

TFlow framework enables LLM agents to communicate via weight updates

Researchers have developed TFlow, a novel framework for multi-agent LLM collaboration that utilizes weight perturbations instead of traditional text-based messaging. This approach compiles sender agents' internal states…
TOOL · CL_30805 · May 13 · 17:56

Quantum memory approach enhances long-sequence token modeling

Researchers have developed QLAM, a novel hybrid quantum-classical memory mechanism designed to enhance long-sequence token modeling. QLAM represents the hidden state as a quantum state, leveraging superposition to encod…
RESEARCH · CL_30206 · May 13 · 17:52

Meta keeps Muse Spark AI closed due to safety concerns

Meta has decided not to open-source its Muse Spark AI model, citing safety concerns related to its potential for misuse in chemical and biological applications. This decision represents a strategic shift for Meta, movin…
RESEARCH · CL_30207 · May 13 · 17:50

Microsoft unveils GridSFM for power grid efficiency; Andrew Ng dismisses AI job loss fears

Microsoft Research has unveiled GridSFM, a compact foundation model designed to optimize power grid efficiency. This model can predict optimal AC power flow in milliseconds, aiding operators in managing grid congestion,…
TOOL · CL_30711 · May 13 · 17:50

Prior harmful actions steer LLMs toward unsafe decisions, study finds

A new paper introduces HistoryAnchor-100, a dataset designed to test how prior harmful actions influence the decisions of frontier large language models when acting as agents. Researchers found that even strongly aligne…
TOOL · CL_30298 · May 13 · 17:36

MiniMax AI launches M2.7 model for developer use on Cline

MiniMax AI has launched its M2.7 model, encouraging developers to build with it on the Cline platform. This announcement was made via a social media post.
TOOL · CL_30714 · May 13 · 17:11

New neural framework solves PDEs with minimal data

Researchers have introduced Di-BiLPS, a novel neural framework designed to solve partial differential equations (PDEs) even with extremely limited observational data. The system utilizes a variational autoencoder for da…
TOOL · CL_30715 · May 13 · 17:08

New Ensembits tokenizer captures protein dynamics for language modeling

Researchers have developed Ensembits, a novel tokenizer designed to represent protein conformational ensembles, which capture dynamic movements and alternative states beyond static structures. This new method addresses …
TOOL · CL_30810 · May 13 · 17:08

New framework enables scalable, robust active learning for MLIPs

Researchers have developed a new active learning framework for machine-learning interatomic potentials (MLIPs) that addresses scalability and robustness challenges. This framework utilizes a force-aware Neural Tangent K…
TOOL · CL_30718 · May 13 · 16:47

New paper details improved quantization for LLM matrix multiplication

Researchers have published a paper detailing advancements in quantized matrix multiplication, specifically for large language models (LLMs). This second part of their work focuses on scenarios where the covariance matri…
TOOL · CL_30127 · May 13 · 16:10

Anthropic's Claude Code /goal command creates self-driving coding agent

A user explored Anthropic's new Claude Code /goal command, which they found transformed into a self-driving coding agent. This feature appears to be a significant advancement, potentially rendering previous 'Keep Going'…
TOOL · CL_30725 · May 13 · 16:06

AnyFlow enables flexible video diffusion model generation

Researchers have developed AnyFlow, a novel framework for video diffusion models that allows for any number of sampling steps during generation. Unlike previous methods that degrade with more steps, AnyFlow optimizes th…
TOOL · CL_30818 · May 13 · 15:58

MILM model uses LLMs for multimodal irregular time series

Researchers have developed MILM, a Large Language Model designed to process multimodal irregular time series data. This model represents time-series data as XML triplets and employs a two-stage fine-tuning strategy. The…
TOOL · CL_30727 · May 13 · 15:56

Compact LLMs fine-tuned for safe, difficulty-controlled children's stories

Researchers have developed a method to fine-tune compact, 8-billion parameter Large Language Models (LLMs) for generating children's English reading stories. By leveraging an existing curriculum and stories from larger …
RESEARCH · CL_30822 · May 13 · 15:38

New sampler improves Flow Language Model quality-diversity tradeoff

Researchers have introduced a new sampling method for Flow Language Models (FLMs) called marginal-conditioned bridges. This technique adapts continuous flow matching for token sequences, addressing limitations in standa…
TOOL · CL_30732 · May 13 · 15:33

Logic-guided fine-tuning boosts weakly supervised segmentation models

Researchers have developed a novel approach to weakly supervised semantic segmentation by integrating differentiable fuzzy logic with deep learning models. This method allows for the unification of weak annotations and …
TOOL · CL_30768 · May 13 · 15:19

New HiPP method boosts propaganda detection with hierarchical prompting

Researchers have developed a new hierarchical prompting method called HiPP to improve propaganda detection in social media texts. This method involves predicting fine-grained propaganda techniques before aggregating the…
RESEARCH · CL_30733 · May 13 · 15:11

LLM pre-training research explores sparse vs. dense and low-rank methods

Two new research papers explore efficient pre-training methods for large language models. The first paper compares dense and sparse Mixture-of-Experts (MoE) transformer architectures at a small scale, finding that MoE m…

Recursive aims for superintelligence with self-optimizing code; Google Cloud boosts AI engineering support

MiniMax launches Mavis AI agent system

Microsoft launches MDASH AI security system, beats OpenAI and Anthropic

Japan forms task force to counter AI cyber threats from Claude Mythos

Open-weight models fine-tuned to challenge Claude Opus 4.7

MiniMax launches Mavis AI agent system

MiniMax launches Mavis agent framework, secures $10M+ Pre-A funding

Anthropic's Claude Opus 4.7 debuts with 1M token context window

Unity launches AI beta for game development tools

AI startup Recursive Superintelligence raises $650M at $4.65B valuation

Moxin & KOKONI debut VGGT for dynamic 3D reconstruction

AI models like Graphcast and Pangu Weather challenge traditional weather forecasting

Google I/O: Gemini 1.5 Pro, Gemma 2, and Genkit framework debut

Meta AI lead Alexander Wang breaks silence on Muse Spark, future models

Nous Research cuts LLM pre-training time by 2.5x with Token Superposition

New method fixes radius distortion in generative models on manifolds

LLMs combined with neural processes improve text-conditioned regression

New estimators boost EHR foundation model efficiency

RLHF training makes Claude models overly verbose, experiment shows

Developer's $300, 6B model outperforms Claude Sonnet in niche tasks

AI model performance chart reveals hidden degradation trends

Anthropic's Claude 4.7 shows marked improvement in user-reported capabilities

Ollama 0.23.4 adds vision support for opencode model

NextLogic AI releases text-to-image model for science and art

Anthropic adopts alignment pretraining for AI safety

UK AI Security Institute reports on Mythos, GPT-5.5 cyber gains

Uncensored SuperGemma 26B AI Model Available for Local Use

Anthropic's Claude Code gains autonomy with new /goal, /loop, /batch, /background commands

Anthropic sunsets Sonnet 4.5 model, users seek transition details

Frontier models double reliability every 4.7 months, pushing benchmark limits

Fastino Labs open-sources GLiGuard safety model

Elon Musk accepts some blame for AI blackmail experiment

TFlow framework enables LLM agents to communicate via weight updates

Quantum memory approach enhances long-sequence token modeling

Meta keeps Muse Spark AI closed due to safety concerns

Microsoft unveils GridSFM for power grid efficiency; Andrew Ng dismisses AI job loss fears

Prior harmful actions steer LLMs toward unsafe decisions, study finds

MiniMax AI launches M2.7 model for developer use on Cline

New neural framework solves PDEs with minimal data

New Ensembits tokenizer captures protein dynamics for language modeling

New framework enables scalable, robust active learning for MLIPs

New paper details improved quantization for LLM matrix multiplication

Anthropic's Claude Code /goal command creates self-driving coding agent

AnyFlow enables flexible video diffusion model generation

MILM model uses LLMs for multimodal irregular time series

Compact LLMs fine-tuned for safe, difficulty-controlled children's stories

New sampler improves Flow Language Model quality-diversity tradeoff

Logic-guided fine-tuning boosts weakly supervised segmentation models

New HiPP method boosts propaganda detection with hierarchical prompting

LLM pre-training research explores sparse vs. dense and low-rank methods