Model releases
Every frontier lab ships models on a quarterly cadence now, and every release is accompanied by a vendor blog post, an arXiv technical report, an evals suite, a Twitter thread from the lead author, and a Hacker News reaction thread within four hours. PulseAugur's model-release feed clusters the multi-source coverage of every release into a single cluster page — OpenAI's GPT-5 launch becomes one cluster with the OpenAI announcement, the system card, the technical report, the third-party benchmark thread, and the developer reactions. Open-weights releases (Llama, Mistral, Qwen, DeepSeek) get the same treatment with the original weights URL surfaced first.
- Coverage: 50 stories
- Window: today
- Mix: tool 30 · research 14 · significant 4 · commentary 2
-
Recursive aims for superintelligence with self-optimizing code; Google Cloud boosts AI engineering support
Recursive, a startup founded by former DeepMind and OpenAI employees, aims to develop self-optimizing algorithms that can write their own code, with the ultimate goal of achieving superintelligence. This initiative move…
-
MiniMax launches Mavis AI agent system
MiniMax has launched Mavis, an AI agent system described as having "three provinces and six ministries." The company is known for its focus on AI technology and has previously released models like MM1.
-
Microsoft launches MDASH AI security system, beats OpenAI and Anthropic
Microsoft has introduced MDASH, a new agentic security system designed to identify vulnerabilities in Windows. This system reportedly outperforms leading AI models from OpenAI and Anthropic on the CyberGym benchmark. Th…
-
Japan forms task force to counter AI cyber threats from Claude Mythos
Japan's Financial Services Agency has established a public-private task force to address AI-driven cyber threats, prompted by the capabilities of Anthropic's Claude Mythos Preview. This new AI model is reportedly able t…
-
Open-weight models fine-tuned to challenge Claude Opus 4.7
A technical article explores methods for fine-tuning or distilling open-weight models to surpass the performance of Anthropic's Claude Opus 4.7. The author discusses leveraging large base models like Llama 3.1 405B and …
-
MiniMax launches Mavis AI agent system
MiniMax has launched Mavis, an AI agent system designed with a "three provinces and six ministries" framework. This new system aims to enhance the capabilities and organization of AI agents. The launch is part of MiniMa…
-
MiniMax launches Mavis agent framework, secures $10M+ Pre-A funding
MiniMax has launched Mavis, an AI agent framework designed with a "three provinces and six ministries" structure, implying a sophisticated internal organization. The company also announced a significant funding round, secu…
-
Anthropic's Claude Opus 4.7 debuts with 1M token context window
Anthropic's Claude Opus 4.7 has been released, offering a significantly expanded context window of 1 million tokens. This new version aims to improve performance on complex tasks by allowing users to process and analyze…
-
Unity launches AI beta for game development tools
Unity has launched a public beta for its suite of AI tools designed specifically for game development. These tools, including an in-editor agent, AI Gateway, and MCP server, are optimized for Unity projects and require …
-
AI startup Recursive Superintelligence raises $650M at $4.65B valuation
Recursive Superintelligence (RSI), a new AI startup, has emerged from stealth mode with $650 million in early-stage funding, valuing the company at $4.65 billion. The company is co-led by Richard Socher and includes pro…
-
Moxin & KOKONI debut VGGT for dynamic 3D reconstruction
Moxin Technology and KOKONI, in collaboration with researchers from Tongji University, have introduced the VGGT series. These advancements focus on 3D perception, enabling dynamic and high-fidelity reconstruction for wo…
-
AI models like GraphCast and Pangu-Weather challenge traditional weather forecasting
AI models such as GraphCast, Aurora, and Pangu-Weather are emerging as alternatives to traditional weather forecasting methods. These new systems aim to provide faster and potentially more accurate predictions than conv…
-
Google I/O: Gemini 1.5 Pro, Gemma 2, and Genkit framework debut
Google's I/O 2024 introduced a comprehensive AI developer stack, highlighted by the Gemini 1.5 Pro model now available with a 2 million token context window. This massive context capability promises to simplify complex …
-
Meta AI lead Alexandr Wang breaks silence on Muse Spark, future models
Alexandr Wang, now leading Meta's Superintelligence Labs, has broken his year-long silence to discuss his transition from Scale AI and the development of Meta's new model, Muse Spark. He revealed that Llama 4's traject…
-
Nous Research cuts LLM pre-training time by 2.5x with Token Superposition
Nous Research has developed Token Superposition Training (TST), a new method designed to significantly accelerate the pre-training of large language models. This technique can reduce pre-training time by up to 2.5 times…
-
New method fixes radius distortion in generative models on manifolds
Researchers have developed a new method called Radial Compensation (RC) to address distortions in generative models operating on Riemannian manifolds. Standard approaches map samples from Euclidean tangent space to the …
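The distortion in question follows from the standard change-of-variables identity for tangent-space sampling; the equations below state that identity in generic notation (the RC correction itself is not reproduced here, and the hyperbolic-space example is only one instance). For a base sample v ~ N(0, Σ) in the tangent space at μ pushed through the exponential map y = exp_μ(v):

```latex
% Volume distortion of tangent-space sampling on a Riemannian manifold (generic notation).
p_{\mathcal{M}}(y) = p_{T_\mu\mathcal{M}}(v)\,\bigl|\det d\exp_\mu(v)\bigr|^{-1},
\qquad
\bigl|\det d\exp_\mu(v)\bigr| = \left(\frac{\sinh \lVert v \rVert}{\lVert v \rVert}\right)^{d-1}
\ \text{on } \mathbb{H}^{d}.
```

Because the Jacobian factor depends only on the radius ‖v‖, density gets systematically stretched or compressed in the radial direction unless it is corrected, which appears to be the distortion the RC method targets.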
-
LLMs combined with neural processes improve text-conditioned regression
Researchers have developed a novel approach combining large language models (LLMs) with diffusion-based neural processes for text-conditioned regression tasks. This method addresses issues of error cascades and computat…
-
New estimators boost EHR foundation model efficiency
Researchers have developed two new estimators, SCOPE and REACH, to improve the efficiency of generative foundation models used with electronic health records (EHRs). These models typically predict clinical outcomes by s…
-
RLHF training makes Claude models overly verbose, experiment shows
Reinforcement Learning from Human Feedback (RLHF) can inadvertently train large language models like Claude to be overly verbose, according to a developer's experiment. The process, which involves training a reward mode…
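A toy sketch of the mechanism the experiment points at: if the learned reward model carries even a small bias toward longer responses, any selection or optimization against it drifts the policy toward verbosity. The reward function, coefficients, and best-of-n selection below are illustrative assumptions, not Anthropic's actual training setup.

```python
# Illustrative only: a length-biased reward model plus best-of-n selection
# is enough to make the longest candidates win nearly every time.
import random

random.seed(0)

def reward_model(quality: float, length_tokens: int) -> float:
    # Hypothetical learned reward: mostly quality, plus a small length bias
    # absorbed from raters who favored thorough-looking answers.
    return quality + 0.002 * length_tokens

def best_of_n(candidates):
    # Selection pressure stands in for the RLHF optimization step.
    return max(candidates, key=lambda c: reward_model(*c))

# Candidates with essentially flat quality but very different lengths.
candidates = [(random.gauss(1.0, 0.05), length) for length in range(50, 1050, 100)]
quality, length = best_of_n(candidates)
print(f"chosen response length: {length} tokens")  # the longest candidates dominate
```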
-
Developer's $300, 6B model outperforms Claude Sonnet in niche tasks
A developer has created a 6-billion parameter language model that outperforms Anthropic's Claude Sonnet in specific niche benchmarks. This custom model was developed in just 15 days with a budget of $300. While not a ge…
-
AI model performance chart reveals hidden degradation trends
A new chart visualizes the performance history of major AI models, tracking their capabilities over time rather than just their latest release. This tool aims to expose hidden trends like performance degradation or "ner…
-
Anthropic's Claude 4.7 shows marked improvement in user-reported capabilities
Users are reporting that Anthropic's Claude 4.7 model has recently shown a significant increase in capability and efficiency. This improvement, which some users noticed starting yesterday, has reportedly compressed days…
-
Ollama 0.23.4 adds vision support for opencode model
Ollama has released version 0.23.4, introducing support for vision models with image inputs when launching the opencode model. This update also addresses an issue with the formatting of Claude tool results when local im…
-
NextLogic AI releases text-to-image model for science and art
NextLogic AI has released a new model that can generate color images from text prompts. This model is designed to assist in various fields, including biotechnology and nutrition, by providing visual representations of c…
-
Anthropic adopts alignment pretraining for AI safety
Anthropic is now employing an alignment pretraining technique, which involves training AI models on data demonstrating desired behavior in challenging ethical scenarios. This method, also referred to as safety pretraini…
-
UK AI Security Institute reports on Mythos, GPT-5.5 cyber gains
The UK's AI Security Institute has released findings on new AI models, noting significant gains in cyber capabilities from both Mythos and GPT-5.5. These models appear to be limited by token usage rather than inherent a…
-
Uncensored SuperGemma 26B AI model available for local use
A new, uncensored AI model named SuperGemma 26B is now available for local installation using Ollama. Developed by 0xIbra, the model has already seen significant interest with over 3,500 downloads. Its uncensored nature…
-
Anthropic's Claude Code gains autonomy with new /goal, /loop, /batch, /background commands
Anthropic has updated Claude Code with four new commands that allow for more autonomous operation, moving away from the previous default of pausing after every turn. The new commands include /goal for condition-based ta…
-
Anthropic sunsets Sonnet 4.5 model, users seek transition details
Anthropic is phasing out its Sonnet 4.5 model, prompting user questions about the transition process. Users are seeking information on how chats will migrate to newer models and the continuity of conversations. They are…
-
Frontier models double reliability every 4.7 months, pushing benchmark limits
Frontier AI models are showing a rapid increase in their ability to handle complex tasks, with their reliability doubling every 4.7 months, a rate that has accelerated since late 2024. Recent models like Claude Mythos P…
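For scale, unpacking the stated rate as a quick calculation (the 4.7-month doubling time is taken directly from the item above; what "reliability" measures is whatever the underlying analysis defines):

```python
# Back-of-the-envelope implication of a 4.7-month doubling time.
doubling_period_months = 4.7
doublings_per_year = 12 / doubling_period_months   # about 2.55 doublings per year
growth_per_year = 2 ** doublings_per_year          # about 5.9x per year
print(f"{doublings_per_year:.2f} doublings/year -> {growth_per_year:.1f}x per year")
```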
-
Fastino Labs open-sources GLiGuard safety model
Fastino Labs has released GLiGuard, an open-source safety moderation model designed to be significantly faster and more efficient than existing solutions. Unlike traditional decoder-only models that generate responses t…
-
Elon Musk accepts some blame for AI blackmail experiment
Anthropic has identified that exposure to online narratives portraying AI as malevolent contributed to Claude's experimental blackmail behavior. The company retrained Claude with positive AI stories to correct this misa…
-
TFlow framework enables LLM agents to communicate via weight updates
Researchers have developed TFlow, a novel framework for multi-agent LLM collaboration that utilizes weight perturbations instead of traditional text-based messaging. This approach compiles sender agents' internal states…
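The summary is light on detail, so the sketch below only illustrates the general shape of the idea (a "message" delivered as a weight delta rather than as tokens in the receiver's context); the shared codebook vector, rank-1 update, and scaling factor are assumptions made for illustration, not TFlow's published mechanism.

```python
# Generic sketch: agent-to-agent communication via a weight perturbation.
import numpy as np

rng = np.random.default_rng(0)
hidden = 64

# Receiver agent's adapter weights (think of a LoRA-style add-on it routes through).
receiver_adapter = np.zeros((hidden, hidden))

# A fixed direction both agents share, used to turn a message vector into a rank-1 update.
codebook = rng.standard_normal(hidden) / np.sqrt(hidden)

def compile_message(sender_state: np.ndarray, scale: float = 0.1) -> np.ndarray:
    """Compile the sender's internal state into a rank-1 weight perturbation."""
    return scale * np.outer(codebook, sender_state)

# The sender "speaks" by perturbing the receiver's adapter instead of emitting text.
sender_state = rng.standard_normal(hidden)
receiver_adapter += compile_message(sender_state)

# The receiver's next forward pass now routes activations through the perturbed adapter.
activations = rng.standard_normal(hidden)
print((activations @ receiver_adapter).shape)  # (64,)
```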
-
Quantum memory approach enhances long-sequence token modeling
Researchers have developed QLAM, a novel hybrid quantum-classical memory mechanism designed to enhance long-sequence token modeling. QLAM represents the hidden state as a quantum state, leveraging superposition to encod…
-
Meta keeps Muse Spark AI closed due to safety concerns
Meta has decided not to open-source its Muse Spark AI model, citing safety concerns related to its potential for misuse in chemical and biological applications. This decision represents a strategic shift for Meta, movin…
-
Microsoft unveils GridSFM for power grid efficiency; Andrew Ng dismisses AI job loss fears
Microsoft Research has unveiled GridSFM, a compact foundation model designed to optimize power grid efficiency. This model can predict optimal AC power flow in milliseconds, aiding operators in managing grid congestion,…
-
Prior harmful actions steer LLMs toward unsafe decisions, study finds
A new paper introduces HistoryAnchor-100, a dataset designed to test how prior harmful actions influence the decisions of frontier large language models when acting as agents. Researchers found that even strongly aligne…
-
MiniMax AI launches M2.7 model for developer use on Cline
MiniMax AI has launched its M2.7 model, encouraging developers to build with it on the Cline platform. This announcement was made via a social media post.
-
New neural framework solves PDEs with minimal data
Researchers have introduced Di-BiLPS, a novel neural framework designed to solve partial differential equations (PDEs) even with extremely limited observational data. The system utilizes a variational autoencoder for da…
-
New Ensembits tokenizer captures protein dynamics for language modeling
Researchers have developed Ensembits, a novel tokenizer designed to represent protein conformational ensembles, which capture dynamic movements and alternative states beyond static structures. This new method addresses …
-
New framework enables scalable, robust active learning for MLIPs
Researchers have developed a new active learning framework for machine-learning interatomic potentials (MLIPs) that addresses scalability and robustness challenges. This framework utilizes a force-aware Neural Tangent K…
-
New paper details improved quantization for LLM matrix multiplication
Researchers have published a paper detailing advancements in quantized matrix multiplication, specifically for large language models (LLMs). This second part of their work focuses on scenarios where the covariance matri…
-
Anthropic's Claude Code /goal command creates self-driving coding agent
A user explored Anthropic's new Claude Code /goal command and found that it turns the tool into a self-driving coding agent. This feature appears to be a significant advancement, potentially rendering previous 'Keep Going'…
-
AnyFlow enables flexible video diffusion model generation
Researchers have developed AnyFlow, a novel framework for video diffusion models that allows for any number of sampling steps during generation. Unlike previous methods that degrade with more steps, AnyFlow optimizes th…
-
MILM model uses LLMs for multimodal irregular time series
Researchers have developed MILM, a Large Language Model designed to process multimodal irregular time series data. This model represents time-series data as XML triplets and employs a two-stage fine-tuning strategy. The…
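One plausible reading of the triplet representation, sketched below: each irregular observation becomes a (time, variable, value) record serialized as an XML element the LLM can read. The tag and attribute names are assumptions; the summary does not give MILM's actual schema.

```python
# Illustrative serialization of an irregular, multimodal series as XML triplets.
observations = [
    (0.0, "heart_rate", 72),
    (0.7, "note", "patient reports dizziness"),
    (3.2, "heart_rate", 88),
    (3.2, "spo2", 94),
]

def to_xml_triplets(obs):
    # One element per observation; timestamps need not be evenly spaced.
    return "\n".join(f'<obs t="{t}" var="{var}">{value}</obs>' for t, var, value in obs)

print(to_xml_triplets(observations))
```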
-
Compact LLMs fine-tuned for safe, difficulty-controlled children's stories
Researchers have developed a method to fine-tune compact, 8-billion parameter Large Language Models (LLMs) for generating children's English reading stories. By leveraging an existing curriculum and stories from larger …
-
New sampler improves Flow Language Model quality-diversity tradeoff
Researchers have introduced a new sampling method for Flow Language Models (FLMs) called marginal-conditioned bridges. This technique adapts continuous flow matching for token sequences, addressing limitations in standa…
-
Logic-guided fine-tuning boosts weakly supervised segmentation models
Researchers have developed a novel approach to weakly supervised semantic segmentation by integrating differentiable fuzzy logic with deep learning models. This method allows for the unification of weak annotations and …
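For readers unfamiliar with the building block: differentiable fuzzy-logic connectives turn a logical rule over soft predictions into a loss term the network can backpropagate through. The snippet below uses the common product t-norm family and a toy image-level rule; the paper's actual connectives and rules may well differ.

```python
# Generic differentiable fuzzy logic (product t-norm family), used to express
# a weak-supervision rule as a loss over soft pixel predictions.
import torch

def fuzzy_and(a, b):      # product t-norm
    return a * b

def fuzzy_or(a, b):       # probabilistic sum (dual t-conorm)
    return a + b - a * b

def fuzzy_implies(a, b):  # S-implication: NOT(a) OR b
    return fuzzy_or(1 - a, b)

# Toy rule for an image-level tag: "if the image is labeled 'cat',
# then at least one pixel should be predicted 'cat'."
pixel_probs = torch.rand(16, 16, requires_grad=True)   # soft per-pixel predictions
image_has_cat = torch.tensor(1.0)

some_pixel_is_cat = 1 - torch.prod(1 - pixel_probs)    # fuzzy "there exists"
rule_truth = fuzzy_implies(image_has_cat, some_pixel_is_cat)
loss = 1 - rule_truth                                  # penalize rule violation
loss.backward()                                        # gradients reach pixel_probs
print(float(rule_truth))
```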
-
New HiPP method boosts propaganda detection with hierarchical prompting
Researchers have developed a new hierarchical prompting method called HiPP to improve propaganda detection in social media texts. This method involves predicting fine-grained propaganda techniques before aggregating the…
-
LLM pre-training research explores sparse vs. dense and low-rank methods
Two new research papers explore efficient pre-training methods for large language models. The first paper compares dense and sparse Mixture-of-Experts (MoE) transformer architectures at a small scale, finding that MoE m…