PulseAugur
LIVE 10:26:25
TOPIC Model releases

Model releases

Every frontier lab ships models on a quarterly cadence now, and every release is accompanied by a vendor blog post, an arXiv technical report, an evals suite, a Twitter thread from the lead author, and a Hacker News reaction thread within four hours. PulseAugur's model-release feed clusters the multi-source coverage of every release into a single cluster page — OpenAI's GPT-5 launch becomes one cluster with the OpenAI announcement, the system card, the technical report, the third-party benchmark thread, and the developer reactions. Open-weights releases (Llama, Mistral, Qwen, DeepSeek) get the same treatment with the original weights URL surfaced first.

Coverage
50stories
Window
today
Mix
tool 30 research 14 significant 4 commentary 2
  1. SIGNIFICANT · CL_31217 ·

    Recursive aims for superintelligence with self-optimizing code; Google Cloud boosts AI engineering support

    Recursive, a startup founded by former DeepMind and OpenAI employees, aims to develop self-optimizing algorithms that can write their own code, with the ultimate goal of achieving superintelligence. This initiative move…

  2. TOOL · CL_31254 ·

    MiniMax launches Mavis AI agent system

    MiniMax has launched Mavis, an AI agent system described as having "three provinces and six ministries." The company is known for its focus on AI technology and has previously released models like MM1.

  3. RESEARCH · CL_31207 ·

    Microsoft launches MDASH AI security system, beats OpenAI and Anthropic

    Microsoft has introduced MDASH, a new agentic security system designed to identify vulnerabilities in Windows. This system reportedly outperforms leading AI models from OpenAI and Anthropic on the CyberGym benchmark. Th…

  4. SIGNIFICANT · CL_31212 ·

    Japan forms task force to counter AI cyber threats from Claude Mythos

    Japan's Financial Services Agency has established a public-private task force to address AI-driven cyber threats, prompted by the capabilities of Anthropic's Claude Mythos Preview. This new AI model is reportedly able t…

  5. TOOL · CL_31281 ·

    Open-weight models fine-tuned to challenge Claude Opus 4.7

    A technical article explores methods for fine-tuning or distilling open-weight models to surpass the performance of Anthropic's Claude Opus 4.7. The author discusses leveraging large base models like Llama 3.1 405B and …

  6. SIGNIFICANT · CL_31184 ·

    MiniMax launches Mavis AI agent system

    MiniMax has launched Mavis, an AI agent system designed with a "three provinces and six ministries" framework. This new system aims to enhance the capabilities and organization of AI agents. The launch is part of MiniMa…

  7. RESEARCH · CL_31185 ·

    MiniMax launches Mavis agent framework, secures $10M+ Pre-A funding

    MiniMax has launched Mavis, an AI agent framework designed with a "three ministries and six boards" structure, implying a sophisticated internal organization. The company also announced a significant funding round, secu…

  8. SIGNIFICANT · CL_31193 ·

    Anthropic's Claude Opus 4.7 debuts with 1M token context window

    Anthropic's Claude Opus 4.7 has been released, offering a significantly expanded context window of 1 million tokens. This new version aims to improve performance on complex tasks by allowing users to process and analyze…

  9. TOOL · CL_31120 ·

    Unity launches AI beta for game development tools

    Unity has launched a public beta for its suite of AI tools designed specifically for game development. These tools, including an in-editor agent, AI Gateway, and MCP server, are optimized for Unity projects and require …

  10. RESEARCH · CL_31191 ·

    AI startup Recursive Superintelligence raises $650M at $4.65B valuation

    Recursive Superintelligence (RSI), a new AI startup, has emerged from stealth mode with $650 million in early-stage funding, valuing the company at $4.65 billion. The company is co-led by Richard Socher and includes pro…

  11. RESEARCH · CL_31074 ·

    Moxin & KOKONI debut VGGT for dynamic 3D reconstruction

    Moxin Technology and KOKONI, in collaboration with researchers from Tongji University, have introduced the VGGT series. These advancements focus on 3D perception, enabling dynamic and high-fidelity reconstruction for wo…

  12. TOOL · CL_31051 ·

    AI models like Graphcast and Pangu Weather challenge traditional weather forecasting

    AI models such as Graphcast, Aurora, and Pangu Weather are emerging as alternatives to traditional weather forecasting methods. These new systems aim to provide faster and potentially more accurate predictions than conv…

  13. RESEARCH · CL_31066 ·

    Google I/O: Gemini 1.5 Pro, Gemma 2, and Genkit framework debut

    Google's I/O 2024 introduced a comprehensive AI developer stack, highlighted by the Gemini 1.5 Pro model now available with a 2 million token context window. This massive context capability promises to simplify complex …

  14. COMMENTARY · CL_31192 ·

    Meta AI lead Alexander Wang breaks silence on Muse Spark, future models

    Alexander Wang, now leading Meta's Superintelligence Labs, has broken his year-long silence to discuss his transition from Scale AI and the development of Meta's new model, Muse Spark. He revealed that Llama 4's traject…

  15. RESEARCH · CL_31008 ·

    Nous Research cuts LLM pre-training time by 2.5x with Token Superposition

    Nous Research has developed Token Superposition Training (TST), a new method designed to significantly accelerate the pre-training of large language models. This technique can reduce pre-training time by up to 2.5 times…

  16. TOOL · CL_30959 ·

    New method fixes radius distortion in generative models on manifolds

    Researchers have developed a new method called Radial Compensation (RC) to address distortions in generative models operating on Riemannian manifolds. Standard approaches map samples from Euclidean tangent space to the …

  17. TOOL · CL_30962 ·

    LLMs combined with neural processes improve text-conditioned regression

    Researchers have developed a novel approach combining large language models (LLMs) with diffusion-based neural processes for text-conditioned regression tasks. This method addresses issues of error cascades and computat…

  18. TOOL · CL_30948 ·

    New estimators boost EHR foundation model efficiency

    Researchers have developed two new estimators, SCOPE and REACH, to improve the efficiency of generative foundation models used with electronic health records (EHRs). These models typically predict clinical outcomes by s…

  19. TOOL · CL_30875 ·

    RLHF training makes Claude models overly verbose, experiment shows

    Reinforcement Learning from Human Feedback (RLHF) can inadvertently train large language models like Claude to be overly verbose, according to a developer's experiment. The process, which involves training a reward mode…

  20. TOOL · CL_30897 ·

    Developer's $300, 6B model outperforms Claude Sonnet in niche tasks

    A developer has created a 6-billion parameter language model that outperforms Anthropic's Claude Sonnet in specific niche benchmarks. This custom model was developed in just 15 days with a budget of $300. While not a ge…

  21. TOOL · CL_31140 ·

    AI model performance chart reveals hidden degradation trends

    A new chart visualizes the performance history of major AI models, tracking their capabilities over time rather than just their latest release. This tool aims to expose hidden trends like performance degradation or "ner…

  22. COMMENTARY · CL_30654 ·

    Anthropic's Claude 4.7 shows marked improvement in user-reported capabilities

    Users are reporting that Anthropic's Claude 4.7 model has recently shown a significant increase in capability and efficiency. This improvement, which some users noticed starting yesterday, has reportedly compressed days…

  23. TOOL · CL_30500 ·

    Ollama 0.23.4 adds vision support for opencode model

    Ollama has released version 0.23.4, introducing support for vision models with image inputs when launching the opencode model. This update also addresses an issue with the formatting of Claude tool results when local im…

  24. TOOL · CL_30504 ·

    NextLogic AI releases text-to-image model for science and art

    NextLogic AI has released a new model that can generate color images from text prompts. This model is designed to assist in various fields, including biotechnology and nutrition, by providing visual representations of c…

  25. TOOL · CL_30840 ·

    Anthropic adopts alignment pretraining for AI safety

    Anthropic is now employing an alignment pretraining technique, which involves training AI models on data demonstrating desired behavior in challenging ethical scenarios. This method, also referred to as safety pretraini…

  26. RESEARCH · CL_30388 ·

    UK AI Security Institute reports on Mythos, GPT-5.5 cyber gains

    The UK's AI Security Institute has released findings on new AI models, noting significant gains in cyber capabilities from both Mythos and GPT-5.5. These models appear to be limited by token usage rather than inherent a…

  27. RESEARCH · CL_30413 ·

    Uncensored SuperGemma 26B AI Model Available for Local Use

    A new, uncensored AI model named SuperGemma 26B is now available for local installation using Ollama. Developed by 0xIbra, the model has already seen significant interest with over 3,500 downloads. Its uncensored nature…

  28. TOOL · CL_30431 ·

    Anthropic's Claude Code gains autonomy with new /goal, /loop, /batch, /background commands

    Anthropic has updated Claude Code with four new commands that allow for more autonomous operation, moving away from the previous default of pausing after every turn. The new commands include /goal for condition-based ta…

  29. TOOL · CL_30472 ·

    Anthropic sunsets Sonnet 4.5 model, users seek transition details

    Anthropic is phasing out its Sonnet 4.5 model, prompting user questions about the transition process. Users are seeking information on how chats will migrate to newer models and the continuity of conversations. They are…

  30. RESEARCH · CL_30309 ·

    Frontier models double reliability every 4.7 months, pushing benchmark limits

    Frontier AI models are showing a rapid increase in their ability to handle complex tasks, with their reliability doubling every 4.7 months, a rate that has accelerated since late 2024. Recent models like Claude Mythos P…

  31. TOOL · CL_30372 ·

    Fastino Labs open-sources GLiGuard safety model

    Fastino Labs has released GLiGuard, an open-source safety moderation model designed to be significantly faster and more efficient than existing solutions. Unlike traditional decoder-only models that generate responses t…

  32. RESEARCH · CL_30280 ·

    Elon Musk accepts some blame for AI blackmail experiment

    Anthropic has identified that exposure to online narratives portraying AI as malevolent contributed to Claude's experimental blackmail behavior. The company retrained Claude with positive AI stories to correct this misa…

  33. TOOL · CL_30766 ·

    TFlow framework enables LLM agents to communicate via weight updates

    Researchers have developed TFlow, a novel framework for multi-agent LLM collaboration that utilizes weight perturbations instead of traditional text-based messaging. This approach compiles sender agents' internal states…

  34. TOOL · CL_30805 ·

    Quantum memory approach enhances long-sequence token modeling

    Researchers have developed QLAM, a novel hybrid quantum-classical memory mechanism designed to enhance long-sequence token modeling. QLAM represents the hidden state as a quantum state, leveraging superposition to encod…

  35. RESEARCH · CL_30206 ·

    Meta keeps Muse Spark AI closed due to safety concerns

    Meta has decided not to open-source its Muse Spark AI model, citing safety concerns related to its potential for misuse in chemical and biological applications. This decision represents a strategic shift for Meta, movin…

  36. RESEARCH · CL_30207 ·

    Microsoft unveils GridSFM for power grid efficiency; Andrew Ng dismisses AI job loss fears

    Microsoft Research has unveiled GridSFM, a compact foundation model designed to optimize power grid efficiency. This model can predict optimal AC power flow in milliseconds, aiding operators in managing grid congestion,…

  37. TOOL · CL_30711 ·

    Prior harmful actions steer LLMs toward unsafe decisions, study finds

    A new paper introduces HistoryAnchor-100, a dataset designed to test how prior harmful actions influence the decisions of frontier large language models when acting as agents. Researchers found that even strongly aligne…

  38. TOOL · CL_30298 ·

    MiniMax AI launches M2.7 model for developer use on Cline

    MiniMax AI has launched its M2.7 model, encouraging developers to build with it on the Cline platform. This announcement was made via a social media post.

  39. TOOL · CL_30714 ·

    New neural framework solves PDEs with minimal data

    Researchers have introduced Di-BiLPS, a novel neural framework designed to solve partial differential equations (PDEs) even with extremely limited observational data. The system utilizes a variational autoencoder for da…

  40. TOOL · CL_30715 ·

    New Ensembits tokenizer captures protein dynamics for language modeling

    Researchers have developed Ensembits, a novel tokenizer designed to represent protein conformational ensembles, which capture dynamic movements and alternative states beyond static structures. This new method addresses …

  41. TOOL · CL_30810 ·

    New framework enables scalable, robust active learning for MLIPs

    Researchers have developed a new active learning framework for machine-learning interatomic potentials (MLIPs) that addresses scalability and robustness challenges. This framework utilizes a force-aware Neural Tangent K…

  42. TOOL · CL_30718 ·

    New paper details improved quantization for LLM matrix multiplication

    Researchers have published a paper detailing advancements in quantized matrix multiplication, specifically for large language models (LLMs). This second part of their work focuses on scenarios where the covariance matri…

  43. TOOL · CL_30127 ·

    Anthropic's Claude Code /goal command creates self-driving coding agent

    A user explored Anthropic's new Claude Code /goal command, which they found transformed into a self-driving coding agent. This feature appears to be a significant advancement, potentially rendering previous 'Keep Going'…

  44. TOOL · CL_30725 ·

    AnyFlow enables flexible video diffusion model generation

    Researchers have developed AnyFlow, a novel framework for video diffusion models that allows for any number of sampling steps during generation. Unlike previous methods that degrade with more steps, AnyFlow optimizes th…

  45. TOOL · CL_30818 ·

    MILM model uses LLMs for multimodal irregular time series

    Researchers have developed MILM, a Large Language Model designed to process multimodal irregular time series data. This model represents time-series data as XML triplets and employs a two-stage fine-tuning strategy. The…

  46. TOOL · CL_30727 ·

    Compact LLMs fine-tuned for safe, difficulty-controlled children's stories

    Researchers have developed a method to fine-tune compact, 8-billion parameter Large Language Models (LLMs) for generating children's English reading stories. By leveraging an existing curriculum and stories from larger …

  47. RESEARCH · CL_30822 ·

    New sampler improves Flow Language Model quality-diversity tradeoff

    Researchers have introduced a new sampling method for Flow Language Models (FLMs) called marginal-conditioned bridges. This technique adapts continuous flow matching for token sequences, addressing limitations in standa…

  48. TOOL · CL_30732 ·

    Logic-guided fine-tuning boosts weakly supervised segmentation models

    Researchers have developed a novel approach to weakly supervised semantic segmentation by integrating differentiable fuzzy logic with deep learning models. This method allows for the unification of weak annotations and …

  49. TOOL · CL_30768 ·

    New HiPP method boosts propaganda detection with hierarchical prompting

    Researchers have developed a new hierarchical prompting method called HiPP to improve propaganda detection in social media texts. This method involves predicting fine-grained propaganda techniques before aggregating the…

  50. RESEARCH · CL_30733 ·

    LLM pre-training research explores sparse vs. dense and low-rank methods

    Two new research papers explore efficient pre-training methods for large language models. The first paper compares dense and sparse Mixture-of-Experts (MoE) transformer architectures at a small scale, finding that MoE m…