OpenAI, Anthropic, Google, Meta, and Alibaba release new models and agent platforms
By PulseAugur Editorial
Summary by gemini-2.5-flash-lite
from 198 sources
Several AI labs have released new open-weight models, including Alibaba's Qwen3.6-27B, which claims to outperform larger models on coding benchmarks, and Xiaomi's MiMo-V2.5 series, featuring enhanced agentic capabilities and multimodality. OpenAI has also open-sourced a privacy filter model for PII detection, targeting infrastructure needs. Additionally, Anthropic has launched Claude Design, a new tool for generating prototypes and presentations powered by Claude Opus 4.7, signaling a move into design tooling.
**Alibaba** released **Qwen3.6-27B**, a dense, Apache 2.0 open coding model with thinking and non-thinking modes, outperforming the larger Qwen3.5-397B-A17B on multiple coding benchmarks including SWE-bench and Terminal-Bench. It supports native vision-language reasoning over ima…
**Moonshot's Kimi K2.6** is a major open-weight **1T-parameter MoE** model featuring **32B active parameters**, **384 experts**, **MLA attention**, **256K context window**, native multimodality, and **INT4 quantization**. It supports day-0 integration with platforms like **vLLM**…
**Anthropic** launched **Claude Design**, a prototyping tool powered by **Claude Opus 4.7**, targeting design workflows and competing with **Figma** and others. Benchmarks show **Opus 4.7** leading in coding and text tasks, with improved efficiency and adaptive reasoning, though …
**OpenAI** expanded its Agents SDK by separating the agent harness from compute/storage, enabling long-running, durable agents with features like file/computer use, skills, memory, and compaction. The harness is now open-source and supports execution via partner sandboxes, foster…
**Harness engineering** is emerging as a key discipline in AI agent development, emphasizing components like filesystems, memory, and retries beyond just models. **OpenAI's Codex** is expanding agentic coding workflows beyond software engineering, including codebase understanding…
**GLM-5.1** has reached **#3 on Code Arena**, surpassing **Gemini 3.1** and **GPT-5.4**, and matching **Claude Sonnet 4.6** in coding performance. **Z.ai** now holds the **#1 open model rank** close to the top overall. The advisor pattern, combining a cheap executor with an expen…
**Anthropic's Mythos** and **OpenAI's** upcoming restricted cyber-capable models are central to recent discussions, with debates on their security realism and evaluation methods. **LangChain's Deep Agents deploy** introduces an open memory, model-agnostic agent harness architectu…
**Meta Superintelligence Labs** launched **Muse Spark**, a natively multimodal reasoning model featuring tool use, visual chain of thought, and multi-agent orchestration. It is live on **meta.ai** and the Meta AI app with a private API preview and plans for open-sourcing future v…
**Google** introduced **Skills in Chrome**, enabling reusable browser workflows with Gemini prompts and a library of ready-made Skills, enhancing end-user agentization. **Tencent** teased **HYWorld 2.0**, an open-source 3D world model generating editable scenes from a single imag…
**Hermes Agent** is gaining attention as a leading open agent stack with features like self-improving skills, persistent memory, and a self-improvement loop. Its new **Manim skill** enables generation of math/technical animations, expanding agent capabilities. The Hermes ecosyste…
**Gemma 4** was launched by **Google** under an **Apache 2.0 license**, marking a significant open-model release focused on **reasoning, agentic workflows, multimodality, and on-device use**. It outperforms models 10x larger and has immediate ecosystem support including **vLLM**,…
**Arcee’s Trinity-Large-Thinking** was released with **open weights under Apache 2.0**, featuring a **400B total / 13B active** model size and strong agentic performance, ranking **#2 on PinchBench**. **Z.ai’s GLM-5V-Turbo** is a **vision coding model** with **native multimodal f…
**Anthropic** introduced **computer use inside Claude Code** for closed-loop verification in a research preview for Pro/Max users, enhancing reliable app iteration. **OpenAI** released a **Codex plugin for Claude Code**, enabling cross-agent composition and signaling a shift towa…
**Anthropic** is reportedly introducing a new AI model tier called **Capybara**, which is larger and more intelligent than **Claude Opus 4.6**, showing improved performance in coding, academic reasoning, and cybersecurity. The model is speculated to be around **10 trillion parame…
The **ARC-AGI-3** benchmark, introduced by **@arcprize** and **François Chollet**, resets the frontier for general agentic reasoning, with humans solving 100% of tasks versus under 1% for current models. It focuses on zero-preparation generalization and human-like learning efficiency. The…
**Google** launched **Gemini 3.1 Flash Live**, a realtime voice and vision agent model with **2x longer conversation memory**, supporting **70 languages** and **128k context**. **Mistral AI** released **Voxtral TTS**, a low-latency, open-weight text-to-speech model supporting **9…
**Anthropic** advances agent infrastructure with a multi-agent harness emphasizing orchestration and "computer use" for complex software environments. **Figma**, **GitHub**, and **Cursor** launch design canvases with direct AI editing, showcasing tool-calling becoming product-nat…
**Anthropic** introduced **Claude Cowork** and **Claude Code** enabling desktop control of mouse, keyboard, and screen in a **macOS research preview**, expanding agent capabilities beyond APIs and browsers. The agent ecosystem is evolving towards long-running, parallel, tool-rich…
**Cursor's Composer 2**, built on **Kimi K2.5**, sparked discussion over model attribution and licensing, highlighting a shift toward post-trained derivatives of open-source models with domain-specific fine-tuning and reinforcement learning. **Claude Code** is expanding into thir…
**Cursor** launched **Composer 2**, a frontier-class coding model with major cost reductions and strong benchmark scores like **61.3 on CursorBench** and **73.7 on SWE-bench Multilingual**. The model was improved via a **first continued pretraining run** feeding into reinforcemen…
**OpenAI** released **GPT-5.4 mini** and **GPT-5.4 nano**, their most capable small models optimized for coding, multimodal understanding, and subagents, featuring a **400k context window** and over **2x speed** compared to GPT-5 mini. The mini model approaches larger GPT-5.4 per…
**Moonshot's Attention Residuals** paper introduced an input-dependent attention mechanism over prior layers with a **1.25x compute advantage** and less than **2% inference latency overhead**, validated on **Kimi Linear 48B total / 3B active**. The paper sparked debate on novelty…
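The phrase "input-dependent attention mechanism over prior layers" can be illustrated with a toy sketch. This is not the paper's formulation: the function names, the scaling, and the tiny 2-dimensional vectors are all hypothetical. It only shows the general idea that the weights used to mix earlier layer outputs are computed from the current input instead of being fixed.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend_over_layers(query, layer_outputs):
    """Mix prior layer outputs using weights that depend on `query`.

    Hypothetical illustration: a fixed residual sum would weight every
    prior layer equally, whereas here the weights change with the input.
    """
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, h) / scale for h in layer_outputs])
    mixed = [
        sum(w * h[d] for w, h in zip(weights, layer_outputs))
        for d in range(len(query))
    ]
    return weights, mixed


# Toy data (assumed): three earlier layer outputs and one current query vector.
layer_outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]
weights, mixed = attend_over_layers(query, layer_outputs)
print(weights)
```

With this query, the first and third layer outputs align with it equally and receive the same weight, while the second receives less.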
**MCP tools** remain relevant for deterministic APIs despite ergonomic criticisms, with new **web MCP support in Chrome v146** enabling continuous browsing agents. Persistent memory is emerging as a key differentiator for agents, with IBM improving task completion rates and multi…
**Harnesses, agent infrastructure, and the MCP protocol** are central themes, with emphasis on how **harnesses, sandboxes, filesystem access, skills, memory, and observability** shape agent UI/UX and runtime environments. Despite jokes about MCP's demise, it remains vital in prod…
**NVIDIA’s Nemotron 3 Super** is a **120B parameter / ~12B active** open model featuring a **hybrid Mamba-Transformer / SSM Latent MoE** architecture and **1M context window**, delivering up to **2.2x faster inference than GPT-OSS-120B** in FP4 with strong throughput gains. It su…
**OpenAI** rolled out **GPT-5.4**, which ties **Gemini 3.1 Pro Preview** for **#1** on the **Artificial Analysis Intelligence Index** with a score of **57** (up from 51 for GPT-5.2 xhigh). GPT-5.4 features a larger **~1.05M token** context window and higher per-token prices ($2.50/$…
**Gemini 3.1 Flash-Lite** is highlighted by **Demis Hassabis** for its speed and cost-efficiency, focusing on latency and cost per capability rather than raw performance. **NotebookLM Studio** introduces a new feature for generating immersive cinematic video overviews. Rumors abo…
**Google DeepMind** launched **Gemini 3.1 Flash-Lite**, emphasizing *dynamic thinking levels* for adjustable compute, with notable metrics like **$0.25/M input**, **$1.50/M output**, **1432 Elo on LMArena**, and **2.5× faster time-to-first-token** than Gemini 2.5 Flash. It suppor…
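To make the quoted per-token prices concrete, here is a minimal cost estimate. The two prices come from the item above. The workload (2M input tokens, 500k output tokens) and the function name are assumptions chosen for illustration, and the sketch ignores any caching or tiered discounts.

```python
INPUT_PRICE_PER_M = 0.25   # USD per million input tokens (from the summary)
OUTPUT_PRICE_PER_M = 1.50  # USD per million output tokens (from the summary)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Linear cost model: tokens times price per million, summed over both directions."""
    return (
        input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    ) / 1_000_000


# Hypothetical workload: 2M input tokens and 500k output tokens.
cost = estimate_cost_usd(2_000_000, 500_000)
print(f"${cost:.2f}")  # prints: $1.25
```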
**Alibaba** released the **Qwen 3.5** series with models ranging from **0.8B to 9B** parameters, featuring **native multimodality**, **scaled reinforcement learning**, and targeting **edge and lightweight agent** deployments. The models support very long context windows up to **2…
**Gemini 3.1 Pro** demonstrates strong retrieval capabilities and cost efficiency compared to **GPT-5.2** and **Opus 4.6**, though users report tooling and UI issues. The **SWE-bench Verified** evaluation methodology is under scrutiny for consistency, with updates bringing result…
**Anthropic** released **Claude Opus/Sonnet 4.6**, showing a significant intelligence index jump but with increased token usage and cost. **Anthropic** also shared insights on AI agent autonomy, highlighting human-in-the-loop prevalence and software engineering tool calls. **Alib…
**OpenAI** launched **GPT-5.3-Codex** with a Super Bowl ad emphasizing "You can just build things" as a product strategy, focusing on builder tooling over chat interfaces. The model is rolling out across **Cursor, VS Code, and GitHub** with phased API access and is flagged as the…
**AI News** for early February 2026 highlights a detailed comparison between **GPT-5.3-Codex** and **Claude Opus 4.6**, with users noting **Codex's** strength in detailed scoped tasks and **Opus's** ergonomic advantage for exploratory work. Benchmarks on Karpathy's **nanochat GPT…
**AI News for 1/27/2026-1/28/2026** highlights a quiet day with deep dives into frontier model "personality split" where **GPT-5.2** excels at *exploration* and **Claude Opus 4.5** at *exploitation*, suggesting **OpenAI** suits research workflows and **Anthropic** commercial reli…
**Anthropic** launches "Claude in Excel Pro" with enhanced features. **OpenAI** reveals upcoming **Codex** agent loop and cybersecurity measures. **Google** boosts **Gemini App** quotas and partners with **Sakana AI** for advanced AI Scientist projects in Japan. **Cursor** introd…
**X Engineering** open-sourced its new transformer-based recommender algorithm, sparking community debate on transparency and fairness. **GLM-4.7-Flash (30B-A3B)** gains momentum as a strong local inference model with efficient KV-cache management and quantization tuning strategi…
**AI News for 1/16/2026-1/19/2026** covers new architectures for scaling Transformer memory and context, including **STEM** from **Carnegie Mellon** and **Meta AI**, which replaces part of the FFN with a token-indexed embedding lookup enabling CPU offload and asynchronous prefetc…
**OpenAI** launched the **GPT-5.2-Codex** API, touted as its strongest coding model for long-running tasks and cybersecurity. **Cursor** ran GPT-5.2-Codex autonomously for a week to build a browser, producing over 3 million lines of Rust code. **GitHub** incorporated it into t…
**Anthropic** tightens usage policies for **Claude Max** in third-party apps, prompting builders to adopt **model-agnostic orchestration** and **BYO-key** defaults to mitigate platform risks. The **Model Context Protocol (MCP)** is evolving into a key tooling plane with **OpenAI …
**AI News for 1/6/2026-1/7/2026** highlights a quiet day with key updates on **LangChain DeepAgents** introducing **Ralph Mode** for persistent agent loops, **Cursor** improving context management by reducing token usage by **46.9%**, and operational safety measures for coding ag…
**AI News** from early January 2026 highlights a viral economic prediction about **Vietnam** surpassing Thailand, **Microsoft**'s reported open-sourcing of **bitnet.cpp** for 1-bit CPU inference promising speed and energy gains, and a new research partnership between **Google Dee…
**DeepSeek** released a new paper on **mHC: Manifold-Constrained Hyper-Connections**, advancing residual-path design as a key scaling lever in neural networks. Their approach constrains residual mixing matrices to the **Birkhoff polytope** to improve stability and performance, wi…
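The Birkhoff polytope is the set of doubly stochastic matrices: non-negative entries with every row and every column summing to 1. One standard way to push a positive matrix toward that set is Sinkhorn-Knopp normalization, sketched below. This is offered as an illustration of the constraint only, not as the paper's procedure. The function name, the iteration count, and the sample matrix are assumptions.

```python
def sinkhorn(matrix, iterations=200):
    """Alternately normalize rows and columns of a strictly positive square matrix.

    Illustrative Sinkhorn-Knopp iteration; the result is approximately
    doubly stochastic, i.e. close to a point in the Birkhoff polytope.
    """
    m = [row[:] for row in matrix]  # work on a copy
    n = len(m)
    for _ in range(iterations):
        # Make each row sum to 1.
        for i in range(n):
            row_sum = sum(m[i])
            m[i] = [v / row_sum for v in m[i]]
        # Make each column sum to 1.
        for j in range(n):
            col_sum = sum(m[i][j] for i in range(n))
            for i in range(n):
                m[i][j] /= col_sum
    return m


# Hypothetical unconstrained mixing matrix with positive entries.
raw = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 10.0]]
ds = sinkhorn(raw)
row_sums = [sum(row) for row in ds]
col_sums = [sum(ds[i][j] for i in range(3)) for j in range(3)]
print(row_sums, col_sums)
```

Because a product of doubly stochastic matrices is itself doubly stochastic, mixing matrices constrained this way cannot amplify the total signal across many layers, which is one intuition for the stability claim.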
**South Korea's Ministry of Science** launched a coordinated program with **5 companies** to develop sovereign foundation models from scratch, featuring large-scale MoE architectures like **SK Telecom A.X-K1 (519B total / 33B active)** and **LG K-EXAONE (236B MoE / 23B active)**,…
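The "total / active" notation used for MoE models throughout this feed gives the share of parameters exercised per token. A small sketch using the two parameter counts quoted in the item above (the helper name is an assumption):

```python
# (total parameters, active parameters), in billions, as quoted in the summary.
models = {
    "SK Telecom A.X-K1": (519, 33),
    "LG K-EXAONE": (236, 23),
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of the model's parameters used for each token."""
    return active_b / total_b


fractions = {
    name: active_fraction(total, active) for name, (total, active) in models.items()
}
for name, fraction in fractions.items():
    print(f"{name}: {fraction:.1%} of parameters active per token")
```

A lower fraction means a sparser model: per-token compute tracks the active count, while memory footprint tracks the total.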
**Z.ai (GLM family) IPO in Hong Kong on Jan 8, 2026**, aiming to raise **$560M** at **HK$4.35B**, marking it as the "first AI-native LLM company" public listing. The IPO highlights **GLM-4.7** as a starting point. **Meta AI** acquired **Manus** for approximately **$4–5B**, with M…
**MiniMax M2.1** launches as an **open-source** agent and coding Mixture-of-Experts (MoE) model with **~10B active / ~230B total parameters**, claiming to outperform **Gemini 3 Pro** and **Claude Sonnet 4.5**, and supports local inference including on **Apple Silicon M3 Ultra** w…
**GLM-4.7** and **MiniMax M2.1** open-weight model releases highlight day-0 ecosystem support, coding throughput, and agent workflows, with GLM-4.7 achieving a +9.5% improvement over GLM-4.6 and MiniMax M2.1 positioned as an OSS Claude-like MoE model with 230B total parameters an…
**Zhipu AI's GLM-4.7** release marks a significant improvement in **coding, complex reasoning, and tool use**, quickly gaining ecosystem adoption via Hugging Face and OpenRouter. **Xiaomi's MiMo-V2-Flash** is highlighted as a practical, cost-efficient mixture-of-experts model opt…
**Alibaba** released **Qwen-Image-Layered**, an open-source model enabling Photoshop-grade layered image decomposition with recursive infinite layers and prompt-controlled structure. **Kling 2.6** introduced advanced motion control for image-to-video workflows, supported by a cre…
**GPT-5.2** shows mixed performance in public evaluations, excelling in agentic tasks but at a significantly higher cost (~**$620/run**) compared to **Opus 4.5** and **GPT-5.1**. It performs variably on reasoning and coding benchmarks, with some improvements on long-context tasks…
**NousResearch's Nomos 1** is a 30B open math model achieving a top Putnam score with only ~3B active parameters, enabling consumer Mac inference. **AxiomProver** also posts top Putnam results using ThinkyMachines' RL stack. **Mistral's Devstral 2 Small** outperforms DeepSeek v3.…
**Claude Code Skills** gains attention with a published talk and Hugging Face's new "skill" enabling one-line fine-tuning pipelines for models from ~0.5B to 70B parameters, supporting SFT, DPO, and GRPO, costing as low as ~$0.30 for small runs. **Zhipu AI** launches multimodal mo…
**vLLM 0.12.0** introduces DeepSeek support, GPU Model Runner V2, and quantization improvements with PyTorch 2.9.0 and CUDA 12.9. **NVIDIA** launches CUDA Tile IR and cuTile Python for advanced GPU tensor operations targeting Blackwell GPUs. **Hugging Face** releases Transformers…
**OpenAI's Code Red response** and **Anthropic's IPO** are major highlights. In AI video and imaging, **Kling 2.6** introduces native audio co-generation with coherent lip-sync, partnered with platforms like **ElevenLabs** and **OpenArt**. **Runway Gen-4.5** enhances lighting fid…
**Anthropic** introduces durable agents and MCP tasks for long-running workflows, with practical engineering patterns and integrations like Prefect. **Booking.com** deploys a large-scale agent system improving customer satisfaction using LangGraph, Kubernetes, GPT-4 Mini, and Wea…
**OpenAI** launched **GPT-5.1** featuring "adaptive reasoning" and developer-focused API improvements, including prompt caching and a reasoning_effort toggle for latency/cost tradeoffs. Independent analysis shows a minor intelligence bump with significant gains in agentic coding …
**GPT-5** leads Sudoku-Bench solving 33% of puzzles but 67% remain unsolved, highlighting challenges in meta-reasoning and spatial logic. New training methods like **GRPO fine-tuning** and "Thought Cloning" show limited success. Research on "looped LLMs" suggests pretrained model…
**Moonshot AI's Kimi K2 Thinking** AMA revealed a hybrid attention stack using **KDA + NoPE MLA** outperforming full MLA + RoPE, with the **Muon optimizer** scaling to ~1T parameters and native **INT4** QAT for cost-efficient inference. K2 Thinking ranks highly on **LisanBench** …
**Kimi-K2 Reasoner** has been integrated into **vLLM** and will soon be supported by **SGLang**, featuring a massive **1.2 trillion parameter MoE** configuration. **Perplexity AI** released research on cloud-portable trillion-parameter MoE kernels optimized for **AWS EFA**, with …
**Google's Project Suncatcher** prototypes scalable ML compute systems in orbit using solar energy with Trillium-generation TPUs surviving radiation, aiming for prototype satellites by 2027. **China's 50% electricity subsidies** for datacenters may offset chip efficiency gaps, wi…
**OpenAI** and **AWS** announced a strategic partnership involving a $38B compute deal to deploy hundreds of thousands of NVIDIA GB200 and GB300 chips, while **Microsoft** secured a license to ship NVIDIA GPUs to the UAE with a planned $7.9B datacenter investment. A 3-month NVFP4…
**Poolside** raised **$1B** at a **$12B valuation**. **Eric Zelikman** raised **$1B** after leaving **xAI**. **Weavy** joined **Figma**. New research highlights that **FP16** precision reduces training-inference mismatch in **reinforcement-learning** fine-tuning compared to **BF16**. …
**Moonshot AI** released **Kimi Linear (KDA)** with day-0 infrastructure and strong long-context metrics, achieving up to **75% KV cache reduction** and **6x decoding throughput**. **MiniMax M2** pivoted to full attention for multi-hop reasoning, maintaining strong agentic coding…
**vLLM** announced support for **NVIDIA Nemotron Nano 2**, featuring a hybrid Transformer–Mamba design and tunable "thinking budget" enabling up to 6× faster token generation. **Mistral AI Studio** launched a production platform for agents with deep observability. **Baseten** rep…
**LangSmith** launched the **Insights Agent** with multi-turn evaluation for agent ops and observability, improving failure detection and user intent clustering. **Meta PyTorch** and **Hugging Face** introduced **OpenEnv**, a Gymnasium-style API and hub for reproducible agentic e…
**LangChain & LangGraph 1.0** released with major updates for reliable, controllable agents and unified docs, emphasizing "Agent Engineering." **Meta** introduced **PyTorch Monarch** and **TorchForge** for distributed programming and reinforcement learning, enabling large-scale a…
**Alibaba** released compact dense **Qwen3-VL** models at 4B and 8B sizes with FP8 options, supporting up to 1M context and open vocabulary detection, rivaling larger models like **Qwen2.5-VL-72B**. Ecosystem support includes **MLX-VLM**, **LM Studio**, **vLLM**, **Kaggle models*…
**FrontierMath Tier 4** results show **GPT-5 Pro** narrowly outperforming **Gemini 2.5 Deep Think** in reasoning accuracy, with concerns about problem leakage clarified by **Epoch AI Research**. **Mila** and **Microsoft** propose **Markovian Thinking** to improve reasoning effici…
**Samsung's 7M Tiny Recursive Model (TRM)** achieves superior reasoning on ARC-AGI and Sudoku with fewer layers and MLP replacing self-attention. **LeCun's team** introduces **JEPA-SCORE**, enabling density estimation from encoders without retraining. **AI21 Labs** releases **Jam…
**Kling 2.5 Turbo** leads in text-to-video and image-to-video generation with competitive pricing. **OpenAI Sora 2** shows strong instruction-following but has physics inconsistencies. **Google Gemini 2.5 Flash** "Nano Banana" image generation is now generally available with mult…
**Google** released a dense September update including **Gemini Robotics 1.5** with enhanced spatial/temporal reasoning, **Gemini Live**, **EmbeddingGemma**, and **Veo 3 GA** powering creative workflows. They also introduced agentic features like restaurant-reservation agents and…
**Alibaba** unveiled the **Qwen3** model family including **Qwen3-Max** and **Qwen3-VL** with a native 256K context window expandable to 1M, strong OCR in 32 languages, and rapid release velocity (~3.5 releases/month) backed by a $52B infrastructure roadmap. **OpenAI** launched *…
**Anthropic** published an in-depth postmortem on their August-September reliability issues. **OpenAI**'s GPTeam achieved a perfect 12/12 score at the **ICPC 2025** World Finals, showcasing rapid progress in general-purpose reasoning and introducing controllable "thinking time" t…
**GPT-5 Codex** rollout shows strong agentic coding capabilities with some token bloat issues. IDEs like **VS Code Insiders** and **Cursor 1.6** enhance context windows and model integration. **vLLM 0.10.2** supports aarch64 and NVIDIA GB200 with performance improvements. **AMD R…
**Meta** released **MobileLLM-R1**, a sub-1B parameter reasoning model family on Hugging Face with strong small-model math accuracy, trained on 4.2T tokens. **Alibaba** introduced **Qwen3-Next-80B-A3B** with hybrid attention, 256k context window, and improved long-horizon memory,…
**Cognition** raised **$400M** at a **$10.2B** valuation to advance AI coding agents, with **swyx** joining to support the "Decade of Agents" thesis. **Vercel** launched an OSS "vibe coding platform" using a tuned **GPT-5** agent loop. **Claude Code** emphasizes minimalism in age…
**Google DeepMind** released **EmbeddingGemma (308M)**, a small multilingual embedding model optimized for on-device retrieval-augmented generation and semantic search, supporting over 100 languages and running efficiently with quantization and EdgeTPU latency under 15ms. **Jina …
**Exa** raised a **$700M Series B**, **OpenPipe** was acquired by **Coreweave**, and **Statsig** and **Alex** were acquired by **OpenAI**. The **Agent/Client Protocol (ACP)** was introduced by the **Zed** team to standardize IDE-agent interoperability, supporting **Claude Code** …
**OpenAI** integrates **GPT-5** into Xcode 26 with improved coding latency, though some UX trade-offs are noted. **xAI's Grok Code Fast 1** gains momentum, surpassing **Claude Sonnet** in usage and praised for fast debugging. **Zhipu's GLM-4.5** offers a cost-effective coding pla…
**Apple** released three real-time vision-language models (**FastVLM**, **MobileCLIP2**) on Hugging Face with significant speed and size improvements, supporting WebGPU and Core ML. Their MLX framework now supports **MXFP4** format, competing with **NVFP4** for FP4 quantization. …
**xAI** released open weights for **Grok-2** and **Grok-2.5** with a novel MoE residual architecture and μP scaling, sparking community excitement and licensing concerns. **Microsoft** open-sourced **VibeVoice-1.5B**, a multi-speaker long-form TTS model with streaming support and…
**DeepMind** released **Genie 3**, an interactive multimodal world simulator with advanced spatial memory and real-time avatar control, and **SIMA**, an embodied training agent operating inside generated worlds. **Alibaba** introduced **Qwen-Image-Edit**, an open-weights image ed…
**Gemma 3 270M**, an ultra-small model optimized for edge and mobile use, was released and is gaining adoption. **NVIDIA** launched two open multilingual ASR models, **Canary 1B** and **Parakeet-TDT 0.6B**, trained on 1 million hours of data with CC-BY licensing, plus the efficie…
**OpenAI** rolled out **GPT-5** as the default in ChatGPT with new modes and a "warmer" personality, plus expanded message limits for Plus/Team users and Enterprise/Edu access. Performance rankings show **gpt-5-high** leading, with smaller variants also ranked, though critiques n…
**OpenAI** continues small updates to **GPT-5**, introducing "Auto/Fast/Thinking" modes with **196k token context**, **3,000 messages/week**, and dynamic routing to cheaper models for cost efficiency. The **MiniMax AI Agent Challenge** offers **$150,000** in prizes for AI agent d…
**OpenAI** released the **GPT-5** series including **GPT-5-mini** and **GPT-5-nano**, with mixed user feedback on performance and API behavior. **Anthropic** extended **Claude Sonnet 4** context window to **1 million tokens**, a 5x increase, enhancing large document processing. *…
**OpenAI** launched **GPT-5** with a unified user experience removing manual model selection, causing initial routing and access issues for Plus users that are being addressed with fixes including restored model options and increased usage limits. **GPT-5** introduces "Priority P…
**OpenAI** released its first open models since GPT-2, **gpt-oss-120b** and **gpt-oss-20b**, which quickly trended on **Hugging Face**. **Microsoft** supports these models via **Azure AI Foundry** and **Windows Foundry Local**. Key architectural innovations include **sliding wind…
**Chinese AI labs** have released powerful open-source models like **GLM-4.5** and **GLM-4.5-Air** from **Zhipu AI**, **Qwen3 Coder** and **Qwen3-235B** from **Alibaba**, and **Kimi K2** from **Moonshot AI**, highlighting a surge in permissively licensed models. **Zhipu AI's GLM-…
**Chinese labs** have released a wave of powerful, permissively licensed models in July, including **Zhipu AI's GLM-4.5** and **GLM-4.5-Air**, **Alibaba's Qwen3 Coder** and **Qwen3-235B**, and **Moonshot AI's Kimi K2**. These models feature large-scale Mixture of Experts architec…
**OpenAI** has fully rolled out its ChatGPT agent to all Plus, Pro, and Team users and is building hype for the upcoming **GPT-5**, which reportedly outperforms **Grok-4** and can build a cookie clicker game in two minutes. **Alibaba's Qwen** team released the open-source reasoni…
**Alibaba** announced the release of **Qwen3-Coder-480B-A35B-Instruct**, an open agentic code model with **480B** parameters and **256K** context length, praised for rapid development and strong coding performance. Benchmark claims of **41.8% on ARC-AGI-1** faced skepticism from …
**Moonshot AI** released the **Kimi K2**, a 1-trillion parameter ultra-sparse Mixture-of-Experts (MoE) model with the **MuonClip** optimizer and a large-scale agentic data pipeline using over **20,000 tools**. Shortly after, **Alibaba** updated its **Qwen3** model with the **Qwen…
**Mistral** released **Voxtral**, claimed as the world's best open speech recognition models, available via API and Hugging Face. **Moonshot AI** launched **Kimi K2**, a trillion-parameter **Mixture-of-Experts (MoE)** model, outperforming **GPT-4.1** on benchmarks with 65.4% on S…
**Cognition** is acquiring the remaining assets of **Windsurf** after a significant weekend deal. **Moonshot AI** released **Kimi K2**, an open-source, MIT-licensed agentic model with **1 Trillion total / 32B active parameters** using a Mixture-of-Experts architecture, trained on…
**LangChain** is nearing unicorn status, while **OpenAI** and **Google DeepMind's Gemini 3 Pro** models are launching soon. **Perplexity** rolls out its agentic browser **Comet** to waitlists, offering multitasking and voice command features. **xAI's Grok-4** update sparked contr…
Over the holiday weekend, key AI developments include the upcoming release of **Grok 4**, **Perplexity** teasing new projects, and community reactions to **Cursor** and **Dia**. Research highlights feature a paper on **Reinforcement Learning (RL)** improving generalization and re…
**Ilya Sutskever** confirmed his role as CEO of **Safe Superintelligence Inc. (SSI)** with **Daniel Levy** as President, dismissing acquisition rumors and emphasizing their strong team and compute resources. **Perplexity AI** expanded its data integrations by adding **Morningstar…
**Meta** has hired **Scale AI CEO Alexandr Wang** as its new **Chief AI Officer**, acquiring a **49% non-voting stake** in **Scale AI** for **$14.3 billion**, doubling its valuation to **~$28 billion**. This move is part of a major talent shuffle involving **Meta**, **OpenAI**, a…
**Meta** makes a major AI move by hiring **Scale AI** founder **Alexandr Wang** as Chief AI Officer and acquiring a 49% non-voting stake in **Scale AI** for **$14.3 billion**, doubling its valuation to about **$28 billion**. **Chai Discovery** announces **Chai-2**, a breakthrough…
**Meta** has poached top AI talent from **OpenAI**, and **Alexandr Wang** is joining as Chief AI Officer to work towards superintelligence, signaling a strong push for the next **Llama** model. The AI job market shows polarization with high demand and compensation for top-tier…
**Google** released **Gemma 3n**, a multimodal model for edge devices available in **2B and 4B** parameter versions, with support across major frameworks like **Transformers** and **llama.cpp**. **Tencent** open-sourced **Hunyuan-A13B**, a **Mixture-of-Experts (MoE)** model with …
**ByteDance** showcased an impressive state-of-the-art video generation model called **Seedance 1.0** without releasing it, while **Morph Labs** announced **Trinity**, an autoformalization system for Lean. **Hugging Face Transformers** deprecated TensorFlow/JAX support. **Andrew N…
**China's Xiaohongshu (Rednote) released dots.llm1**, a **142B parameter open-source Mixture-of-Experts (MoE) language model** with **14B active parameters** and a **32K context window**, pretrained on **11.2 trillion high-quality, non-synthetic tokens**. The model supports effic…
**OpenAI** rolled out **Codex** to ChatGPT Plus users with internet access and fine-grained controls, improving memory features for free users. **Anthropic's Claude 4 Opus and Sonnet** models lead coding benchmarks, while **Google's Gemini 2.5 Pro and Flash** models gain recognit…
**DeepSeek R1-0528** release brings major improvements in reasoning, hallucination reduction, JSON output, and function calling, matching or surpassing closed models like **OpenAI o3** and **Gemini 2.5 Pro** on benchmarks such as **Artificial Analysis Intelligence Index**, **Live…
The **DeepSeek R1 v2** model was released, with availability on Hugging Face and through inference partners. The **Gemma model family** continues prolific development including **PaliGemma 2**, **Gemma 3**, and others. **Claude 4** and its variants like **Opus 4** and **Claude Sonnet 4** show top…
**OpenAI** plans to evolve **ChatGPT** into a **super-assistant** by 2025 with models like **o3** and **o4** enabling agentic tasks and supporting a billion users. Recent multimodal and reasoning model releases include ByteDance's **BAGEL-7B**, Google's **MedGemma**, and NVIDIA's…
**Anthropic's Claude 4 models (Opus 4, Sonnet 4)** demonstrate strong coding abilities, with Sonnet 4 achieving **72.7%** on SWE-bench and Opus 4 at **72.5%**. Claude Sonnet 4 excels in codebase understanding and is considered **SOTA on large codebases**. Criticism arose over Ant…
**Meta** released **KernelLLM 8B**, outperforming **GPT-4o** and **DeepSeek V3** on KernelBench-Triton Level 1. **Mistral Medium 3** debuted strongly in multiple benchmarks. **Qwen3** models introduced a unified framework with multilingual support. **DeepSeek-V3** features hardwa…
**Tencent's Hunyuan-Turbos** has risen to #8 on the LMArena leaderboard, showing strong performance across major categories and significant improvement since February. The **Qwen3 model family**, especially the **Qwen3 235B-A22B (Reasoning)** model, is noted for its intelligence …
**Gemini 2.5 Flash** shows a **12 point increase** in the Artificial Analysis Intelligence Index but costs **150x more** than Gemini 2.0 Flash due to **9x more expensive output tokens** and **17x higher token usage** during reasoning. **Mistral Medium 3** competes with **Llama 4 …
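The roughly 150x figure follows from multiplying the two quoted factors, since cost scales with price per token times tokens used. A quick check, assuming output tokens dominate the bill (an assumption; input-token costs are ignored here):

```python
# Figures quoted in the summary above.
OUTPUT_PRICE_MULTIPLIER = 9   # output tokens are 9x more expensive
TOKEN_USAGE_MULTIPLIER = 17   # 17x more tokens used during reasoning


def combined_cost_multiplier(price_multiplier: float, usage_multiplier: float) -> float:
    """Cost = price per token x tokens used, so the two multipliers compound."""
    return price_multiplier * usage_multiplier


multiplier = combined_cost_multiplier(OUTPUT_PRICE_MULTIPLIER, TOKEN_USAGE_MULTIPLIER)
print(f"Combined multiplier: {multiplier}x")  # prints: Combined multiplier: 153x
```

The product, 153x, is consistent with the rounded "150x more" in the item.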
**Qwen model family** released quantized versions of Qwen3 models including **14B**, **32B**, and **235B** parameters, with promising coding capabilities in Qwen3-235B. **Microsoft** launched **Phi-4-reasoning**, a **14B** parameter model distilled from OpenAI's o3-mini, emphasiz…
**Microsoft** released **Phi-4-reasoning**, a finetuned 14B reasoning model slightly behind QwQ but limited by data transparency and token efficiency issues. **Anthropic** introduced remote MCP server support and a 45-minute Research mode in **Claude**. **Cursor** published a mod…
AI news for April 23-24, 2025, covering new model releases, benchmarks, and research developments from companies like OpenAI, Google DeepMind, Anthropic, and Epoch AI Research.
**Nemotron-H** model family introduces hybrid Mamba-Transformer models with up to **3x faster inference** and variants including **8B**, **56B**, and a compressed **47B** model. **Nvidia Eagle 2.5** is a frontier VLM for long-context multimodal learning, matching **GPT-4o** and *…
The AI news recap highlights independent evaluations showing **Grok-3** outperforming models like **GPT-4.5** and **Claude 3.7 Sonnet** on reasoning benchmarks, while **Grok-3 mini** excels in reasoning tasks. Research on **reinforcement learning (RL)** fine-tuning reveals potent…
**OpenAI** teased a *Memory update in ChatGPT* with limited technical details. Evidence suggests upcoming releases of **o3** and **o4-mini** models, alongside a press leak about **GPT-4.1**. **X.ai** launched the **Grok 3** and **Grok 3 mini** APIs, confirmed as **o1** level mode…
**OpenAI** announced that **o3** and **o4-mini** models will be released soon, with **GPT-5** expected in a few months, delayed for quality improvements and capacity planning. **DeepSeek** introduced **Self-Principled Critique Tuning (SPCT)** to enhance inference-time scalability…
**Gemini 2.5 Pro** shows strengths and weaknesses, notably lacking LaTeX math rendering unlike **ChatGPT**, and scored **24.4%** on the **2025 US AMO**. **DeepSeek V3** ranks 8th and 12th on recent leaderboards. **Qwen 2.5** models have been integrated into the **PocketPal** app.…
**OpenAI** plans to release its first open-weight language model since **GPT-2** in the coming months, signaling a move towards more open AI development. **DeepSeek** launched its open-source **R1 model** earlier this year, challenging perceptions of China's AI progress. **Gemma …
**GPT-4o** was praised for its improved coding, instruction following, and freedom, becoming the leading non-reasoning coding model surpassing **DeepSeek V3** and **Claude 3.7 Sonnet** in coding benchmarks, though it still lags behind reasoning models like **o3-mini**. Concerns a…
**OpenAI** announced the new **GPT-4o** model with enhanced instruction-following, complex problem-solving, and native image generation capabilities. The model shows improved performance in math, coding, and creativity, with features like transparent background image generation. …
At Nvidia GTC Day 1, several AI updates were highlighted: **Google's Gemini 2.0 Flash** introduces image input/output but is not recommended for text-to-image tasks, with **Imagen 3** preferred for that. **Mistral AI** released **Mistral Small 3.1** with 128k token context window…
**Google DeepMind** announced updates to **Gemini 2.0**, including an upgraded **Flash Thinking model** with stronger reasoning and native image generation capabilities. **Cohere** launched **Command A**, a **111B** parameter dense model with a **256K context window** and competi…
**DeepSeek R1** demonstrates significant efficiency using **FP8** precision, outperforming **Gemma 3 27B** in benchmarks with a **Chatbot Arena Elo Score** of **1363** vs. **1338**, requiring substantial hardware like **32 H100 GPUs** and **2,560GB VRAM**. **OpenAI** labels **Dee…
The AI news recap highlights several key developments: **nanoMoE**, a PyTorch implementation of a mid-sized Mixture-of-Experts (MoE) model inspired by Andrej Karpathy's nanoGPT, enables pretraining on commodity hardware within a week. An agentic leaderboard ranks LLMs powering **…
**AI21 Labs launched Jamba 1.6**, touted as the **best open model for private enterprise deployment**, outperforming **Cohere, Mistral, and Llama** on benchmarks like **Arena Hard**. **Mistral AI** released a state-of-the-art **multimodal OCR model** with multilingual and structu…
**Weights and Biases** announced a **$1.7 billion acquisition by CoreWeave** ahead of CoreWeave's IPO. **CohereForAI** released the **Aya Vision models (8B and 32B parameters)** supporting **23 languages**, outperforming larger models like **Llama-3.2 90B Vision** and **Molmo 72B…
**GPT-4.5** sparked mixed reactions on Twitter, with **@karpathy** noting users preferred **GPT-4** in a poll despite his personal favor for GPT-4.5's creativity and humor. Critics like **@abacaj** highlighted **GPT-4.5's slowness** and questioned its practical value and pricing …
**Claude 3.7 Sonnet** demonstrates exceptional coding and reasoning capabilities, outperforming models like **DeepSeek R1**, **o3-mini**, and **GPT-4o** on benchmarks such as **SciCode** and **LiveCodeBench**. It is available on platforms including **Perplexity Pro**, **Anthropic…
**Grok-3**, a new family of LLMs from **xAI** using **200,000 Nvidia H100 GPUs** for advanced reasoning, outperforms models from **Google, Anthropic, and OpenAI** on math, science, and coding benchmarks. **DeepSeek-R1** from **ByteDance Research** achieves top accuracy on the cha…
The **Smolagents** library by **Hugging Face** continues trending. The latest **ChatGPT-4o** version, "chatgpt-4o-latest-20250129", was released. **DeepSeek R1 671B** sets a speed record at **198 t/s**, making it the fastest reasoning model, and is recommended with specific prompt settings. **Perplexity Deep Research…
**Zyphra AI** launched **Zonos-v0.1**, a leading open-weight text-to-speech model supporting multiple languages and zero-shot voice cloning. **Meta FAIR** released the open-source **Audiobox Aesthetics** model trained on 562 hours of audio data. **Kyutai Labs** introduced **Moshi…
**Google** released **Gemini 2.0 Flash Thinking Experimental 1-21**, a vision-language reasoning model with a **1 million-token context window** and improved accuracy on science, math, and multimedia benchmarks, surpassing **DeepSeek-R1** but trailing **OpenAI's o1**. **ZyphraAI*…
**DeepSeek-R1 surpasses OpenAI in GitHub stars**, marking a milestone in open-source AI with rapid growth in community interest. **AlphaGeometry2 achieves gold-medalist level performance with an 84% solving rate on IMO geometry problems**, showcasing significant advancements in A…
**DeepSeek-R1 and DeepSeek-V3** models have made significant advancements, trained on an **instruction-tuning dataset of 1.5M samples** with **600,000 reasoning** and **200,000 non-reasoning SFT data**. The models demonstrate strong **performance benchmarks** and are deployed on-…
**Huawei chips** are highlighted in a diverse AI news roundup covering **NVIDIA's** stock rebound, new open music foundation models like **Local Suno**, and competitive AI models such as **Qwen 2.5 Max** and **DeepSeek V3**. The release of **DeepSeek Janus Pro**, a multimodal LLM…
**DeepSeek-V3**, a **671 billion parameter mixture-of-experts model**, surpasses **Llama 3.1 405B** and **GPT-4o** in coding and math benchmarks. **OpenAI** announced the upcoming release of **GPT-5** on **April 27, 2023**. **MiniMax-01 Coder mode** in **ai-gradio** enables build…
**Harvey** secured a new **$300M funding round**. **OuteTTS 0.3 1B & 500M** text-to-speech models were released featuring **zero-shot voice cloning**, **multilingual support** (en, jp, ko, zh, fr, de), and **emotion control**, powered by **OLMo-1B** and **Qwen 2.5 0.5B**. The **H…
**Helium-1 Preview** by **kyutai_labs** is a **2B-parameter multilingual base LLM** outperforming **Qwen 2.5**, trained on **2.5T tokens** with a **4096 context size** using token-level distillation from a **7B model**. **Phi-4 (4-bit)** was released in **lmstudio** on an **M4 ma…
**rStar-Math** surpasses **OpenAI's o1-preview** in math reasoning with **90.0% accuracy** using a **7B LLM** and **MCTS** with a **Process Reward Model**. **Alibaba** launches **Qwen Chat** featuring **Qwen2.5-Plus** and **Qwen2.5-Coder-32B-Instruct** models enhancing vision-lan…
**Sebastien Bubeck** introduced **REINFORCE++**, enhancing classical REINFORCE with **PPO-inspired techniques** for **30% faster training**. **AI21 Labs** released **Phi-4** under the **MIT License**, accessible via **Ollama**. **François Chollet** announced plans for **ARC-AGI-2…
**NVIDIA** has launched **Cosmos**, an open-source video world model trained on **20 million hours of video**, aimed at advancing **robotics** and **autonomous driving**. The release sparked debate over its open-source status and technical approach. Additionally, **NVIDIA** annou…
**Olmo 2** released a detailed tech report showcasing full pre, mid, and post-training details for a frontier fully open model. **PRIME**, an open-source reasoning solution, achieved **26.7% pass@1**, surpassing **GPT-4o** in benchmarks. Performance improvements include **Qwen 32…
**Sam Altman** publicly criticizes **DeepSeek** and **Qwen** models, sparking debate about **OpenAI**'s innovation claims and reliance on foundational research like the **Transformer architecture**. **DeepSeek V3** shows significant overfitting issues in the **Misguided Attention…
**ChatGPT**, **Sora**, and the **OpenAI API** experienced a >5 hour outage but are now restored. Updates to **vLLM** enable **DeepSeek-V3** to run with enhanced **parallelism** and **CPU offloading**, improving **model deployment flexibility**. Discussions on **gradient descent**…
The **Qwen team** launched **QVQ**, a vision-enabled version of their experimental **QwQ o1 clone**, benchmarking comparably to **Claude 3.5 Sonnet**. Discussions include **Bret Taylor's** insights on autonomous software development distinct from the Copilot era. The **Latent Spa…
**o3** model gains significant attention with discussions around its capabilities and implications, including an OpenAI board member referencing "AGI." **LangChain** released their **State of AI 2024** survey. **Hume** announced **OCTAVE**, a **3B parameter** API-only speech-lang…
**OpenAI** announced its "12 Days of OpenAI" event with daily livestreams and potential releases including the **full o1 model**, **Sora video model**, and **GPT-4.5**. **Google DeepMind** released the **GenCast weather model** capable of **15-day forecasts in 8 minutes** using…
**AI News for 11/29/2024-12/2/2024** highlights several developments: **Nvidia** introduced **Puzzle**, a distillation-based neural architecture search for inference-optimized large language models, enhancing efficiency. The **IC-Light V2** model was released for varied illuminat…
**AI News for 11/29/2024-11/30/2024** covers key updates including the **Gemini multimodal model** advancing in musical structure understanding, a new **quantized SWE-Bench** for benchmarking at **1.3 bits per task**, and the launch of the **DeepSeek-R1 model** focusing on transp…
This week in AI news, **Anthropic** launched **Claude Sonnet 3.5**, enabling desktop app control via natural language. **Microsoft** introduced **Magentic-One**, a multi-agent system built on the **AutoGen framework**. **OpenCoder** was unveiled as an AI-powered code cookbook for…
This week in AI news highlights **Ollama 0.4** supporting **Meta's Llama 3.2 Vision** models (11B and 90B), with applications like handwriting recognition. **Self-Consistency Preference Optimization (ScPO)** was introduced to improve model consistency without human labels. Discus…
**Grok Beta** surpasses **Llama 3.1 70B** in intelligence but is less competitive due to its pricing at **$5/1M input tokens** and **$15/1M output tokens**. **Defense Llama**, developed with **Meta AI** and **Scale AI**, targets American national security applications. **SWE-Kit*…
**ChatGPT Search** was launched by **Sam Altman**, who called it his favorite feature since ChatGPT's original launch, doubling his usage. Comparisons were made between ChatGPT Search and **Perplexity** with improvements noted in Perplexity's web navigation. **Google** introduced…
**Moondream**, a **1.6b vision language model**, secured seed funding, highlighting a trend in moon-themed tiny models alongside **Moonshine** (27-61m ASR model). **Claude 3.5 Sonnet** was used for AI Twitter recaps. Discussions included **pattern recognition** vs. **intelligence…
**Liquid AI** held a launch event introducing new foundation models. **Anthropic** shared follow-up research on social bias and feature steering with their "Golden Gate Claude" feature. **Cohere** released multimodal Embed 3 embeddings models following Aya Expanse. There was misi…
**Anthropic** released upgraded **Claude 3.5 Sonnet** and **Claude 3.5 Haiku** models featuring a new **computer use capability** that allows interaction with computer interfaces via screenshots and actions like mouse movement and typing. The **Claude 3.5 Sonnet** achieved state-…
**Answer.ai** launched **fastdata**, a synthetic data generation library using "claudette" and Tencent's Billion Persona paper. **NotebookLM** became customizable, and **Motherduck** introduced notable LLMs in SQL implementations. **Perplexity** and **Dropbox** announced competit…
**Vertical SaaS agents** are gaining rapid consensus as the future of AI applications, highlighted by **Decagon's $100m funding** and **Sierra's $4b round**. **OpenAI alumni** are actively raising venture capital and forming new startups, intensifying competition in the AI market…
**Rhymes AI** released **Aria**, a new **25.3B** parameter multimodal MoE model supporting text, code, image, and video with a **64k token context window** and Apache-2.0 license. **OpenAI**'s **o1-preview** and **o1-mini** models show consistent improvement over **Anthropic** an…
**Geoffrey Hinton** and **John Hopfield** won the **Nobel Prize in Physics** for foundational work on neural networks linking AI and physics. **Meta AI** introduced a **13B parameter audio generation model** as part of Meta Movie Gen for video-synced audio. **Anthropic** launched…
**OpenAI** announced raising **$6.6B** in new funding at a **$157B valuation**, with ChatGPT reaching *250M weekly active users*. **Poolside** raised **$500M** to advance AGI development. **LiquidAI** introduced three new MoE models (1B, 3B, 40B) with a **32k context window** and…
**Meta** released **Llama 3.2**, including lightweight 1B and 3B models for on-device AI with capabilities like summarization and retrieval-augmented generation. **Molmo**, a new multimodal model, was introduced with a large dense captioning dataset. **Google DeepMind** announced…
**Meta AI** released **Llama 3.2** models including **1B, 3B text-only** and **11B, 90B vision** variants with **128K token context length** and adapter layers for image-text integration. These models outperform competitors like **Gemma 2** and **Phi 3.5-mini**, and are supported…
**Anthropic** introduced a RAG technique called Contextual Retrieval that reduces retrieval failure rates by 67% using prompt caching. **Meta** is teasing multimodal **Llama 3** ahead of Meta Connect. **OpenAI** is hiring for a multi-agent research team focusing on improved AI re…
**OpenAI's o1-preview and o1-mini models** lead benchmarks in Math, Hard Prompts, and Coding. **Qwen 2.5 72B** model shows strong performance close to **GPT-4o**. **DeepSeek-V2.5** tops Chinese LLMs, rivaling **GPT-4-Turbo-2024-04-09**. **Microsoft's GRIN MoE** achieves good resu…
**OpenAI's o1 model** faces skepticism about open-source replication due to its extreme restrictions and unique training advances like RL on CoT. **ChatGPT-4o** shows significant performance improvements across benchmarks. **Llama-3.1-405b** fp8 and bf16 versions perform similarl…
**Glean** doubled its valuation again. **Dan Hendrycks' Superforecaster AI** generates plausible election forecasts with interesting prompt engineering. A **Stanford** study found that **LLM-generated research ideas** are statistically more novel than those by expert humans. **Sa…
**Meta** announced significant adoption of **Llama 3.1** with nearly **350 million downloads** on Hugging Face. **Magic AI Labs** introduced **LTM-2-Mini**, a long context model with a **100 million token context window**, and a new evaluation method called HashHop. **LMSys** add…
**OpenAI** launched **GPT-4o finetuning** with a case study on Cosine. **Anthropic** released **Claude 3.5 Sonnet** with 8k token output. **Microsoft Phi** team introduced **Phi-3.5** in three variants: Mini (3.8B), MoE (16x3.8B), and Vision (4.2B), noted for sample efficiency. *…
**Anthropic** rolled out **prompt caching** in its API, reducing input costs by up to **90%** and latency by **80%**, enabling instant fine-tuning with longer prompts. **xAI** released **Grok-2**, a new model competing with frontier models from **Google DeepMind**, **OpenAI**, **…
**GPT-5** delayed again amid a quiet news day. **Nous Research** released Hermes 3 finetune of **Llama 3** base models, rivaling FAIR's instruct tunes but sparking debate over emergent existential crisis behavior with 6% roleplay data. **Nvidia** introduced Minitron finetune of *…
**Qwen2-Math-72B** outperforms **GPT-4o**, **Claude-3.5-Sonnet**, **Gemini-1.5-Pro**, and **Llama-3.1-405B** on math benchmarks using synthetic data and advanced optimization techniques. **Google AI** cuts pricing for **Gemini 1.5 Flash** by up to 78%. **Anthropic** expands its b…
**OpenAI** introduced structured outputs in their API with a new "strict" mode and a "response_format" parameter, supporting models like **gpt-4-0613**, **gpt-3.5-turbo-0613**, and the new **gpt-4o-2024-08-06**. They also halved the price of **gpt-4o** to $2.50 per million tokens…
**Meta** released **SAM 2**, a unified model for real-time object segmentation with a new dataset 4.5x larger and 53x more annotated than previous ones. **FastHTML**, a new Python web framework by **Jeremy Howard**, enables easy creation and deployment of interactive web apps. **…
**HuggingFace** released a browser-based timestamped Whisper using transformers.js. A Twitter bot by **truth_terminal** became the first "semiautonomous" bot to secure VC funding. **Microsoft** and **Apple** abruptly left the **OpenAI** board amid regulatory scrutiny. **Meta** is…
**Meta** introduced **Meta 3D Gen**, a system for end-to-end generation of 3D assets from text in under 1 minute, producing high-quality 3D assets with detailed textures. **Perplexity AI** updated Pro Search to handle deeper research with multi-step reasoning and code execution. …
**Twelve Labs** raised **$50m** in Series A funding co-led by NEA and **NVIDIA's NVentures** to advance multimodal AI. **Livekit** secured **$22m** in funding. **Groq** announced running at **800k tokens/second**. OpenAI saw a resignation from Daniel Kokotajlo. Twitter users high…
**Ilya Sutskever** steps down as Chief Scientist at **OpenAI** after nearly a decade, with **Jakub Pachocki** named as his successor. **Google DeepMind** announces **Gemini 1.5 Pro** and **Gemini 1.5 Flash** models featuring 2 million token context and improved multimodal capabil…
**Anthropic** released a team plan and iOS app about 4 months after **OpenAI**. The **Command-R 35B** model excels at creative writing, outperforming larger models like **Goliath-120** and **Miqu-120**. The **Llama-3 8B** model now supports a 1 million token context window, impro…
**RAGFlow** open sourced, a deep document understanding RAG engine with **16.3k context length** and natural language instruction support. **Jamba v0.1**, a **52B parameter** MoE model by Lightblue, released but with mixed user feedback. **Command-R** from **Cohere** available on…
The Reddit community /r/LocalLlama discusses **fine-tuning and training LLMs**, including tutorials and questions on training models with specific data like dictionaries and synthetic datasets with **25B+ tokens**. Users explore **retrieval-augmented generation (RAG)** challenges…
**DeepMind** announces **SIMA**, a generalist AI agent capable of following natural language instructions across diverse 3D environments and video games, advancing embodied AI agents. **Anthropic** releases **Claude 3 Haiku**, their fastest and most affordable model, now availabl…
**Anthropic** released **Claude 3**, replacing Claude 2.1 as the default on Perplexity AI, with **Claude 3 Opus** surpassing **GPT-4** in capability. Debate continues on whether Claude 3's performance stems from emergent properties or pattern matching. **LangChain** and **LlamaIn…
**LM Studio** users extensively discussed its performance, installation issues on macOS, and upcoming features like **Exllama2 support** and multimodality with the **Llava model**. Conversations covered **GPU offloading**, **vRAM utilization**, **MoE model expert selection**, and…
**Nous Research AI** Discord community discussed attending **NeurIPS** and organizing future AI events in Australia. Highlights include interest in open-source and decentralized AI projects, with **Richard Blythman** seeking co-founders. Users shared projects like **Photo GPT AI*…
A malicious Hugging Face repository that posed as an OpenAI release delivered infostealer malware to Windows machines and recorded about 244,000 downloads before removal, according to research from AI security firm HiddenLayer. The number of downloads may have been artificiall…
A fake Hugging Face repo impersonating OpenAI’s Privacy Filter model reportedly reached #1 trending while distributing infostealer malware. Researchers say it hit ~244K downloads before removal. AI supply chain attacks are accelerating fast. Source: https://thehackernews.com/202…
⚠️ Fake OpenAI profiles on Hugging Face are spreading malware: verify the author, repository, and files before downloading. Trust alone is not enough. #Cybersecurity #AI 🔗 https://www.tomshw.it/hardware/openai-falsa-hugging-face-malware-trend