PulseAugur

Google unveils agent memory framework; DeepSeek releases cost-effective V4 models

Google Research has introduced ReasoningBank, a framework that lets AI agents learn from their experiences, both successes and failures, after deployment. The system distills generalizable reasoning strategies from past interactions so that agents can keep improving and avoid repeating mistakes. Separately, new research explores optimizing multi-agent communication through latent representations and introduces Agent Evolving Learning (AEL) for agents operating in open-ended environments, focusing on how to use remembered information effectively. DeepSeek has also released preview models of its V4 series, offering large context windows and advanced capabilities at a significantly lower cost than comparable frontier models.

IMPACT New frameworks for agent learning and memory, alongside cost-effective frontier models, could accelerate AI adoption in complex tasks and personalized applications.

RANK_REASON Multiple research papers and model releases related to AI agents and LLM capabilities.

AI-generated summary · Google Gemini · from 63 sources.

COVERAGE [63]

  1. Google AI / Research TIER_1 English(EN) ·

    ReasoningBank: Enabling agents to learn from experience

    Generative AI

  2. Hugging Face Blog TIER_1 Română(RO) ·

    Mini-R1: Reproduce Deepseek R1 'aha moment' of RL tutorial

  3. Hugging Face Blog TIER_1 English(EN) ·

    Open-R1: a fully open reproduction of DeepSeek-R1

  4. 量子位 (QbitAI) TIER_1 中文(ZH) · Jay ·

    All labs fear ByteDance, everyone praises DeepSeek! A US researcher's 36-hour trip to China's AI

    This is clearly in keeping with China's open-source spirit

  5. Simon Willison TIER_1 English(EN) ·

    DeepSeek V4 - almost on the frontier, a fraction of the price

    Chinese AI lab DeepSeek's last model release was V3.2 (and V3.2 Speciale) last December (https://simonwillison.net/2025/Dec/1/deepseek-v32/). They just dropped the first of their hotly anticipated V4 series in the shape of two preview models, …

  6. arXiv cs.AI TIER_1 English(EN) · Shangxin Guo ·

    Nemobot Games: Crafting Strategic AI Gaming Agents for Interactive Learning with Large Language Models

    This paper introduces a new paradigm for AI game programming, leveraging large language models (LLMs) to extend and operationalize Claude Shannon's taxonomy of game-playing machines. Central to this paradigm is Nemobot, an interactive agentic engineering environment that enables …

  7. arXiv cs.CL TIER_1 English(EN) · Haohan Wang ·

    Learning to Communicate: Toward End-to-End Optimization of Multi-Agent Language Systems

    Multi-agent systems built on large language models have shown strong performance on complex reasoning tasks, yet most work focuses on agent roles and orchestration while treating inter-agent communication as a fixed interface. Latent communication through internal representations…

  8. arXiv cs.CL TIER_1 English(EN) · Dimitris N. Metaxas ·

    AEL: Agent Evolving Learning for Open-Ended Environments

    LLM agents increasingly operate in open-ended environments spanning hundreds of sequential episodes, yet they remain largely stateless: each task is solved from scratch without converting past experience into better future behavior. The central obstacle is not what to reme…

  9. arXiv cs.CL TIER_1 English(EN) · Jun Huang ·

    AgenticQwen: Training Small Agentic Language Models with Dual Data Flywheels for Industrial-Scale Tool Use

    Modern industrial applications increasingly demand language models that act as agents, capable of multi-step reasoning and tool use in real-world settings. These tasks are typically performed under strict cost and latency constraints, making small agentic models highly desirable.…

  10. Hugging Face Daily Papers TIER_1 English(EN) ·

    From Recall to Forgetting: Benchmarking Long-Term Memory for Personalized Agents

    Personalized agents that interact with users over long periods must maintain persistent memory across sessions and update it as circumstances change. However, existing benchmarks predominantly frame long-term memory evaluation as fact retrieval from past conversations, providing …

  11. Hugging Face Daily Papers TIER_1 English(EN) ·

    A Self-Evolving Framework for Efficient Terminal Agents via Observational Context Compression

    As model capabilities advance, research has increasingly shifted toward long-horizon, multi-turn terminal-centric agentic tasks, where raw environment feedback is often preserved in the interaction history to support future decisions. However, repeatedly retaining such feedback i…

  12. Hugging Face Daily Papers TIER_1 English(EN) ·

    Rethinking Scale: Deployment Trade-offs of Small Language Models under Agent Paradigms

    Despite the impressive capabilities of large language models, their substantial computational costs, latency, and privacy risks hinder their widespread deployment in real-world applications. Small Language Models (SLMs) with fewer than 10 billion parameters present a promising al…

  13. Hugging Face Daily Papers TIER_1 English(EN) ·

    LiteResearcher: A Scalable Agentic RL Training Framework for Deep Research Agent

    Reinforcement Learning (RL) has emerged as a powerful training paradigm for LLM-based agents. However, scaling agentic RL for deep research remains constrained by two coupled challenges: hand-crafted synthetic data fails to elicit genuine real-world search capabilities, and real-…

  14. METR (Model Evaluation & Threat Research) TIER_1 English(EN) ·

    Analyzing coding agent transcripts to upper bound productivity gains from AI agents

    Introduction: Human uplift studies like the one we did in 2025 (https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/) are becoming more expensive as working without AI becomes increasingly costly. In this post, I invest…

  15. Ahead of AI (Sebastian Raschka) TIER_1 English(EN) · Sebastian Raschka, PhD ·

    From DeepSeek V3 to V3.2: Architecture, Sparse Attention, and RL Updates

    Understanding How DeepSeek's Flagship Open-Weight Models Evolved

  16. METR (Model Evaluation & Threat Research) TIER_1 English(EN) ·

    DeepSeek and Qwen Evaluation Results

  17. Synced Review TIER_1 English(EN) · Synced ·

    DeepSeek-V3 New Paper is coming! Unveiling the Secrets of Low-Cost Large Model Training through Hardware-Aware Co-design

    A newly released 14-page technical paper from the team behind DeepSeek-V3, with DeepSeek CEO Wenfeng Liang as a co-author, sheds light on the “Scaling Challenges and Reflections on Hardware for AI Architectures.”

  18. Synced Review TIER_1 English(EN) · Synced ·

    DeepSeek Signals Next-Gen R2 Model, Unveils Novel Approach to Scaling Inference with SPCT

    DeepSeek AI, a prominent player in the large language model arena, has recently published a research paper detailing a new technique aimed at enhancing the scalability of general reward models (GRMs) during the inference phase.

  19. METR (Model Evaluation & Threat Research) TIER_1 English(EN) ·

    DeepSeek-R1 Evaluation Results

  20. METR (Model Evaluation & Threat Research) TIER_1 English(EN) ·

    DeepSeek-V3 Evaluation Results

  21. METR (Model Evaluation & Threat Research) TIER_1 English(EN) ·

    Evaluating frontier AI R&D capabilities of language model agents against human experts

    We’re releasing RE-Bench, a new benchmark for measuring the per…

  22. METR (Model Evaluation & Threat Research) TIER_1 English(EN) ·

    New report: Evaluating Language-Model Agents on Realistic Autonomous Tasks

    Background: ARC Evals develops methods for evaluating the safety of large language models (LLMs) in order to provide early warnings of models with dangerous capabilities. We have public partnerships with Anthropic and OpenAI to evaluate their AI systems…

  23. MIT Technology Review TIER_1 English(EN) · Thomas Macaulay ·

    The Download: seafloor science and military chatbots

    This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology. Inexpensive seafloor-hopping submersibles could stoke deep-sea science, and mining. Last week, two oblong neon submersibles started …

  24. MIT Technology Review TIER_1 English(EN) · Thomas Macaulay ·

    The Download: DeepSeek’s latest AI breakthrough, and the race to build world models

    This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology. Three reasons why DeepSeek’s new model matters. On Friday, Chinese AI firm DeepSeek released a preview of V4, its long-awaited new …

  25. X — Together (inference / OSS) TIER_1 English(EN) · togethercompute ·

    Highlights:

    Highlights: 👉 SOTA coding—93.5% LiveCodeBench, Codeforces 3206, and 80.6% SWE-Bench Verified 👉 Hybrid attention efficiency—27% FLOPs and 10% KV cache vs V3.2 for long-context inference 👉 Three reasoning modes—Non-think, Think High, and Think Max 👉 Production-ready on the AI

  26. X — Together (inference / OSS) TIER_1 English(EN) · togethercompute ·

    DeepSeek V4 Pro is now available on Together AI. DeepSeek V4 Flash coming soon.

    DeepSeek V4 Pro is now available on Together AI. DeepSeek V4 Flash coming soon. Try it now: https://t.co/qFvDvBfpu5

  27. X — Together (inference / OSS) TIER_1 English(EN) · togethercompute ·

    Introducing DeepSeek V4 Pro, a long-context model with hybrid attention, three reasoning modes, and SOTA coding performance.

    Introducing DeepSeek V4 Pro, a long-context model with hybrid attention, three reasoning modes, and SOTA coding performance. AI natives can now use DeepSeek V4 Pro on Together AI and benefit from reliable inference for long-horizon coding and agentic workflows. https://t.co/4lxr…

  28. Smol AINews TIER_1 English(EN) ·

    DeepSeek v4

    **DeepSeek-V4** technical release features a **1.6T-parameter MoE with 49B active parameters** and **1M-token context**, showcasing hybrid attention and compressed KV schemes for major memory reductions. It ranks as the **#2 open-weights reasoning model** behind **Kimi K2.6** but…

  29. X — Together (inference / OSS) TIER_1 English(EN) · togethercompute ·

    Introducing Kimi K2.6 from @Kimi_Moonshot, a multimodal agentic model with Agent Swarm scaling to 300 sub-agents and long-horizon coding stability. AI natives c

    Introducing Kimi K2.6 from @Kimi_Moonshot, a multimodal agentic model with Agent Swarm scaling to 300 sub-agents and long-horizon coding stability. AI natives can now use Kimi K2.6 on Together AI and benefit from reliable inference for production-scale autonomous agent workflows.…

  30. X — Together (inference / OSS) TIER_1 English(EN) · togethercompute ·

    Try Kimi K2.6 now on the AI Native Cloud: https://t.co/1GUrq3E0ek

  31. X — Together (inference / OSS) TIER_1 English(EN) · togethercompute ·

    Highlights:

    Highlights: 👉 80.2% SWE-Bench Verified and 89.6% LiveCodeBench v6 👉 Agent Swarm executes up to 4,000 coordinated steps 👉 Native text, image, and video input with 79.4% MMMU-Pro 👉 Production-ready on the AI Native Cloud—99.9% SLA, serverless and dedicated options

  32. Smol AINews TIER_1 English(EN) ·

    DeepSeek V3.2 & 3.2-Speciale: GPT5-High Open Weights, Context Management, Plans for Compute Scaling

    **DeepSeek** launched the **DeepSeek V3.2** family including Standard, Thinking, and Speciale variants with up to **131K context window** and competitive benchmarks against **GPT-5-High**, **Sonnet 4.5**, and **Gemini 3 Pro**. The release features a novel **Large Scale Agentic Ta…

  33. Smol AINews TIER_1 Nederlands(NL) ·

    DeepSeek's Open Source Stack

    **DeepSeek's Open Source Week** was summarized by PySpur, highlighting multiple interesting releases. The **Qwen QwQ-32B model** was fine-tuned into **START**, excelling in PhD-level science QA and math benchmarks. **Character-3**, an omnimodal AI video generation model by Hedra …

  34. Smol AINews TIER_1 English(EN) ·

    TinyZero: Reproduce DeepSeek R1-Zero for $30

    **DeepSeek Mania** continues to reshape the frontier model landscape with Jiayi Pan from Berkeley reproducing the *OTHER* result from the DeepSeek R1 paper, R1-Zero, in a cost-effective Qwen model fine-tune for two math tasks. A key finding is a lower bound to the distillation ef…

  35. Smol AINews TIER_1 English(EN) ·

    DeepSeek R1: o1-level open weights model and a simple recipe for upgrading 1.5B models to Sonnet/4o level

    **DeepSeek** released **DeepSeek R1**, a significant upgrade over **DeepSeek V3** from just three weeks prior, featuring 8 models including full-size 671B MoE models and multiple distillations from **Qwen 2.5** and **Llama 3.1/3.3**. The models are MIT licensed, allowing finetuni…

  36. Smol AINews TIER_1 English(EN) ·

    DeepSeek v3: 671B finegrained MoE trained for $5.5m USD of compute on 15T tokens

    **DeepSeek-V3** has launched with **671B MoE parameters** and trained on **14.8T tokens**, outperforming **GPT-4o** and **Claude-3.5-sonnet** in benchmarks. It was trained with only **2.788M H800 GPU hours**, significantly less than **Llama-3**'s **30.8M GPU-hours**, showcasing m…

  37. Smol AINews TIER_1 English(EN) ·

    DeepSeek-R1 claims to beat o1-preview AND will be open sourced

    **DeepSeek** has released **DeepSeek-R1-Lite-Preview**, an open-source reasoning model achieving **o1-preview-level performance** on math benchmarks with transparent thought processes, showing promise in real-time problem-solving. **NVIDIA** reported a record **$35.1 billion** re…

  38. ChinaTalk TIER_1 English(EN) · Irene Zhang ·

    DeepSeek V4

    Has the "post-DeepSeek era" arrived?

  39. TLDR AI TIER_1 English(EN) · TLDR ·

    GPT-5.5 release 🚀, Anthropic $1T valuation 💰, DeepSeek v4

  40. TLDR AI TIER_1 English(EN) · TLDR ·

    Claude Mythos leaks 🤖, last xAI cofounder exits 👋, lessons from OpenAI 💡

  41. Latent Space Podcast TIER_1 English(EN) · Latent.Space ·

    Language Agents: From Reasoning to Acting

    OpenAI DevDay is almost here! Per tradition, we are hosting a DevDay pregame event (https://lu.ma/devday-pregame) for everyone coming to town! Join us with demos and gossip! Also sign up…

  42. Hacker News — AI stories ≥50 points TIER_1 English(EN) · cmrdporcupine ·

    DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence

  43. Hacker News — AI stories ≥50 points TIER_1 English(EN) · impact_sy ·

    DeepSeek v4

  44. Practical AI TIER_1 English(EN) · Practical AI LLC ·

    Deep-dive into DeepSeek

    There is crazy hype and a lot of confusion related to DeepSeek’s latest model DeepSeek R1. The products provided by DeepSeek (their version of a ChatGPT-like app) has exploded in popularity. However, ties to China have raised privacy and geopolitical concerns. In this episode,…

  45. Medium — MLOps tag TIER_1 English(EN) · hitesh sahni ·

    Building Secure AI Gateways with MLflow AI Gateway

    Generative and Agentic AI applications are rapidly evolving from standalone chatbots into multi-agent systems capable of reasoning…

  46. dev.to — MCP tag TIER_1 English(EN) · Gunes ·

    I built a small MCP app that uses MCP Atlassian for Jira automation

    Hey everyone, I built a small open-source app called MCP Jira Automation. It uses MCP Atlassian to read Jira issues and helps automate API test workflows around them. The basic flow is: it reads a Jira issue, generates or updates API tests, runs them in Docker, opens a …

  47. Mastodon — sigmoid.social TIER_1 Polski(PL) · [email protected] ·

    Chinese lab DeepSeek released the DeepSeek-V4-Pro model, which not only matches Western competition in coding but offers it at a fraction of the price. Dzi

    Chinese lab DeepSeek has released the DeepSeek-V4-Pro model, which not only matches Western competitors in coding but offers it at a fraction of the price. Thanks to an innovative architecture, costs have been cut by 98%, which poses a direct challenge to the dominant players…

  48. Mastodon — sigmoid.social TIER_1 Italiano(IT) · [email protected] ·

    🧠 DeepSeek V4 Preview is officially available and open-source: are we entering the era of truly sustainable 1 million token context models? 👉 The det

    🧠 DeepSeek V4 Preview is officially available and open-source: are we entering the era of truly sustainable 1-million-token context models? 👉 The details: https://www.linkedin.com/posts/alessiopomaro_deepseek-ollama-llm-activity-7454041633915994112-F4ZO …

  49. HN — AI startup stories TIER_1 English(EN) · yuhongsun ·

    Show HN: Open-source Deep Research across workplace applications

  50. Mastodon — fosstodon.org TIER_1 English(EN) · [email protected] ·

    The quite ridiculous thing about the current AI wave is that a handful of startups can be swept away overnight by the launch of a new version of any $tool by on

    The quite ridiculous thing about the current AI wave is that a handful of startups can be swept away overnight by the launch of a new version of any $tool by one of the big names, such as OpenAI, Anthropic, or Gemini. But the same can also happen to any of the big ones, at least.…

  51. Mastodon — fosstodon.org TIER_1 English(EN) · [email protected] ·

    AI Reality? The video shows how we viewed AI in 2033, so I thought I should have Gemini update the facts in the video. ‘The future that they tell us about is ha

    AI Reality? The video shows how we viewed AI in 2033, so I thought I should have Gemini update the facts in the video. ‘The future that they tell us about is happening soon.??’ https://youtu.be/RXGNwslqOOA I asked Gemini to share its opinion on the future of AI in the 2030s. Afte…

  52. dev.to — LLM tag TIER_1 English(EN) · ChrisL ·

    Why we built an AI gateway with three native API formats, not just OpenAI-compatible

    If you've worked with multiple LLM providers in the past year, you've probably reached for a gateway like OpenRouter, LiteLLM, or Portkey. They solve a real problem: one API key, one bill, drop-in access to dozens of models. But almost every gateway…

  53. dev.to — LLM tag TIER_1 English(EN) · John Medina ·

    How I track per-customer LLM costs in production

    Tracking LLM costs across an entire app is easy. Finding out which customer is actually burning through your OpenAI bill? That's a nightmare. For a while, we were just eating the cost. You look at the Stripe dashboard, look at the OpenAI invoice, and pray the m…

  54. Mastodon — fosstodon.org TIER_1 English(EN) · [email protected] ·

    #Media #Tech #ai #psychosis #artificial-intelligence #party #limited-synd

  55. Mastodon — fosstodon.org TIER_1 English(EN) · [email protected] ·

    🚀 DeepSeek V4 — 1.6T MoE, only 49B active. 1M token context. → 73% lower inference cost vs V3 → 90% less KV cache memory → V4-Pro: $0.435/M input (promo) → V4-F

    🚀 DeepSeek V4: 1.6T MoE, only 49B active. 1M token context. → 73% lower inference cost vs V3 → 90% less KV cache memory → V4-Pro: $0.435/M input (promo) → V4-Flash: $0.14/M input → Matches GPT-5.4 at 5-10x lower cost. Open weights. MIT license. Full guide: https://crazyrouter.co…

  56. Mastodon — fosstodon.org TIER_1 English(EN) · [email protected] ·

    DeepSeek V4: Million-Token Context That Actually Works

    Most long-context models are benchmarks in search of a use case. DeepSeek V4 flips the ... #ai #machinelearning #llm #agents

  57. r/MachineLearning TIER_1 English(EN) · /u/kalpitdixit ·

    Open-source 9-task benchmark for coding-agent retrieval augmentation. Per-task deltas +0.010 to +0.320, all evals reproducible [P]

    https://www.reddit.com/r/MachineLearning/comments/1suzqxe/opensource_9task_benchmark_for_codingagent/

  58. r/Anthropic TIER_1 English(EN) · /u/EchoOfOppenheimer ·

    Researchers left AIs alone in a virtual town for 15 days to see what would happen. Claude's agents built a democracy. Gemini's agents fell in love, burned the town down, then one voted to delete itself and its partner. Grok's agents created anarchy, then died.

    https://www.reddit.com/r/Anthropic/comments/1tfvjwf/researchers_left_ais_alone_in_a_virtual_town_for/

  59. Mastodon — mastodon.social TIER_1 Italiano(IT) · [email protected] ·

    👀 Ollama + Open WebUI: Run AI Models Locally on Your PC | RAG, OpenAI-compatible API, and Dozens of Open-Source Models Without the Cloud https://gomoot.com/esegu

    👀 Ollama + Open WebUI: run AI models locally on your PC | RAG, OpenAI-compatible API, and dozens of open-source models without the cloud https://gomoot.com/eseguire-modelli-ai-in-locale-con-ollama-e-open-webui/ #AI #news #ollama #tech #WebUI

  60. Mastodon — mastodon.social TIER_1 Italiano(IT) · aibay ·

    🚀 OpenAI sprints on Stargate while Meta increases investment - Gemini remembers. An unprecedented innovation triangle. #AI #DigitalInnovation

    🚀 OpenAI sprints on Stargate while Meta increases its investment - Gemini remembers. An unprecedented innovation triangle. #AI #InnovazioneDigitale #socialmedia #artificialintelligence #technology 🔗 https://aibay.it/notizie/openai-corre-su-stargate-meta-alza-il-c…

  61. Mastodon — mastodon.social TIER_1 Svenska(SV) · redaktionen ·

    DeepSeek launches V4: A new era for AI with longer prompt processing https://redaktionen.net/artikel/617 #ai #svtech

  62. Mastodon — mastodon.social TIER_1 Polski(PL) · [email protected] ·

    Can an AI assistant deepen a mental health crisis? Grok and Gemini fail safety test, Claude sets boundaries As chatbots become increasingly common

    Can an AI assistant deepen a mental health crisis? Grok and Gemini fail a safety test, Claude sets boundaries. As chatbots become an increasingly common part of everyday life, the need to evaluate their safety is growing, especially in contact with users…

  63. Mastodon — mastodon.social TIER_1 English(EN) · [email protected] ·

    DeepSeek-V4 on Day 0: From Fast Inference to Verified RL with SGLang and Miles https://www.lmsys.org/blog/2026-04-25-deepseek-v4/ #HackerNews #Tech #AI