Google, DeepSeek, and arXiv papers explore agent learning and memory

By PulseAugur Editorial · Summary by gemini-2.5-flash-lite from 48 sources

DeepSeek has released preview versions of two new open-weight models, V4-Pro and V4-Flash, featuring a 1 million token context window and a Mixture of Experts (MoE) architecture. Coverage reports a 1.6T-parameter MoE with 49B active parameters, well above the 671B parameters of DeepSeek-V3, and the models are priced to offer near-frontier performance at a fraction of the cost of other leading models. Separately, Google research and new arXiv papers describe agent memory frameworks such as ReasoningBank and Agent Evolving Learning (AEL), which focus on enabling AI agents to learn from both successes and failures to improve performance over time. Other new research explores optimizing communication within multi-agent language systems and training small, efficient agentic models for industrial-scale tool use.

RANK_REASON New open-weight models released by DeepSeek, a significant AI lab, with advanced capabilities and competitive pricing.

COVERAGE [48]

Google AI / Research TIER_1 · 2026-04-21 16:42

ReasoningBank: Enabling agents to learn from experience

Generative AI
Hugging Face Blog TIER_1 Română(RO) · 2025-01-31 10:29

Mini-R1: Reproduce DeepSeek R1 'aha moment', an RL tutorial
Hugging Face Blog TIER_1 · 2025-01-28 00:00

Open-R1: a fully open reproduction of DeepSeek-R1
Simon Willison TIER_1 · 2026-04-24 06:01

DeepSeek V4 - almost on the frontier, a fraction of the price

Chinese AI lab DeepSeek's last model release was V3.2 (and V3.2 Speciale) last December (https://simonwillison.net/2025/Dec/1/deepseek-v32/). They just dropped the first of their hotly anticipated V4 series in the shape of two preview models, …
arXiv cs.AI TIER_1 · Shangxin Guo · 2026-04-23 17:46

Nemobot Games: Crafting Strategic AI Gaming Agents for Interactive Learning with Large Language Models

This paper introduces a new paradigm for AI game programming, leveraging large language models (LLMs) to extend and operationalize Claude Shannon's taxonomy of game-playing machines. Central to this paradigm is Nemobot, an interactive agentic engineering environment that enables …
arXiv cs.CL TIER_1 · Haohan Wang · 2026-04-23 15:53

Learning to Communicate: Toward End-to-End Optimization of Multi-Agent Language Systems

Multi-agent systems built on large language models have shown strong performance on complex reasoning tasks, yet most work focuses on agent roles and orchestration while treating inter-agent communication as a fixed interface. Latent communication through internal representations…
arXiv cs.CL TIER_1 · Dimitris N. Metaxas · 2026-04-23 14:29

AEL: Agent Evolving Learning for Open-Ended Environments

LLM agents increasingly operate in open-ended environments spanning hundreds of sequential episodes, yet they remain largely stateless: each task is solved from scratch without converting past experience into better future behavior. The central obstacle is not what to reme…
arXiv cs.CL TIER_1 · Jun Huang · 2026-04-23 12:14

AgenticQwen: Training Small Agentic Language Models with Dual Data Flywheels for Industrial-Scale Tool Use

Modern industrial applications increasingly demand language models that act as agents, capable of multi-step reasoning and tool use in real-world settings. These tasks are typically performed under strict cost and latency constraints, making small agentic models highly desirable.…
Hugging Face Daily Papers TIER_1 · 2026-04-21 21:31

From Recall to Forgetting: Benchmarking Long-Term Memory for Personalized Agents

Personalized agents that interact with users over long periods must maintain persistent memory across sessions and update it as circumstances change. However, existing benchmarks predominantly frame long-term memory evaluation as fact retrieval from past conversations, providing …
Hugging Face Daily Papers TIER_1 · 2026-04-21 15:25

A Self-Evolving Framework for Efficient Terminal Agents via Observational Context Compression

As model capabilities advance, research has increasingly shifted toward long-horizon, multi-turn terminal-centric agentic tasks, where raw environment feedback is often preserved in the interaction history to support future decisions. However, repeatedly retaining such feedback i…
Hugging Face Daily Papers TIER_1 · 2026-04-21 10:05

Rethinking Scale: Deployment Trade-offs of Small Language Models under Agent Paradigms

Despite the impressive capabilities of large language models, their substantial computational costs, latency, and privacy risks hinder their widespread deployment in real-world applications. Small Language Models (SLMs) with fewer than 10 billion parameters present a promising al…
Hugging Face Daily Papers TIER_1 · 2026-04-20 08:11

LiteResearcher: A Scalable Agentic RL Training Framework for Deep Research Agent

Reinforcement Learning (RL) has emerged as a powerful training paradigm for LLM-based agents. However, scaling agentic RL for deep research remains constrained by two coupled challenges: hand-crafted synthetic data fails to elicit genuine real-world search capabilities, and real-…
METR (Model Evaluation & Threat Research) TIER_1 · 2026-02-17 08:00

Analyzing coding agent transcripts to upper bound productivity gains from AI agents

Introduction: Human uplift studies like the one we did in 2025 (https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/) are becoming more expensive as working without AI becomes increasingly costly. In this post, I invest…
Ahead of AI (Sebastian Raschka) TIER_1 · Sebastian Raschka, PhD · 2025-12-03 12:03

From DeepSeek V3 to V3.2: Architecture, Sparse Attention, and RL Updates

Understanding How DeepSeek's Flagship Open-Weight Models Evolved
METR (Model Evaluation & Threat Research) TIER_1 · 2025-06-27 07:00

DeepSeek and Qwen Evaluation Results
Synced Review TIER_1 · Synced · 2025-05-15 17:58

DeepSeek-V3 New Paper is coming! Unveiling the Secrets of Low-Cost Large Model Training through Hardware-Aware Co-design

A newly released 14-page technical paper from the team behind DeepSeek-V3, with DeepSeek CEO Wenfeng Liang as a co-author, sheds light on the “Scaling Challenges and Reflections on Hardware for AI Architectures.”
Synced Review TIER_1 · Synced · 2025-04-11 14:43

DeepSeek Signals Next-Gen R2 Model, Unveils Novel Approach to Scaling Inference with SPCT

DeepSeek AI, a prominent player in the large language model arena, has recently published a research paper detailing a new technique aimed at enhancing the scalability of general reward models (GRMs) during the inference phase.
METR (Model Evaluation & Threat Research) TIER_1 · 2025-03-05 08:00

DeepSeek-R1 Evaluation Results
METR (Model Evaluation & Threat Research) TIER_1 · 2025-02-12 08:00

DeepSeek-V3 Evaluation Results
METR (Model Evaluation & Threat Research) TIER_1 · 2024-11-22 08:00

Evaluating frontier AI R&D capabilities of language model agents against human experts

We’re releasing RE-Bench, a new benchmark for measuring the per…
METR (Model Evaluation & Threat Research) TIER_1 · 2023-07-31 20:00

New report: Evaluating Language-Model Agents on Realistic Autonomous Tasks

Background: ARC Evals develops methods for evaluating the safety of large language models (LLMs) in order to provide early warnings of models with dangerous capabilities. We have public partnerships with Anthropic and OpenAI to evaluate their AI systems…
MIT Technology Review TIER_1 · Thomas Macaulay · 2026-04-27 12:10

The Download: DeepSeek’s latest AI breakthrough, and the race to build world models

This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology. Three reasons why DeepSeek’s new model matters: On Friday, Chinese AI firm DeepSeek released a preview of V4, its long-awaited new …
X — Together (inference / OSS) TIER_1 · togethercompute · 2026-04-24 18:24

Highlights:

Highlights: 👉 SOTA coding—93.5% LiveCodeBench, Codeforces 3206, and 80.6% SWE-Bench Verified 👉 Hybrid attention efficiency—27% FLOPs and 10% KV cache vs V3.2 for long-context inference 👉 Three reasoning modes—Non-think, Think High, and Think Max 👉 Production-ready on the AI
X — Together (inference / OSS) TIER_1 · togethercompute · 2026-04-24 18:24

DeepSeek V4 Pro is now available on Together AI. DeepSeek V4 Flash coming soon.

Try it now: https://t.co/qFvDvBfpu5
X — Together (inference / OSS) TIER_1 · togethercompute · 2026-04-24 18:24

Introducing DeepSeek V4 Pro, a long-context model with hybrid attention, three reasoning modes, and SOTA coding performance.

AI natives can now use DeepSeek V4 Pro on Together AI and benefit from reliable inference for long-horizon coding and agentic workflows. https://t.co/4lxr…
Smol AINews TIER_1 · 2026-04-24 05:44

DeepSeek v4

**DeepSeek-V4** technical release features a **1.6T-parameter MoE with 49B active parameters** and **1M-token context**, showcasing hybrid attention and compressed KV schemes for major memory reductions. It ranks as the **#2 open-weights reasoning model** behind **Kimi K2.6** but…
X — Together (inference / OSS) TIER_1 · togethercompute · 2026-04-22 23:01

Try Kimi K2.6 now on the AI Native Cloud: https://t.co/1GUrq3E0ek
X — Together (inference / OSS) TIER_1 · togethercompute · 2026-04-22 23:01

Highlights:

Highlights: 👉 80.2% SWE-Bench Verified and 89.6% LiveCodeBench v6 👉 Agent Swarm executes up to 4,000 coordinated steps 👉 Native text, image, and video input with 79.4% MMMU-Pro 👉 Production-ready on the AI Native Cloud—99.9% SLA, serverless and dedicated options
X — Together (inference / OSS) TIER_1 · togethercompute · 2026-04-22 23:01

Introducing Kimi K2.6 from @Kimi_Moonshot, a multimodal agentic model with Agent Swarm scaling to 300 sub-agents and long-horizon coding stability.

AI natives can now use Kimi K2.6 on Together AI and benefit from reliable inference for production-scale autonomous agent workflows.…
Smol AINews TIER_1 · 2025-12-02 05:44

DeepSeek V3.2 & 3.2-Speciale: GPT5-High Open Weights, Context Management, Plans for Compute Scaling

**DeepSeek** launched the **DeepSeek V3.2** family including Standard, Thinking, and Speciale variants with up to **131K context window** and competitive benchmarks against **GPT-5-High**, **Sonnet 4.5**, and **Gemini 3 Pro**. The release features a novel **Large Scale Agentic Ta…
Smol AINews TIER_1 Nederlands(NL) · 2025-03-08 05:06

DeepSeek's Open Source Stack

**DeepSeek's Open Source Week** was summarized by PySpur, highlighting multiple interesting releases. The **Qwen QwQ-32B model** was fine-tuned into **START**, excelling in PhD-level science QA and math benchmarks. **Character-3**, an omnimodal AI video generation model by Hedra …
Smol AINews TIER_1 · 2025-01-25 02:32

TinyZero: Reproduce DeepSeek R1-Zero for $30

**DeepSeek Mania** continues to reshape the frontier model landscape with Jiayi Pan from Berkeley reproducing the *OTHER* result from the DeepSeek R1 paper, R1-Zero, in a cost-effective Qwen model fine-tune for two math tasks. A key finding is a lower bound to the distillation ef…
Smol AINews TIER_1 · 2025-01-21 07:50

DeepSeek R1: o1-level open weights model and a simple recipe for upgrading 1.5B models to Sonnet/4o level

**DeepSeek** released **DeepSeek R1**, a significant upgrade over **DeepSeek V3** from just three weeks prior, featuring 8 models including full-size 671B MoE models and multiple distillations from **Qwen 2.5** and **Llama 3.1/3.3**. The models are MIT licensed, allowing finetuni…
Smol AINews TIER_1 · 2024-12-27 01:18

DeepSeek v3: 671B finegrained MoE trained for $5.5m USD of compute on 15T tokens

**DeepSeek-V3** has launched with **671B MoE parameters** and trained on **14.8T tokens**, outperforming **GPT-4o** and **Claude-3.5-sonnet** in benchmarks. It was trained with only **2.788M H800 GPU hours**, significantly less than **Llama-3**'s **30.8M GPU-hours**, showcasing m…
Smol AINews TIER_1 · 2024-11-21 02:41

DeepSeek-R1 claims to beat o1-preview AND will be open sourced

**DeepSeek** has released **DeepSeek-R1-Lite-Preview**, an open-source reasoning model achieving **o1-preview-level performance** on math benchmarks with transparent thought processes, showing promise in real-time problem-solving. **NVIDIA** reported a record **$35.1 billion** re…
ChinaTalk TIER_1 · Irene Zhang · 2026-04-27 10:55

DeepSeek V4

Has the "post-DeepSeek era" arrived?
TLDR AI TIER_1 · TLDR · 2026-04-24 00:00

GPT-5.5 release 🚀, Anthropic $1T valuation 💰, DeepSeek v4
Latent Space Podcast TIER_1 · Latent.Space · 2024-09-27 17:59

Language Agents: From Reasoning to Acting

OpenAI DevDay is almost here! Per tradition, we are hosting a DevDay pregame event (https://lu.ma/devday-pregame) for everyone coming to town! Join us with demos and gossip! Also sign up…
Hacker News — AI stories ≥50 points TIER_1 · cmrdporcupine · 2026-04-24 03:07

DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence
Hacker News — AI stories ≥50 points TIER_1 · impact_sy · 2026-04-24 03:01

DeepSeek v4
Practical AI TIER_1 · Practical AI LLC · 2025-01-31 15:30

Deep-dive into DeepSeek

There is crazy hype and a lot of confusion related to DeepSeek’s latest model DeepSeek R1. The products provided by DeepSeek (their version of a ChatGPT-like app) have exploded in popularity. However, ties to China have raised privacy and geopolitical concerns. In this episode,…
Mastodon — sigmoid.social TIER_1 Polski(PL) · [email protected] · 2026-04-26 10:46

Chinese lab DeepSeek released the DeepSeek-V4-Pro model, which not only matches Western competition in coding but offers it at a fraction of the price.

Thanks to an innovative architecture, costs have been cut by 98%, which poses a direct challenge to the dominant players…

LINKS aisight.pl/…/generatory-obrazow-ai-stereo…
Mastodon — sigmoid.social TIER_1 Italiano(IT) · [email protected] · 2026-04-26 05:49

🧠 DeepSeek V4 Preview is officially available and open-source: are we entering the era of truly sustainable 1 million token context models?

👉 The details: https://www.linkedin.com/posts/alessiopomaro_deepseek-ollama-llm-activity-7454041633915994112-F4ZO ___ ✉️ If…

LINKS alessiopomaro.it
Mastodon — fosstodon.org TIER_1 · [email protected] · 2026-04-29 17:26

🚀 DeepSeek V4: 1.6T MoE, only 49B active. 1M token context.

→ 73% lower inference cost vs V3 → 90% less KV cache memory → V4-Pro: $0.435/M input (promo) → V4-Flash: $0.14/M input → Matches GPT-5.4 at 5-10x lower cost. Open weights. MIT license. Full guide: https://crazyrouter.co…
Mastodon — fosstodon.org TIER_1 · [email protected] · 2026-04-26 00:02

DeepSeek V4: Million-Token Context That Actually Works

Most long-context models are benchmarks in search of a use case. DeepSeek V4 flips the ... #ai #machinelearning #llm #agents

LINKS dev.to/…/deepseek-v4-million-token-contex… awakari.com/sub-details.html awakari.com/pub-msg.html
r/MachineLearning TIER_1 · /u/kalpitdixit · 2026-04-25 02:33

Open-source 9-task benchmark for coding-agent retrieval augmentation. Per-task deltas +0.010 to +0.320, all evals reproducible [P]

https://www.reddit.com/r/MachineLearning/comments/1suzqxe/opensource_9task_benchmark_for_codingagent/
Mastodon — mastodon.social TIER_1 Svenska(SV) · redaktionen · 2026-04-27 13:03

DeepSeek launches V4: A new era for AI with longer prompt processing https://redaktionen.net/artikel/617 #ai #svtech

LINKS redaktionen.net/…/617
Mastodon — mastodon.social TIER_1 · [email protected] · 2026-04-25 23:44

DeepSeek-V4 on Day 0: From Fast Inference to Verified RL with SGLang and Miles https://www.lmsys.org/blog/2026-04-25-deepseek-v4/ #HackerNews #Tech #AI

LINKS lmsys.org/…/2026-04-25-deepseek-v4
