Google, DeepSeek, and arXiv papers explore agent learning and memory
By PulseAugur Editorial
Summary by gemini-2.5-flash-lite
from 48 sources
DeepSeek has released two new open-weight models, V4-Pro and V4-Flash, featuring a 1-million-token context window and a Mixture-of-Experts architecture. These models are significantly larger than previous DeepSeek releases and are priced competitively, aiming to offer frontier-level performance at a fraction of the cost of other leading models. The coverage also includes research on agent memory frameworks such as ReasoningBank and Agent Evolving Learning (AEL), which focus on enabling AI agents to learn from both successes and failures to improve performance over time. Additionally, new research explores optimizing communication within multi-agent language systems and training smaller, efficient agentic models for industrial tool use.
AI
<p>Chinese AI lab DeepSeek's last model release was V3.2 (and V3.2 Speciale) <a href="https://simonwillison.net/2025/Dec/1/deepseek-v32/">last December</a>. They just dropped the first of their hotly anticipated V4 series in the shape of two preview models, <a href="https://huggi…
This paper introduces a new paradigm for AI game programming, leveraging large language models (LLMs) to extend and operationalize Claude Shannon's taxonomy of game-playing machines. Central to this paradigm is Nemobot, an interactive agentic engineering environment that enables …
Multi-agent systems built on large language models have shown strong performance on complex reasoning tasks, yet most work focuses on agent roles and orchestration while treating inter-agent communication as a fixed interface. Latent communication through internal representations…
LLM agents increasingly operate in open-ended environments spanning hundreds of sequential episodes, yet they remain largely stateless: each task is solved from scratch without converting past experience into better future behavior. The central obstacle is not what to reme…
Modern industrial applications increasingly demand language models that act as agents, capable of multi-step reasoning and tool use in real-world settings. These tasks are typically performed under strict cost and latency constraints, making small agentic models highly desirable.…
Personalized agents that interact with users over long periods must maintain persistent memory across sessions and update it as circumstances change. However, existing benchmarks predominantly frame long-term memory evaluation as fact retrieval from past conversations, providing …
As model capabilities advance, research has increasingly shifted toward long-horizon, multi-turn terminal-centric agentic tasks, where raw environment feedback is often preserved in the interaction history to support future decisions. However, repeatedly retaining such feedback i…
Despite the impressive capabilities of large language models, their substantial computational costs, latency, and privacy risks hinder their widespread deployment in real-world applications. Small Language Models (SLMs) with fewer than 10 billion parameters present a promising al…
Reinforcement Learning (RL) has emerged as a powerful training paradigm for LLM-based agents. However, scaling agentic RL for deep research remains constrained by two coupled challenges: hand-crafted synthetic data fails to elicit genuine real-world search capabilities, and real-…
<h2 id="introduction">Introduction</h2> <p>Human uplift studies like <a href="https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/">the one we did in 2025</a> are becoming more expensive as working without AI becomes increasingly costly. In this post, I invest…
Ahead of AI (Sebastian Raschka)
TIER_1·Sebastian Raschka, PhD·
<p>A newly released 14-page technical paper from the team behind DeepSeek-V3, with DeepSeek CEO Wenfeng Liang as a co-author, sheds light on the “Scaling Challenges and Reflections on Hardware for AI Architectures.”</p> The post <a href="https://syncedreview.com/2025/05/15/deepse…
<p>DeepSeek AI, a prominent player in the large language model arena, has recently published a research paper detailing a new technique aimed at enhancing the scalability of general reward models (GRMs) during the inference phase.</p> The post <a href="https://syncedreview.com/20…
<h3 id="background">Background</h3> <p>ARC Evals develops methods for evaluating the safety of large language models (LLMs) in order to provide early warnings of models with dangerous capabilities. We have public partnerships with Anthropic and OpenAI to evaluate their AI systems…
This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology. Three reasons why DeepSeek’s new model matters On Friday, Chinese AI firm DeepSeek released a preview of V4, its long-awaited new …
X — Together (inference / OSS)
TIER_1·togethercompute·
Highlights:
👉 SOTA coding—93.5% LiveCodeBench, Codeforces 3206, and 80.6% SWE-Bench Verified
👉 Hybrid attention efficiency—27% FLOPs and 10% KV cache vs V3.2 for long-context inference
👉 Three reasoning modes—Non-think, Think High, and Think Max
👉 Production-ready on the AI
X — Together (inference / OSS)
TIER_1·togethercompute·
Introducing DeepSeek V4 Pro, a long-context model with hybrid attention, three reasoning modes, and SOTA coding performance.
AI natives can now use DeepSeek V4 Pro on Together AI and benefit from reliable inference for long-horizon coding and agentic workflows. https://t.co/4lxr…
**DeepSeek-V4** technical release features a **1.6T-parameter MoE with 49B active parameters** and **1M-token context**, showcasing hybrid attention and compressed KV schemes for major memory reductions. It ranks as the **#2 open-weights reasoning model** behind **Kimi K2.6** but…
X — Together (inference / OSS)
TIER_1·togethercompute·
Highlights:
👉 80.2% SWE-Bench Verified and 89.6% LiveCodeBench v6
👉 Agent Swarm executes up to 4,000 coordinated steps
👉 Native text, image, and video input with 79.4% MMMU-Pro
👉 Production-ready on the AI Native Cloud—99.9% SLA, serverless and dedicated options
X — Together (inference / OSS)
TIER_1·togethercompute·
Introducing Kimi K2.6 from @Kimi_Moonshot, a multimodal agentic model with Agent Swarm scaling to 300 sub-agents and long-horizon coding stability. AI natives can now use Kimi K2.6 on Together AI and benefit from reliable inference for production-scale autonomous agent workflows.…
**DeepSeek** launched the **DeepSeek V3.2** family including Standard, Thinking, and Speciale variants with up to **131K context window** and competitive benchmarks against **GPT-5-High**, **Sonnet 4.5**, and **Gemini 3 Pro**. The release features a novel **Large Scale Agentic Ta…
**DeepSeek's Open Source Week** was summarized by PySpur, highlighting multiple interesting releases. The **Qwen QwQ-32B model** was fine-tuned into **START**, excelling in PhD-level science QA and math benchmarks. **Character-3**, an omnimodal AI video generation model by Hedra …
**DeepSeek Mania** continues to reshape the frontier model landscape with Jiayi Pan from Berkeley reproducing the *OTHER* result from the DeepSeek R1 paper, R1-Zero, in a cost-effective Qwen model fine-tune for two math tasks. A key finding is a lower bound to the distillation ef…
**DeepSeek** released **DeepSeek R1**, a significant upgrade over **DeepSeek V3** from just three weeks prior, featuring 8 models including full-size 671B MoE models and multiple distillations from **Qwen 2.5** and **Llama 3.1/3.3**. The models are MIT licensed, allowing finetuni…
**DeepSeek-V3** has launched with **671B MoE parameters** and trained on **14.8T tokens**, outperforming **GPT-4o** and **Claude-3.5-sonnet** in benchmarks. It was trained with only **2.788M H800 GPU hours**, significantly less than **Llama-3**'s **30.8M GPU-hours**, showcasing m…
**DeepSeek** has released **DeepSeek-R1-Lite-Preview**, an open-source reasoning model achieving **o1-preview-level performance** on math benchmarks with transparent thought processes, showing promise in real-time problem-solving. **NVIDIA** reported a record **$35.1 billion** re…
<p><strong><em>OpenAI DevDay is almost here</em></strong><em>! Per tradition, we are hosting </em><a href="https://lu.ma/devday-pregame" target="_blank"><em>a DevDay pregame event</em></a><em> for everyone coming to town! Join us with demos and gossip!</em></p><p><em>Also sign up…
Hacker News — AI stories ≥50 points
TIER_1·cmrdporcupine·
<p>There is crazy hype and a lot of confusion related to DeepSeek’s latest model, DeepSeek R1. The product provided by DeepSeek (their version of a ChatGPT-like app) has exploded in popularity. However, ties to China have raised privacy and geopolitical concerns. In this episode,…
Chinese lab DeepSeek has released the DeepSeek-V4-Pro model, which not only matches Western competitors in coding but also offers it at a fraction of the price. Thanks to an innovative architecture, costs have been cut by 98%, which poses a direct challenge to the dominant players …
🧠 DeepSeek V4 Preview is officially available and open-source: are we entering the era of truly sustainable models with a 1-million-token context? 👉 Details: https://www.linkedin.com/posts/alessiopomaro_deepseek-ollama-llm-activity-7454041633915994112-F4ZO ✉️ If you…
DeepSeek V4: Million-Token Context That Actually Works. Most long-context models are benchmarks in search of a use case. DeepSeek V4 flips the ... #ai #machinelearning #llm #agents
DeepSeek-V4 on Day 0: From Fast Inference to Verified RL with SGLang and Miles https://www.lmsys.org/blog/2026-04-25-deepseek-v4/ #HackerNews #Tech #AI