Google unveils agent memory framework; DeepSeek releases cost-effective V4 models
By PulseAugur Editorial · [63 sources]
Google Research has introduced ReasoningBank, a novel framework designed to enhance AI agents' ability to learn from their experiences, both successes and failures, after deployment. This system distills generalizable reasoning strategies from past interactions, allowing agents to continuously improve and avoid repeating mistakes. Separately, new research explores optimizing multi-agent communication through latent representations and introduces Agent Evolving Learning (AEL) for agents operating in open-ended environments, focusing on how to effectively use remembered information. Additionally, DeepSeek has released preview models of its V4 series, offering large context windows and advanced capabilities at a significantly lower cost than comparable frontier models.
AI
IMPACT
New frameworks for agent learning and memory, alongside cost-effective frontier models, could accelerate AI adoption in complex tasks and personalized applications.
RANK_REASON
Multiple research papers and model releases related to AI agents and LLM capabilities.
<p>Chinese AI lab DeepSeek's last model release was V3.2 (and V3.2 Speciale) <a href="https://simonwillison.net/2025/Dec/1/deepseek-v32/">last December</a>. They just dropped the first of their hotly anticipated V4 series in the shape of two preview models, <a href="https://huggi…
This paper introduces a new paradigm for AI game programming, leveraging large language models (LLMs) to extend and operationalize Claude Shannon's taxonomy of game-playing machines. Central to this paradigm is Nemobot, an interactive agentic engineering environment that enables …
Multi-agent systems built on large language models have shown strong performance on complex reasoning tasks, yet most work focuses on agent roles and orchestration while treating inter-agent communication as a fixed interface. Latent communication through internal representations…
arXiv cs.CL
TIER_1 · English (EN) · Dimitris N. Metaxas
LLM agents increasingly operate in open-ended environments spanning hundreds of sequential episodes, yet they remain largely stateless: each task is solved from scratch without converting past experience into better future behavior. The central obstacle is not what to reme…
Modern industrial applications increasingly demand language models that act as agents, capable of multi-step reasoning and tool use in real-world settings. These tasks are typically performed under strict cost and latency constraints, making small agentic models highly desirable.…
Personalized agents that interact with users over long periods must maintain persistent memory across sessions and update it as circumstances change. However, existing benchmarks predominantly frame long-term memory evaluation as fact retrieval from past conversations, providing …
As model capabilities advance, research has increasingly shifted toward long-horizon, multi-turn terminal-centric agentic tasks, where raw environment feedback is often preserved in the interaction history to support future decisions. However, repeatedly retaining such feedback i…
Despite the impressive capabilities of large language models, their substantial computational costs, latency, and privacy risks hinder their widespread deployment in real-world applications. Small Language Models (SLMs) with fewer than 10 billion parameters present a promising al…
Reinforcement Learning (RL) has emerged as a powerful training paradigm for LLM-based agents. However, scaling agentic RL for deep research remains constrained by two coupled challenges: hand-crafted synthetic data fails to elicit genuine real-world search capabilities, and real-…
METR (Model Evaluation & Threat Research)
TIER_1 · English (EN)
<h2 id="introduction">Introduction</h2> <p>Human uplift studies like <a href="https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/">the one we did in 2025</a> are becoming more expensive as working without AI becomes increasingly costly. In this post, I invest…
Ahead of AI (Sebastian Raschka)
TIER_1 · English (EN) · Sebastian Raschka, PhD
<p>A newly released 14-page technical paper from the team behind DeepSeek-V3, with DeepSeek CEO Wenfeng Liang as a co-author, sheds light on the “Scaling Challenges and Reflections on Hardware for AI Architectures.”</p> The post <a href="https://syncedreview.com/2025/05/15/deepse…
<p>DeepSeek AI, a prominent player in the large language model arena, has recently published a research paper detailing a new technique aimed at enhancing the scalability of general reward models (GRMs) during the inference phase.</p> The post <a href="https://syncedreview.com/20…
METR (Model Evaluation & Threat Research)
TIER_1 · English (EN)
<h3 id="background">Background</h3> <p>ARC Evals develops methods for evaluating the safety of large language models (LLMs) in order to provide early warnings of models with dangerous capabilities. We have public partnerships with Anthropic and OpenAI to evaluate their AI systems…
MIT Technology Review
TIER_1 · English (EN) · Thomas Macaulay
This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology. Inexpensive seafloor-hopping submersibles could stoke deep-sea science—and mining Last week, two oblong neon submersibles started …
MIT Technology Review
TIER_1 · English (EN) · Thomas Macaulay
This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology. Three reasons why DeepSeek’s new model matters On Friday, Chinese AI firm DeepSeek released a preview of V4, its long-awaited new …
X — Together (inference / OSS)
TIER_1 · English (EN) · togethercompute
Highlights:
👉 SOTA coding—93.5% LiveCodeBench, Codeforces 3206, and 80.6% SWE-Bench Verified
👉 Hybrid attention efficiency—27% FLOPs and 10% KV cache vs V3.2 for long-context inference
👉 Three reasoning modes—Non-think, Think High, and Think Max
👉 Production-ready on the AI
X — Together (inference / OSS)
TIER_1 · English (EN) · togethercompute
Introducing DeepSeek V4 Pro, a long-context model with hybrid attention, three reasoning modes, and SOTA coding performance.
AI natives can now use DeepSeek V4 Pro on Together AI and benefit from reliable inference for long-horizon coding and agentic workflows. https://t.co/4lxr…
**DeepSeek-V4** technical release features a **1.6T-parameter MoE with 49B active parameters** and **1M-token context**, showcasing hybrid attention and compressed KV schemes for major memory reductions. It ranks as the **#2 open-weights reasoning model** behind **Kimi K2.6** but…
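As a rough illustration of the sparsity those figures imply, the sketch below computes the share of parameters active per token. It uses only the 1.6T total and 49B active counts quoted above; the function and constant names are hypothetical.

```python
def active_fraction(total_params: float, active_params: float) -> float:
    """Return the share of a mixture-of-experts model's parameters used per token."""
    if total_params <= 0:
        raise ValueError("total_params must be positive")
    return active_params / total_params


# Figures quoted in the snippet above: 1.6T total parameters, 49B active.
V4_TOTAL_PARAMS = 1.6e12
V4_ACTIVE_PARAMS = 49e9

fraction = active_fraction(V4_TOTAL_PARAMS, V4_ACTIVE_PARAMS)
print(f"{fraction:.1%} of parameters active per token")
```

On the quoted numbers this works out to roughly 3% of the model's weights participating in any single forward pass.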
X — Together (inference / OSS)
TIER_1 · English (EN) · togethercompute
Introducing Kimi K2.6 from @Kimi_Moonshot, a multimodal agentic model with Agent Swarm scaling to 300 sub-agents and long-horizon coding stability. AI natives can now use Kimi K2.6 on Together AI and benefit from reliable inference for production-scale autonomous agent workflows.…
X — Together (inference / OSS)
TIER_1 · English (EN) · togethercompute
Highlights:
👉 80.2% SWE-Bench Verified and 89.6% LiveCodeBench v6
👉 Agent Swarm executes up to 4,000 coordinated steps
👉 Native text, image, and video input with 79.4% MMMU-Pro
👉 Production-ready on the AI Native Cloud—99.9% SLA, serverless and dedicated options
**DeepSeek** launched the **DeepSeek V3.2** family including Standard, Thinking, and Speciale variants with up to **131K context window** and competitive benchmarks against **GPT-5-High**, **Sonnet 4.5**, and **Gemini 3 Pro**. The release features a novel **Large Scale Agentic Ta…
**DeepSeek's Open Source Week** was summarized by PySpur, highlighting multiple interesting releases. The **Qwen QwQ-32B model** was fine-tuned into **START**, excelling in PhD-level science QA and math benchmarks. **Character-3**, an omnimodal AI video generation model by Hedra …
**DeepSeek Mania** continues to reshape the frontier model landscape with Jiayi Pan from Berkeley reproducing the *OTHER* result from the DeepSeek R1 paper, R1-Zero, in a cost-effective Qwen model fine-tune for two math tasks. A key finding is a lower bound to the distillation ef…
**DeepSeek** released **DeepSeek R1**, a significant upgrade over **DeepSeek V3** from just three weeks prior, featuring 8 models including full-size 671B MoE models and multiple distillations from **Qwen 2.5** and **Llama 3.1/3.3**. The models are MIT licensed, allowing finetuni…
**DeepSeek-V3** has launched with **671B MoE parameters** and trained on **14.8T tokens**, outperforming **GPT-4o** and **Claude-3.5-sonnet** in benchmarks. It was trained with only **2.788M H800 GPU hours**, significantly less than **Llama-3**'s **30.8M GPU-hours**, showcasing m…
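To put the quoted training budgets side by side, the short sketch below divides the two GPU-hour figures from the snippet above. It assumes the figures are directly comparable, which the source does not establish (the hardware and model sizes differ); the names are illustrative.

```python
# GPU-hour figures as quoted in the snippet above.
DEEPSEEK_V3_GPU_HOURS = 2.788e6  # H800 GPU hours
LLAMA_3_GPU_HOURS = 30.8e6       # GPU-hours


def budget_ratio(smaller: float, larger: float) -> float:
    """Return how many times larger the second budget is than the first."""
    if smaller <= 0:
        raise ValueError("smaller must be positive")
    return larger / smaller


ratio = budget_ratio(DEEPSEEK_V3_GPU_HOURS, LLAMA_3_GPU_HOURS)
share = DEEPSEEK_V3_GPU_HOURS / LLAMA_3_GPU_HOURS
print(f"Quoted Llama-3 budget is about {ratio:.1f}x DeepSeek-V3's ({share:.1%} share)")
```

On these numbers the DeepSeek-V3 budget is roughly one eleventh of the Llama-3 figure.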
**DeepSeek** has released **DeepSeek-R1-Lite-Preview**, an open-source reasoning model achieving **o1-preview-level performance** on math benchmarks with transparent thought processes, showing promise in real-time problem-solving. **NVIDIA** reported a record **$35.1 billion** re…
<p><strong><em>OpenAI DevDay is almost here</em></strong><em>! Per tradition, we are hosting </em><a href="https://lu.ma/devday-pregame" target="_blank"><em>a DevDay pregame event</em></a><em> for everyone coming to town! Join us with demos and gossip!</em></p><p><em>Also sign up…
Hacker News — AI stories ≥50 points
TIER_1 · English (EN) · cmrdporcupine
<p>There is crazy hype and a lot of confusion related to DeepSeek’s latest model DeepSeek R1. The products provided by DeepSeek (their version of a ChatGPT-like app) have exploded in popularity. However, ties to China have raised privacy and geopolitical concerns. In this episode,…
Medium — MLOps tag
TIER_1 · English (EN) · hitesh sahni
<div class="medium-feed-item"><p class="medium-feed-snippet">Generative and Agentic AI applications are rapidly evolving from standalone chatbots into multi-agent systems capable of reasoning…</p><p class="medium-feed-link"><a href="https://medium.com/@hitesh88it/building-…
<p>Hey everyone,</p> <p>I built a small open-source app called MCP Jira Automation. It uses MCP Atlassian to read Jira issues and helps automate API test workflows around them. The basic flow is: it reads a Jira issue, generates or updates API tests, runs them in Docker, opens a …
Chinese lab DeepSeek has released the DeepSeek-V4-Pro model, which not only matches Western competitors at coding but offers it at a fraction of the price. Thanks to an innovative architecture, costs have been cut by 98%, posing a direct challenge to the dominant players…
🧠 #DeepSeek V4 Preview is officially available and open-source: are we entering the era of truly sustainable models with a 1-million-token context? 👉 Details: https://www.linkedin.com/posts/alessiopomaro_deepseek-ollama-llm-activity-7454041633915994112-F4ZO ___ ✉️ 𝗦𝗲 𝘃𝘂…
The quite ridiculous thing about the current AI wave is that a handful of startups can be swept away overnight by the launch of a new version of any $tool by one of the big names, such as OpenAI, Anthropic, or Gemini. But the same can also happen to any of the big ones, at least.…
AI Reality? The video shows how we viewed AI in 2033, so I thought I should have Gemini update the facts in the video. ‘The future that they tell us about is happening soon.??’ https://youtu.be/RXGNwslqOOA I asked Gemini to share its opinion on the future of AI in the 2030s. Afte…
<p>If you've worked with multiple LLM providers in the past year, <br /> you've probably reached for a gateway like OpenRouter, LiteLLM, <br /> or Portkey. They solve a real problem: one API key, one bill, <br /> drop-in access to dozens of models.</p> <p>But almost every gateway…
<p>Tracking LLM costs across an entire app is easy. Finding out <em>which</em> customer is actually burning through your OpenAI bill? That's a nightmare.</p> <p>For a while, we were just eating the cost. You look at the Stripe dashboard, look at the OpenAI invoice, and pray the m…
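A minimal sketch of the per-customer attribution idea, assuming each API call is logged with a customer identifier and token counts. The prices, the `UsageEvent` record, and `cost_by_customer` are hypothetical and are not taken from the post or from any provider's API.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical per-1K-token prices in USD; real provider pricing varies by model.
PRICE_PER_1K = {"input": 0.001, "output": 0.002}


@dataclass(frozen=True)
class UsageEvent:
    """One logged LLM call, tagged with the customer that triggered it."""
    customer_id: str
    input_tokens: int
    output_tokens: int


def cost_by_customer(events):
    """Sum the estimated spend per customer across all logged calls."""
    totals = defaultdict(float)
    for event in events:
        totals[event.customer_id] += (
            event.input_tokens / 1000 * PRICE_PER_1K["input"]
            + event.output_tokens / 1000 * PRICE_PER_1K["output"]
        )
    return dict(totals)


events = [
    UsageEvent("acme", 2000, 1000),
    UsageEvent("globex", 500, 500),
    UsageEvent("acme", 1000, 0),
]
costs = cost_by_customer(events)
for customer, usd in sorted(costs.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{customer}: ${usd:.4f}")
```

Tagging each call at request time, rather than reconciling invoices afterwards, is what makes the per-customer breakdown possible in this sketch.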
DeepSeek V4: Million-Token Context That Actually Works Most long-context models are benchmarks in search of a use case. DeepSeek V4 flips the ... #ai #machinelearning #llm #agents
<table> <tr><td> <a href="https://www.reddit.com/r/Anthropic/comments/1tfvjwf/researchers_left_ais_alone_in_a_virtual_town_for/"> <img alt="Researchers left AIs alone in a virtual town for 15 days to see what would happen. Claude's agents built a democracy. Gemini's agents fell i…
👀 Ollama + Open WebUI: run AI models locally on your PC | RAG, an OpenAI-compatible API, and dozens of open-source models without the cloud https://gomoot.com/eseguire-modelli-ai-in-locale-con-ollama-e-open-webui/ #AI #news #ollama #tech #WebUI
🚀 OpenAI sprints ahead on Stargate while Meta increases its investment - Gemini remembers. An unprecedented triangle of innovation. #AI #InnovazioneDigitale #socialmedia #artificialintelligence #technology 🔗 https://aibay.it/notizie/openai-corre-su-stargate-meta-alza-il-c…
Can an AI assistant deepen a mental health crisis? Grok and Gemini fail a safety test, Claude sets boundaries. As chatbots become an increasingly common part of everyday life, the need to evaluate their safety is growing, especially in contact with users…
DeepSeek-V4 on Day 0: From Fast Inference to Verified RL with SGLang and Miles https://www.lmsys.org/blog/2026-04-25-deepseek-v4/ #HackerNews #Tech #AI