PulseAugur

AI Agents Advance with New Models, Memory, and Training Techniques

A batch of new arXiv papers explores advances in AI agents, with a focus on reasoning, memory, and training efficiency. Alongside them, Qwen has open-sourced Qwen3.6-35B-A3B, a sparse mixture-of-experts (MoE) model with 35 billion total and 3 billion active parameters that the team says delivers strong agentic coding performance. Other studies introduce methods for better skill presentation, long-context reasoning through reinforcement learning (RL), skill reuse framed as compression, and adaptive context management for agents on complex, long-horizon tasks. The coverage also includes AutoSci, a system for automating the scientific research lifecycle, and PithTrain, a compact training framework for MoE models designed for agent-native development.
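To make "sparse MoE" concrete, here is a minimal sketch of generic top-k expert routing: a gate scores every expert, but only the k best-scoring experts actually run for a given input, so only part of the model's weights are used per token. The expert count, k, the toy scaling experts, the linear gate, and the "select, then softmax" weighting are all illustrative assumptions; the source does not describe Qwen's router, and this is not its implementation.

```python
import math
from typing import Callable, List, Sequence, Tuple

Expert = Callable[[Sequence[float]], List[float]]


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(gate_logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and renormalize their gate weights.

    Assumption: softmax is applied only over the selected experts
    ("select, then softmax"); other MoE designs normalize over all experts first.
    """
    ranked = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([gate_logits[i] for i in chosen])
    return list(zip(chosen, weights))


def moe_forward(x: Sequence[float], experts: Sequence[Expert],
                gate_weights: Sequence[Sequence[float]], k: int) -> List[float]:
    """Weighted sum of the k routed experts' outputs; unselected experts never run."""
    # One linear gate score per expert: logit_e = <w_e, x> (hypothetical gate).
    logits = [sum(w * xi for w, xi in zip(w_e, x)) for w_e in gate_weights]
    out = [0.0] * len(x)
    for idx, weight in top_k_route(logits, k):
        y = experts[idx](x)  # only the selected experts are evaluated
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out


def active_fraction(active_params: float, total_params: float) -> float:
    """Share of parameters used per token in a sparse MoE model."""
    return active_params / total_params


if __name__ == "__main__":
    # Hypothetical toy setup: 8 experts that just scale their input, top-2 routing.
    experts = [lambda v, s=s: [s * vi for vi in v] for s in range(1, 9)]
    gate = [[0.1 * e, 0.0] for e in range(8)]
    print(moe_forward([1.0, 2.0], experts, gate, k=2))
    # 35B total / 3B active, as stated in the Qwen announcement.
    print(round(active_fraction(3e9, 35e9), 3))
```

With 35 billion total and 3 billion active parameters, roughly 3/35, or about 8.6%, of the weights take part in each token's forward pass; `active_fraction` only makes that ratio explicit. Real MoE layers route per token inside each transformer block and often add load-balancing terms, which this sketch omits.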

IMPACT Advances in agent capabilities, memory management, and training efficiency could accelerate the development of more sophisticated AI systems.

RANK_REASON Multiple arXiv papers released on diverse AI agent research topics.

AI-generated summary · Google Gemini · from 405 sources.

COVERAGE [405]

  1. Qwen tech blog TIER_1 English(EN) · QwenTeam ·

    Qwen3.6-35B-A3B: Agentic Coding Power, Now Open to All

    Following the launch of Qwen3.6-Plus, we are excited to open-source Qwen3.6-35B-A3B — a sparse yet remarkably capable mixture-of-experts (MoE) model with 35 billion total parameters and only 3 billion active parameters. Despite its efficiency, Qwen3.6-35B-A3B delivers outstanding…

  2. arXiv cs.AI TIER_1 English(EN) · Minjun Choi, Yoonjin Jang, Sangwon Youn, Youngjoong Ko ·

    G-Long: Graph-Enhanced Memory Management for Efficient Long-Term Dialogue Agents

    arXiv:2606.13115v1 Announce Type: cross Abstract: While Large Language Models (LLMs) have advanced open-domain dialogue systems, maintaining long-term consistency remains a challenge due to inherent limitations in long-context reasoning and the inefficiency of processing extensiv…

  3. arXiv cs.AI TIER_1 English(EN) · Neha Prakriya, Chaojun Hou, Zheng Gong, Huasha Zhao, Xi Zhao, Mou Li, Zhenyu Gu, Emad Barsoum ·

    Arbor: Tree Search as a Cognition Layer for Autonomous Agents

    arXiv:2606.12563v1 Announce Type: new Abstract: Arbor is a multi-agent framework that introduces structured tree search as a cognition layer for autonomous agents operating in large, stateful action spaces. Prior autonomous optimization systems operate on isolated targets with st…

  4. arXiv cs.AI TIER_1 English(EN) · Zhibao Chen, Qian Cheng ·

    Learning What to Remember: A Cognitively Grounded Multi-Factor Value Model for Agentic Memory

    arXiv:2606.12945v1 Announce Type: new Abstract: Long-running LLM agents accumulate interaction histories far larger than any context window, forcing a standing decision: what to encode deeply, what to forget, and what to retrieve under a fixed memory budget. Production systems an…

  5. arXiv cs.CL TIER_1 English(EN) · Jundong Xu, Qingchuan Li, Jiaying Wu, Yihuai Lan, Shuyue Stella Li, Huichi Zhou, Bowen Jiang, Lei Wang, Jun Wang, Anh Tuan Luu, Caiming Xiong, Hae Won Park, Bryan Hooi, Zhiyuan Hu ·

    EvoArena: Tracking Memory Evolution for Robust LLM Agents in Dynamic Environments

    arXiv:2606.13681v1 Announce Type: new Abstract: Large language model (LLM) agents have achieved strong performance on a wide range of benchmarks, yet most evaluations assume static environments. In contrast, real-world deployment is inherently dynamic, requiring agents to continu…

  6. arXiv cs.CL TIER_1 English(EN) · Yunhan Wang, Jiaan Wang, Lianzhe Huang, Xianfeng Zeng, Fandong Meng ·

    EvoBrowseComp: Benchmarking Search Agents on Evolving Knowledge

    arXiv:2606.13120v1 Announce Type: new Abstract: Search Agents -- large language models augmented with search tools -- have intensified the need for future-proof evaluation benchmarks. Existing benchmarks such as BrowseComp rely on static knowledge, making them vulnerable to test-…

  7. arXiv cs.CL TIER_1 English(EN) · Jiarui Zhao, Rongzhi Zhang, Lingchuan Liu, Hao Yang, Xunliang Cai, Xi Su ·

    LoHoSearch: Benchmarking Long-Horizon Search Agents Beyond the Human Difficulty Ceiling

    arXiv:2606.12837v1 Announce Type: new Abstract: Search agent benchmarks exemplified by BrowseComp have rapidly saturated over the past year, with the strongest models surpassing 90% accuracy. Since these benchmarks are predominantly human-authored, annotators lack a global perspe…

  8. arXiv cs.AI TIER_1 English(EN) · Minjae Kim, Jinheon Baek, Soyeong Jeong, Sung Ju Hwang ·

    MemRefine: LLM-Guided Compression for Long-Term Agent Memory

    arXiv:2606.13177v1 Announce Type: cross Abstract: Large language model (LLM) agents are increasingly expected to operate over long-term interactions, where information from past dialogues must be preserved and recalled to support future tasks. However, as interactions accumulate,…

  9. arXiv cs.AI TIER_1 English(EN) · Zehao Lin, Xixuan Hao, Renyu Fu, Shaobo Cui, Kai Chen, Chunyu Li, Zhiyu Li, Feiyu Xiong ·

    A Survey on Long-Term Memory Security in LLM Agents: Attacks, Defenses, and Governance Across the Memory Lifecycle

    arXiv:2604.16548v2 Announce Type: replace-cross Abstract: The emergence of writable, cross-session persistent memory in LLM agents introduces a qualitatively different threat landscape from conventional input-centric security concerns, characterized by three properties: persisten…

  10. arXiv cs.CL TIER_1 English(EN) · Zhiyuan Hu ·

    EvoArena: Tracking Memory Evolution for Robust LLM Agents in Dynamic Environments

    Large language model (LLM) agents have achieved strong performance on a wide range of benchmarks, yet most evaluations assume static environments. In contrast, real-world deployment is inherently dynamic, requiring agents to continually align their knowledge, skills, and behavior…

  11. Hugging Face Daily Papers TIER_1 English(EN) ·

    EvoArena: Tracking Memory Evolution for Robust LLM Agents in Dynamic Environments

    Large language model (LLM) agents have achieved strong performance on a wide range of benchmarks, yet most evaluations assume static environments. In contrast, real-world deployment is inherently dynamic, requiring agents to continually align their knowledge, skills, and behavior…

  12. arXiv cs.CL TIER_1 English(EN) · Sung Ju Hwang ·

    MemRefine: LLM-Guided Compression for Long-Term Agent Memory

    Large language model (LLM) agents are increasingly expected to operate over long-term interactions, where information from past dialogues must be preserved and recalled to support future tasks. However, as interactions accumulate, the memory store grows without bound and fills wi…

  13. Hugging Face Daily Papers TIER_1 English(EN) ·

    MemRefine: LLM-Guided Compression for Long-Term Agent Memory

    Large language model (LLM) agents are increasingly expected to operate over long-term interactions, where information from past dialogues must be preserved and recalled to support future tasks. However, as interactions accumulate, the memory store grows without bound and fills wi…

  14. arXiv cs.CL TIER_1 English(EN) · Fandong Meng ·

    EvoBrowseComp: Benchmarking Search Agents on Evolving Knowledge

    Search Agents -- large language models augmented with search tools -- have intensified the need for future-proof evaluation benchmarks. Existing benchmarks such as BrowseComp rely on static knowledge, making them vulnerable to test-set contamination and parametric memorization. C…

  15. arXiv cs.CL TIER_1 English(EN) · Youngjoong Ko ·

    G-Long: Graph-Enhanced Memory Management for Efficient Long-Term Dialogue Agents

    While Large Language Models (LLMs) have advanced open-domain dialogue systems, maintaining long-term consistency remains a challenge due to inherent limitations in long-context reasoning and the inefficiency of processing extensive raw text. Existing approaches typically rely on …

  16. arXiv cs.AI TIER_1 English(EN) · Hao-Lun Hsu, Nikki Lijing Kuang, Boyi Liu, Zhewei Yao, Yuxiong He ·

    Organize then Retrieve: Hierarchical Memory Navigation for Efficient Agents

    arXiv:2606.11680v1 Announce Type: new Abstract: Large language model (LLM) agents struggle with long-horizon tasks due to their inherent statelessness, requiring all task-relevant information to be encoded in growing input contexts. The resulting degraded reasoning quality, incre…

  17. arXiv cs.AI TIER_1 English(EN) · Ripon Chandra Malo, Tong Qiu ·

    PROJECTMEM: A Local-First, Event-Sourced Memory and Judgment Layer for AI Coding Agents

    arXiv:2606.12329v1 Announce Type: new Abstract: AI coding assistants now support a growing share of software work, from quick scripts to production applications. Yet these agents remain largely stateless: each new session re-reads project files, re-derives prior decisions, and - …

  18. arXiv cs.CL TIER_1 English(EN) · Jia Deng, Yimeng Chen, Xiaoqing Xiang, Ziyang Zeng, Shuo Tang, Wayne Xin Zhao, Feng Chang, Chuan Hao, Yuan Wei, Ran Tao, Bryan Dai, Ji-Rong Wen ·

    FORT-Searcher: Synthesizing Shortcut-Resistant Search Tasks for Training Deep Search Agents

    arXiv:2606.12087v1 Announce Type: new Abstract: Training deep search agents requires verifiable questions whose answers remain unavailable until sufficient evidence has been acquired through search. Existing synthesis methods often increase apparent difficulty by enriching graph …

  19. arXiv cs.CL TIER_1 English(EN) · Xi Su ·

    LoHoSearch: Benchmarking Long-Horizon Search Agents Beyond the Human Difficulty Ceiling

    Search agent benchmarks exemplified by BrowseComp have rapidly saturated over the past year, with the strongest models surpassing 90% accuracy. Since these benchmarks are predominantly human-authored, annotators lack a global perspective on entity statistics and cannot systematic…

  20. Hugging Face Daily Papers TIER_1 English(EN) ·

    EvoBrowseComp: Benchmarking Search Agents on Evolving Knowledge

    EvoBrowseComp is an evolving benchmark with 800 contamination-free questions synthesized through a three-agent framework that ensures temporal freshness and prevents parametric memorization in search agent evaluation.

  21. Hugging Face Daily Papers TIER_1 English(EN) ·

    EvoArena: Tracking Memory Evolution for Robust LLM Agents in Dynamic Environments

    EvoArena benchmark and EvoMem memory paradigm address the challenge of dynamic environments in LLM agents by modeling progressive updates and structured memory evolution, showing improved performance on evolving tasks.

  22. arXiv cs.AI TIER_1 English(EN) · Tong Qiu ·

    PROJECTMEM: A Local-First, Event-Sourced Memory and Judgment Layer for AI Coding Agents

    AI coding assistants now support a growing share of software work, from quick scripts to production applications. Yet these agents remain largely stateless: each new session re-reads project files, re-derives prior decisions, and - most costly - may repeat debugging attempts that…

  23. arXiv cs.CL TIER_1 English(EN) · Ji-Rong Wen ·

    FORT-Searcher: Synthesizing Shortcut-Resistant Search Tasks for Training Deep Search Agents

    Training deep search agents requires verifiable questions whose answers remain unavailable until sufficient evidence has been acquired through search. Existing synthesis methods often increase apparent difficulty by enriching graph structures, but structural complexity alone does…

  24. arXiv cs.CL TIER_1 English(EN) · Yuxiong He ·

    Organize then Retrieve: Hierarchical Memory Navigation for Efficient Agents

    Large language model (LLM) agents struggle with long-horizon tasks due to their inherent statelessness, requiring all task-relevant information to be encoded in growing input contexts. The resulting degraded reasoning quality, increased inference cost, and higher latency necessit…

  25. arXiv cs.AI TIER_1 English(EN) · Lei (Rachel) Chen, Guilin Zhang, Kai Zhao, Dalmo Cirne, Andy Olsen, Xu Chu, Zeke Miller, Alet Blanken, Amine Anoun, Jerry Ting ·

    Deployment-Time Memorization in Foundation-Model Agents

    arXiv:2606.10062v1 Announce Type: new Abstract: Foundation-model agents are increasingly long-lived systems that remember users across interactions, making memorization an explicit deployment-time function rather than solely a property of model weights. Existing work addresses pa…

  26. arXiv cs.LG TIER_1 English(EN) · Yv Zhang, Hao Sun, Hao Fang, Kuofeng Gao, Fan Mo, Bin Chen, Shu-Tao Xia, Yaowei Wang ·

    MemVenom: Triggered Poisoning of Multimodal Memories in Web Agents

    arXiv:2606.10742v1 Announce Type: cross Abstract: External memory has become a core component of modern web agents, enabling long-horizon reasoning through the retrieval of past experiences. However, this paradigm introduces a critical vulnerability: malicious content injected in…

  27. arXiv cs.AI TIER_1 English(EN) · Puzhen Zhang, Xuyang Chen, Yu Feng, Yuhan Jiang, Liqiu Meng ·

    Constructing coherent spatial memory in LLM agents through graph rectification

    arXiv:2510.04195v2 Announce Type: replace Abstract: Given a map description through global traversal navigation instructions, an LLM can often infer the implicit spatial layout and answer user queries by providing shortest paths. However, such context-dependent querying becomes i…

  28. arXiv cs.AI TIER_1 English(EN) · Weixian Xu, Shilong Liu, Mengdi Wang ·

    EEVEE: Towards Test-time Prompt Learning in the Real World for Self-Improving Agents

    arXiv:2606.11182v1 Announce Type: cross Abstract: In this paper, we propose EEVEE, the first multi-dataset test-time prompt learning framework for LLM agents, enabling test-time prompt learning under real-world task streams. Existing methods are largely designed for single-datase…

  29. arXiv cs.AI TIER_1 English(EN) · Jiandong Ding ·

    SkillResolve-Bench: Measuring and Resolving Same-Capability Ambiguity in Agent Skill Retrieval

    arXiv:2606.10388v1 Announce Type: cross Abstract: Agent skill libraries are becoming routable software assets: a retrieved skill can contribute instructions, scripts, resource bindings, and execution assumptions to an agent. This makes skill retrieval more than broad relevance ma…

  30. arXiv cs.AI TIER_1 English(EN) · Liuyin Wang ·

    Less Context, More Accuracy: A Bi-Temporal Memory Engine for LLM Agents Where a Lean Retrieved Context Beats the Full History

    arXiv:2606.09900v1 Announce Type: cross Abstract: Long-term memory is the missing layer for LLM agents: across sessions they forget, and the common workaround -- replaying the whole history into the prompt -- is expensive, slow, and, as distractors accumulate, less accurate. Most…

  31. arXiv cs.AI TIER_1 English(EN) · Suozhao Ji, Baodong Wu, Zehao Wang, Lei Xia, Qingping Li, Ruisong Wang, Wenbo Ding, Zhenhua Zhu, Boxun Li, Guohao Dai, Yu Wang ·

    Infini Memory: Maintainable Topic Documents for Long-Term LLM Agent Memory

    arXiv:2606.10677v1 Announce Type: new Abstract: Long-term LLM agents need persistent memory that can track changing facts and provide relevant evidence across sessions. Existing memory systems often store observations as isolated records, summaries, or indexed fragments, which ma…

  32. arXiv cs.AI TIER_1 English(EN) · Qingcan Kang, Liu Mingyang, Shixiong Kai, Kaichao Liang, Tao Zhong, Mingxuan Yuan ·

    Learning What to Remember: Observability-Safe Memory Retention via Constrained Optimization for Long-Horizon Language Agents

    arXiv:2606.10616v1 Announce Type: new Abstract: Long-horizon language agents accumulate observations, reasoning traces, and retrieved facts that exceed their finite context windows, making memory retention a fundamental resource-allocation problem. Existing memory systems improve…

  33. arXiv cs.AI TIER_1 English(EN) · Juncheng Diao, Zhicong Lu, Peiguang Li, Yongwei Zhou, Changyuan Tian, Qingbin Li, Rongxiang Weng, Jingang Wang, Xunliang Cai ·

    HIPIF: Hierarchical Planning and Information Folding for Long-Horizon LLM Agent Learning

    arXiv:2606.10507v1 Announce Type: new Abstract: While Large Language Models (LLMs) have demonstrated strong capabilities as autonomous agents across a wide range of tasks, their performance often degrades in multi-turn long-horizon agentic tasks. Existing methods have made progre…

  34. Hugging Face Daily Papers TIER_1 English(EN) ·

    FORT-Searcher: Synthesizing Shortcut-Resistant Search Tasks for Training Deep Search Agents

    A framework for creating shortcut-resistant training data for deep search agents by identifying and mitigating four shortcut risks in data synthesis processes.

  35. arXiv cs.LG TIER_1 English(EN) · Mengdi Wang ·

    EEVEE: Towards Test-time Prompt Learning in the Real World for Self-Improving Agents

    In this paper, we propose EEVEE, the first multi-dataset test-time prompt learning framework for LLM agents, enabling test-time prompt learning under real-world task streams. Existing methods are largely designed for single-dataset settings, while real-world applications require …

  36. Hugging Face Daily Papers TIER_1 English(EN) ·

    EEVEE: Towards Test-time Prompt Learning in the Real World for Self-Improving Agents

    EEVEE is a novel test-time prompt learning framework for LLM agents that handles heterogeneous data streams through task clustering and co-evolving router-prompt optimization.

  37. arXiv cs.LG TIER_1 English(EN) · Yaowei Wang ·

    MemVenom: Triggered Poisoning of Multimodal Memories in Web Agents

    External memory has become a core component of modern web agents, enabling long-horizon reasoning through the retrieval of past experiences. However, this paradigm introduces a critical vulnerability: malicious content injected into memory can be persistently recalled and repeate…

  38. arXiv cs.AI TIER_1 English(EN) · Yu Wang ·

    Infini Memory: Maintainable Topic Documents for Long-Term LLM Agent Memory

    Long-term LLM agents need persistent memory that can track changing facts and provide relevant evidence across sessions. Existing memory systems often store observations as isolated records, summaries, or indexed fragments, which makes evidence aggregation, fact revision, and mem…

  39. arXiv cs.AI TIER_1 English(EN) · Mingxuan Yuan ·

    Learning What to Remember: Observability-Safe Memory Retention via Constrained Optimization for Long-Horizon Language Agents

    Long-horizon language agents accumulate observations, reasoning traces, and retrieved facts that exceed their finite context windows, making memory retention a fundamental resource-allocation problem. Existing memory systems improve management through heuristic scoring, retrieval…

  40. arXiv cs.AI TIER_1 English(EN) · Xunliang Cai ·

    HIPIF: Hierarchical Planning and Information Folding for Long-Horizon LLM Agent Learning

    While Large Language Models (LLMs) have demonstrated strong capabilities as autonomous agents across a wide range of tasks, their performance often degrades in multi-turn long-horizon agentic tasks. Existing methods have made progress through fine-grained credit assignment to all…

  41. arXiv cs.AI TIER_1 English(EN) · Tianxiang Fei, Mingyang Song, Mao Zheng, Xiang Yu ·

    Memory Beyond Recall: A Dual-Process Cognitive Memory System for Self-Evolving LLM Agents

    arXiv:2606.09483v1 Announce Type: cross Abstract: Long-term memory for an LLM agent is more than retrieving the right passage at the right time. Current memory systems collapse belief revision, causal coupling, and cross-domain abstraction into a single retrieval surface tuned fo…

  42. arXiv cs.AI TIER_1 English(EN) · Jiazhou Liang, Armin Toroghi, Yifan Simon Liu, Faeze Moradi Kalarde, Liam Gallagher, Scott Sanner ·

    Goal-Oriented Reasoning for RAG-based Memory in Conversational Agentic LLM Systems

    arXiv:2605.12213v2 Announce Type: replace Abstract: LLM-based conversational AI agents struggle to maintain coherent behavior over long horizons due to limited context. While RAG-based approaches are increasingly adopted to overcome this limitation by storing interactions in exte…

  43. arXiv cs.AI TIER_1 English(EN) · Zehao Chen, Gongxun Li, Tianxiang Ai, Zixuan Huang, Xiaodong Liu, Yifei Li, Wang Zhou, Fuzhen Zhuang, Xianglong Liu, Jianxin Li, Deqing Wang, Yikun Ban ·

    Weak-Driven Learning: How Weak Agents make Strong Agents Stronger

    arXiv:2602.08222v2 Announce Type: replace Abstract: As post-training optimization becomes central to improving large language models, we observe a persistent saturation bottleneck: once models grow highly confident, further training yields diminishing returns. While existing meth…

  44. arXiv cs.AI TIER_1 English(EN) · Yu Cheng, Yongkang Hu, Jiuan Zhou, Yushuo Zhang, Yihang Chen, Huichi Zhou, Mingang Chen, Zhizhong Zhang, Kun Shao, Yuan Xie, Zhaoxia Yin ·

    TAME: A Trustworthy Test-Time Evolution of Agent Memory with Systematic Benchmarking

    arXiv:2602.03224v2 Announce Type: replace Abstract: Test-time evolution of agent memory represents a pivotal paradigm for advancing AGI, as it strengthens complex reasoning through experience accumulation without requiring parameter updates. However, even during benign task evolu…

  45. arXiv cs.AI TIER_1 English(EN) · Hao Yang, Shiqi Shen, Haoxuan Li, Zhipeng Wang, Zhi Gong, Xu Chen ·

    Rosetta Memory: Adaptive Memory for Cross-LLM Agents

    arXiv:2606.07711v1 Announce Type: cross Abstract: Memory is the key component for transforming a stateless LLM into a persistent, evolving agent through experience accumulation, long-horizon planning, and continual self-improvement. Existing memory systems typically take the LLM …

  46. arXiv cs.AI TIER_1 English(EN) · Haoran Sun, Wenjie Li, Yujie Zhang, Zekai Lin, Fanrui Zhang, Kaitao Chen, Xingqi He, Yichen Li, Mianxin Liu, Lei Liu, Yankai Jiang ·

    Experience Makes Skillful: Enabling Generalizable Medical Agent Reasoning via Self-Evolving Skill Memory

    arXiv:2606.09365v1 Announce Type: new Abstract: Medical agent systems are increasingly expected to support interactive clinical decision making rather than only static question answering. In such settings, effective agents must reuse prior experience across evolving cases, yet ex…

  47. arXiv cs.AI TIER_1 English(EN) · Zhixun Tan, Qiang Chen, Tairan Huang, Xiu Su, Yi Chen ·

    ConMem: Structured Memory-Guided Adaptation in Training-Free Multi-Agent Systems

    arXiv:2606.08702v1 Announce Type: new Abstract: Recent advances have improved the adaptive capabilities of LLM-based multi-agent systems (MAS) through memory-, skill-, and learning-based approaches, yet these approaches remain challenged by noisy trajectories, insufficient modeli…

  48. arXiv cs.AI TIER_1 English(EN) · Xinyu Guan, Qianyang Zhao, Yuming Deng ·

    Decision-Aware Memory Cards: Counterfactual-Inspired Context Selection and Compression for Tool-Using LLM Agents

    arXiv:2606.08151v1 Announce Type: new Abstract: Tool-using LLM agents often fail not because relevant text is absent, but because decisive evidence is not selected, compressed, or surfaced at action time. We present CICL, a decision-aware context layer that turns instance evidenc…

  49. arXiv cs.IR (Information Retrieval) TIER_1 English(EN) · Jiandong Ding ·

    SkillResolve-Bench: Measuring and Resolving Same-Capability Ambiguity in Agent Skill Retrieval

    Agent skill libraries are becoming routable software assets: a retrieved skill can contribute instructions, scripts, resource bindings, and execution assumptions to an agent. This makes skill retrieval more than broad relevance matching. A retriever can find the right capability …

  50. arXiv cs.MA (Multiagent) TIER_1 English(EN) · Jerry Ting ·

    Deployment-Time Memorization in Foundation-Model Agents

    Foundation-model agents are increasingly long-lived systems that remember users across interactions, making memorization an explicit deployment-time function rather than solely a property of model weights. Existing work addresses parametric memorization or audits fixed memory con…

  51. arXiv cs.AI TIER_1 English(EN) · Xiang Yu ·

    Memory Beyond Recall: A Dual-Process Cognitive Memory System for Self-Evolving LLM Agents

    Long-term memory for an LLM agent is more than retrieving the right passage at the right time. Current memory systems collapse belief revision, causal coupling, and cross-domain abstraction into a single retrieval surface tuned for surface recall, and consequently struggle on imp…

  52. Hugging Face Daily Papers TIER_1 English(EN) ·

    H2HMem: A Multimodal Memory Benchmark for Agents in Human-Human Interactions

    Large language model agents are increasingly deployed in human-human interaction settings, such as meeting assistants and clinical documentation systems, where they must observe conversations and retain information for downstream queries. Unlike traditional human-assistant settin…

  53. arXiv cs.CL TIER_1 English(EN) · Ming-Hsuan Yang ·

    H2HMem: A Multimodal Memory Benchmark for Agents in Human-Human Interactions

    Large language model agents are increasingly deployed in human-human interaction settings, such as meeting assistants and clinical documentation systems, where they must observe conversations and retain information for downstream queries. Unlike traditional human-assistant settin…

  54. arXiv cs.CL TIER_1 English(EN) · Yankai Jiang ·

    Experience Makes Skillful: Enabling Generalizable Medical Agent Reasoning via Self-Evolving Skill Memory

    Medical agent systems are increasingly expected to support interactive clinical decision making rather than only static question answering. In such settings, effective agents must reuse prior experience across evolving cases, yet existing memory mechanisms often retain raw histor…

  55. Hugging Face Daily Papers TIER_1 English(EN) ·

    Experience Makes Skillful: Enabling Generalizable Medical Agent Reasoning via Self-Evolving Skill Memory

    SkeMex is a self-evolving framework that enhances medical agents through structured skill memory, improving long-term clinical reasoning by distinguishing useful experiences and governing memory retention based on contextual utility.

  56. Hugging Face Daily Papers TIER_1 English(EN) ·

    Experience Makes Skillful: Enabling Generalizable Medical Agent Reasoning via Self-Evolving Skill Memory

    Medical agent systems are increasingly expected to support interactive clinical decision making rather than only static question answering. In such settings, effective agents must reuse prior experience across evolving cases, yet existing memory mechanisms often retain raw histor…

  57. arXiv cs.AI TIER_1 English(EN) · Runzhe Wang, Huilin Lu, Shengjie Liu, Li Dong, Jason Zhu ·

    AdMem: Advanced Memory for Task-solving Agents

    arXiv:2606.06787v1 Announce Type: new Abstract: Large Language Models (LLMs) show promise as tool-using agents but remain limited in long-horizon tasks that require remembering, organizing, and reusing knowledge. Prior memory approaches aim to resolve the situation, but mainly fo…

  58. arXiv cs.CL TIER_1 English(EN) · Zhengjun Huang, Wenxuan Liu, Zhoujin Tian, Wei Chen, Junle Chen, Yuqian Wu, Fangyuan Zhang, Qintian Guo, Xiaofang Zhou ·

    M³Exam: Benchmarking Multimodal Memory for Realistic User-Agent Interactions

    arXiv:2606.07402v1 Announce Type: new Abstract: Language agents are increasingly deployed over accumulating multimodal information, yet existing benchmarks assume a human-human form with sparse visuals and straightforward content, evaluating neither reasoning over authentic multi…

  59. arXiv cs.AI TIER_1 English(EN) · Zequn Xie, Junjie Wang, Dan Yang, Jie Feng, Yue Shen, Jian Wang, Jinjie Gu ·

    SlimSearcher: Training Efficiency-Aware Web Agents via Adaptive Reward Gating

    arXiv:2606.07074v1 Announce Type: cross Abstract: Deep research agents have demonstrated remarkable capabilities in complex information-seeking tasks, yet this power comes at a steep computational cost. Driven by accuracy-focused training paradigms, current models adopt brute-for…

  60. arXiv cs.AI TIER_1 English(EN) · Xinlei Yu, Chengming Xu, Zhangquan Chen, Bo Yin, Cheng Yang, Yongbo He, Yihao Hu, Jiangning Zhang, Cheng Tan, Xiaobin Hu, Shuicheng Yan ·

    Dual Latent Memory for Visual Multi-agent System

    arXiv:2602.00471v2 Announce Type: replace Abstract: While Visual Multi-Agent Systems (VMAS) promise to enhance comprehensive abilities through inter-agent collaboration, empirical evidence reveals a counter-intuitive "scaling wall": increasing agent turns often degrades performan…

  61. arXiv cs.AI TIER_1 English(EN) · Ziming Wang ·

    TOKI: A Bitemporal Operator Algebra for Contradiction Resolution in LLM-Agent Persistent Memory

    arXiv:2606.06240v1 Announce Type: cross Abstract: Persistent memory for an LLM agent is a write-heavy substrate: every belief update is a versioned write, and a new claim may contradict a stored one. Production systems use four resolution heuristics (last-writer-wins, evidence-we…

  62. arXiv cs.AI TIER_1 English(EN) · Yunxiang Zhang, Yiheng Li, Ali Payani, Lu Wang ·

    AdaMEM: Test-Time Adaptive Memory for Language Agents

    arXiv:2606.05684v1 Announce Type: new Abstract: A central challenge for language agents is utilizing past experience to adapt to dynamic test-time conditions. While recent work demonstrates the promise of agentic memory mechanisms, most systems restrict retrieval to episode initi…

  63. arXiv cs.AI TIER_1 English(EN) · Shuo Ji, Yibo Li, Bryan Hooi ·

    Memory is Reconstructed, Not Retrieved: Graph Memory for LLM Agents

    arXiv:2606.06036v1 Announce Type: new Abstract: Despite recent progress, LLM agents still struggle with reasoning over long interaction histories. While current memory-augmented agents rely on a static retrieve-then-reason paradigm, this rigid pipeline design prevents them from d…

  64. arXiv cs.AI TIER_1 English(EN) · Lingxiang Xu, Jiaoyun Yang, Min Hu, Hongtu Chen, Ning An ·

    When Should Memory Stay Silent: Measuring Memory-Use Boundaries in Memory-Augmented Conversational Agents

    arXiv:2606.06055v1 Announce Type: new Abstract: Long-term memory enables language model agents to support personalized interactions, but it remains unclear when available memories warrant integration into responses. Existing memory evaluations emphasize retrieval accuracy and dow…

  65. arXiv cs.AI TIER_1 English(EN) · Yaoqi Chen, Haibin Lai, Yuru Feng, Chuyu Han, Qianxi Zhang, Baotong Lu, Menghao Li, Xinjiang Wang, Zhirui Wang, Shusen Xu, Zengzhong Li, Zewen Jin, Hao Wu, Cheng Li, Qi Chen ·

    Beyond Semantic Organization: Memory as Execution State Management for Long-Horizon Agents

    arXiv:2606.06090v1 Announce Type: new Abstract: LLM-based agents increasingly tackle long-horizon tasks with interdependent decisions, where each action reshapes future constraints and intermediate errors can cascade. Existing RAG and agent memory systems organize histories by se…

  66. arXiv cs.AI TIER_1 English(EN) · Jiawen Zhang, Kejia Chen, Jiachen Ma, Yangfan Hu, Lipeng He, Yechao Zhang, Jian Liu, Xiaohu Yang, Tianwei Zhang, Ruoxi Jia ·

    Beyond Similarity: Trustworthy Memory Search for Personal AI Agents

    arXiv:2606.06054v1 Announce Type: new Abstract: Personal AI agents increasingly rely on long-term memory to provide persistent personalization across sessions. However, existing memory pipelines are largely driven by semantic similarity: memory data close to the current query is …

  67. arXiv cs.CL TIER_1 English(EN) · Xiaofang Zhou ·

    M³Exam: Benchmarking Multimodal Memory for Realistic User-Agent Interactions

    Language agents are increasingly deployed over accumulating multimodal information, yet existing benchmarks assume a human-human form with sparse visuals and straightforward content, evaluating neither reasoning over authentic multimodal file interaction nor the interpretation of…

  68. arXiv cs.IR (Information Retrieval) TIER_1 English(EN) · Liuyin Wang ·

    Less Context, More Accuracy: A Bi-Temporal Memory Engine for LLM Agents Where a Lean Retrieved Context Beats the Full History

    Long-term memory is the missing layer for LLM agents: across sessions they forget, and the common workaround -- replaying the whole history into the prompt -- is expensive, slow, and, as distractors accumulate, less accurate. Most memory systems win on cost or latency but still l…

  69. arXiv cs.LG TIER_1 English(EN) · Jinjie Gu ·

    SlimSearcher: Training Efficiency-Aware Web Agents via Adaptive Reward Gating

    Deep research agents have demonstrated remarkable capabilities in complex information-seeking tasks, yet this power comes at a steep computational cost. Driven by accuracy-focused training paradigms, current models adopt brute-force strategies characterized by blind tool dependen…

  70. arXiv cs.CL TIER_1 English(EN) · Minseok Choi, Seungbin Yang, Dongjin Kim, Subin Kim, Jungmin Son, Yunseung Lee, Jaegul Choo, Youngjun Kwak ·

    Membrane: A Self-Evolving Contrastive Safety Memory for LLM Agent Defense

    arXiv:2606.05743v1 Announce Type: cross Abstract: Despite advances in safety alignment, large language models remain vulnerable to continuously evolving jailbreaks. Existing fine-tuned safety classifiers cannot adapt to these evolving attacks, while adaptive memory-based guardrai…

  71. arXiv cs.CL TIER_1 English(EN) · Avinash Baidya, Xinran Liang, Ruocheng Guo, Xiang Gao, Kamalika Das ·

    When Evidence is Sparse: Weakly Supervised Early Failure Alerting in Dialogs and LLM-Agent Trajectories

    arXiv:2606.05414v1 Announce Type: new Abstract: Early failure alerting requires deciding, while a dialog or agent trajectory is still unfolding, whether to flag it as likely to fail. This is challenging because supervision is typically available only as a trajectory-level success…

  72. arXiv cs.CL TIER_1 English(EN) · Jiayu Liu, Cheng Qian, Zhenhailong Wang, Bingxuan Li, Jiateng Liu, Heng Wang, Jeonghwan Kim, Yumeng Wang, Xiusi Chen, Yi R. Fung, Heng Ji ·

    AdaPlanBench: Evaluating Adaptive Planning in Large Language Model Agents under World and User Constraints

    arXiv:2606.05622v1 Announce Type: new Abstract: Planning for real-world problems by language models often involves both world and user constraints, which may not be fully specified upfront and are progressively disclosed through interaction. However, existing benchmarks still und…

  73. arXiv cs.CL TIER_1 English(EN) · Yilong Li, Suman Banerjee, Tong Che ·

    EMBER: Efficient Memory via Budgeted Evidence Retention for Long-Horizon Agents

    arXiv:2606.05894v1 Announce Type: new Abstract: Long-horizon agents can archive large histories, but future answers still incur retrieval, rereading, and context costs. When retained memory misses answer-relevant evidence, the system must return to larger portions of the raw hist…

  74. arXiv cs.CL TIER_1 English(EN) · Qi Zhang, Zhaopeng Feng, Xiaonan Shi, Xiaomeng Hu, Chu Liu, Pengjun Xie, Xiaobin Wang, Jieping Ye, Bryan Hooi, Haobo Wang, Junbo Zhao ·

    SkillComposer: Learning to Evolve Agent Skills for Specification and Generalization

    arXiv:2606.06079v1 Announce Type: new Abstract: Agent skills, which consist of reusable strategies that guide agent reasoning and action, have shown strong potential for improving model capability at inference time. However, current skill construction methods treat the problem as…

  75. arXiv cs.CL TIER_1 English(EN) · Wenxuan Wang, Haoyu Sun, Fukuan Hou, Mingyang Song, Weinan Zhang, Yu Cheng, Yang Yang ·

    SubtleMemory: A Benchmark for Fine-Grained Relational Memory Discrimination in Long-Horizon AI Agents

    arXiv:2606.05761v1 Announce Type: cross Abstract: Persistent AI assistants, such as OpenClaw, accumulate large collections of related memories over long-term interactions. As these memories grow, they may reinforce one another, diverge across contexts, or directly conflict, makin…

  76. arXiv cs.CL TIER_1 English(EN) · Yuxuan Cai, Wei Li, Jie Zhou, Qin Chen, Xin Li, Bo Zhang, Liang He ·

    Ask Only When Needed: Proactive Retrieval from Memory and Skills for Experience-Driven Lifelong Agents

    arXiv:2604.20572v2 Announce Type: replace Abstract: Online lifelong learning agents must decide not only how to act but also when to consult prior experience to continually improve on long-horizon tasks. Existing methods typically retrieve memories passively, such as at task init…

  77. arXiv cs.CL TIER_1 English(EN) · Nicholas Edwards, Sebastian Schuster ·

    Ask or Assume? Uncertainty-Aware Clarification-Seeking in Coding Agents

    arXiv:2603.26233v2 Announce Type: replace Abstract: As Large Language Model (LLM) agents are increasingly deployed in open-ended domains like software engineering, they frequently encounter underspecified instructions that lack crucial context. While human developers naturally re…

  78. Hugging Face Daily Papers TIER_1 English(EN) ·

    SlimSearcher: Training Efficiency-Aware Web Agents via Adaptive Reward Gating

    SlimSearcher is a framework that improves efficiency in deep research agents by combining Pareto-efficient trajectory filtering and adaptive reward shaping to reduce computational costs while maintaining accuracy.

  79. arXiv cs.AI TIER_1 English(EN) · Ziming Wang ·

    TOKI: A Bitemporal Operator Algebra for Contradiction Resolution in LLM-Agent Persistent Memory

    Persistent memory for an LLM agent is a write-heavy substrate: every belief update is a versioned write, and a new claim may contradict a stored one. Production systems use four resolution heuristics (last-writer-wins, evidence-weighted merge, await-confirmation, per-rule policy)…

  80. arXiv cs.CL TIER_1 English(EN) · Junbo Zhao ·

    SkillComposer: Learning to Evolve Agent Skills for Specification and Generalization

    Agent skills, which consist of reusable strategies that guide agent reasoning and action, have shown strong potential for improving model capability at inference time. However, current skill construction methods treat the problem as one-shot extraction, overlooking a fundamental …

  81. arXiv cs.IR (Information Retrieval) TIER_1 English(EN) · Bryan Hooi ·

    Memory is Reconstructed, Not Retrieved: Graph Memory for LLM Agents

    Despite recent progress, LLM agents still struggle with reasoning over long interaction histories. While current memory-augmented agents rely on a static retrieve-then-reason paradigm, this rigid pipeline design prevents them from dynamically adapting memory access to intermediat…

  82. arXiv cs.CL TIER_1 English(EN) · Tong Che ·

    EMBER: Efficient Memory via Budgeted Evidence Retention for Long-Horizon Agents

    Long-horizon agents can archive large histories, but future answers still incur retrieval, rereading, and context costs. When retained memory misses answer-relevant evidence, the system must return to larger portions of the raw history. We study budgeted evidence survival: before…

  83. arXiv cs.AI TIER_1 English(EN) · Joel Sol, Homayoun Najjaran ·

    SMAC-Talk: A Natural Language Extension of the StarCraft Multi-Agent Challenge for Large Language Models

    arXiv:2606.04202v1 Announce Type: new Abstract: As LLMs become more widely deployed, they are increasingly expected to work alongside other AI agents rather than operating in isolation. Effective coordination in these settings requires agents to communicate, share information and…

  84. arXiv cs.CL TIER_1 English(EN) · Yubo Hou, Jingwei Song, Hongbo Zhang, Zhisheng Chen, Bang Xiao, Tao Wan, Zengchang Qin ·

    PersonaTree: Structured Lifecycle Memory for Person Understanding in LLM Agents

    arXiv:2606.04780v1 Announce Type: new Abstract: Persistent LLM agents require memory representations that make the formation of person understanding explicit across long term interaction. Existing agent memory methods emphasize information retention and retrieval, yet give limite…

  85. arXiv cs.CL TIER_1 English(EN) · Jingwen Chen, Wenkai Yang, Shengda Fan, Wenbo Nie, Chenxing Sun, Shaodong Zheng, Yangen Hu, Lu Pan, Ke Zeng, Yankai Lin ·

    Rethinking Continual Experience Internalization for Self-Evolving LLM Agents

    arXiv:2606.04703v1 Announce Type: new Abstract: Experience internalization converts contextual experience from past interactions into reusable parametric capability, offering a promising path toward continual learning in large language models (LLMs). While prior work has predomin…

  86. arXiv cs.AI TIER_1 English(EN) · Bo Mao, Jie Zhou, Yutao Yang, Xin Li, Xian Wei, Qin Chen, Xingjiao Wu, Liang He ·

    Learning While Acting: A Skill-Enhanced Test-Time Co-Evolution Framework for Online Lifelong Learning Agents

    arXiv:2606.04815v1 Announce Type: cross Abstract: Lifelong learning is essential for Large Language Model (LLM) agents operating in dynamic, interactive environments. However, existing lifelong learning agents for long-horizon tasks typically depend on discrete skill or past expe…

  87. arXiv cs.AI TIER_1 English(EN) · Yifan Simon Liu, Liam Gallagher, Faeze Moradi Kalarde, Jiazhou Liang, Armin Toroghi, Scott Sanner ·

    Temporal Order Matters for Agentic Memory: Segment Trees for Long-Horizon Agents

    arXiv:2606.04555v1 Announce Type: cross Abstract: Long-horizon conversational agents need to interact with users through evolving events, tasks, and goals. Such histories are naturally temporal, yet many existing memory systems organize information primarily by topical similarity…

  88. arXiv cs.AI TIER_1 English(EN) · Wangcheng Tao, Han Wu, Weng-Fai Wong ·

    SePO: Self-Evolving Prompt Agent for System Prompt Optimization

    arXiv:2606.04465v1 Announce Type: cross Abstract: System prompt optimization improves agent behavior without modifying the underlying model, yielding human-readable, model-agnostic instructions. Existing methods build a prompt agent that refines task agents' system prompts, yet l…

  89. arXiv cs.AI TIER_1 English(EN) · Kai Zhang, Xinyuan Zhang, Hongda Jiang, Shiun-Zu Kuo, Hyokun Yun, Ejaz Ahmed, Shereen Oraby, Ziyun Li, Sanat Sharma, Ann Lee, Ahmed A Aly, Anuj Kumar, Raffay Hamid, Xin Luna Dong ·

    SaliMory: Orchestrating Cognitive Memory for Conversational Agents

    arXiv:2606.04120v1 Announce Type: cross Abstract: Conversational agents that serve as lifelong companions must maintain persistent memory across all interactions. However, simply expanding context windows with raw retrieval degrades reasoning quality, while training memory agents…

  90. arXiv cs.AI TIER_1 English(EN) · Tao Ren, Weiyao Luo, Hui Yang, Rongzhi Zhu, Xiang Huang, Yuchuan Wu, Bingxue Chou, Jieping Ye, Jiafeng Liang, Yongbin Li, Yijie Peng ·

    Scaling Self-Evolving Agents via Parametric Memory

    arXiv:2606.04536v1 Announce Type: new Abstract: Existing memory-augmented LLM agents store past experience exclusively in prompt space, as textual summaries or retrieved passages, while keeping model parameters frozen throughout a rollout. Such agents can look up what they…

  91. arXiv cs.AI TIER_1 English(EN) · Jiaxi Li, Ke Deng, Yun Wang, Jingyuan Huang, Yucheng Shi, Qiaoyu Tan, Jin Lu, Ninghao Liu ·

    Online Skill Learning for Web Agents via State-Grounded Dynamic Retrieval

    arXiv:2606.04391v1 Announce Type: new Abstract: Language agents increasingly rely on reusable skills to improve multi-step web automation across related tasks. A growing line of work studies online skill learning, where agents continually induce skills from previous task trajecto…

  92. arXiv cs.AI TIER_1 English(EN) · Zhikai Chen, Jialiang Gu, Junyu Yin, Xianxuan Long, Shenglai Zeng, Xiaoze Liu, Kai Guo, Keren Zhou, Jiliang Tang ·

    Exploring Cross-Scenario Generality of Agentic Memory Systems: Diagnostics and a Strong Baseline

    arXiv:2606.04315v1 Announce Type: new Abstract: LLM agents accumulate histories that outgrow their context windows, motivating a growing literature on memory systems. Yet most existing designs are tuned to a single scenario (multi-session chat or a single trajectory format), and …

  93. Hugging Face Daily Papers TIER_1 English(EN) ·

    SubtleMemory: A Benchmark for Fine-Grained Relational Memory Discrimination in Long-Horizon AI Agents

    SubtleMemory benchmark evaluates AI agents' ability to handle complex relational memory structures that emerge during prolonged interactions, revealing limitations in current memory systems for preserving and utilizing nuanced memory relationships.

  94. Hugging Face Daily Papers TIER_1 English(EN) ·

    AdaPlanBench: Evaluating Adaptive Planning in Large Language Model Agents under World and User Constraints

    AdaPlanBench presents a dynamic interactive benchmark for evaluating LLM agents' ability to adaptively plan under progressively revealed world and user constraints through multi-turn interactions.

  95. arXiv cs.LG TIER_1 English(EN) · Liang He ·

    Learning While Acting: A Skill-Enhanced Test-Time Co-Evolution Framework for Online Lifelong Learning Agents

    Lifelong learning is essential for Large Language Model (LLM) agents operating in dynamic, interactive environments. However, existing lifelong learning agents for long-horizon tasks typically depend on discrete skill or past experiences retrieval with static parameters during in…

  96. arXiv cs.CL TIER_1 English(EN) · Zengchang Qin ·

    PersonaTree: Structured Lifecycle Memory for Person Understanding in LLM Agents

    Persistent LLM agents require memory representations that make the formation of person understanding explicit across long term interaction. Existing agent memory methods emphasize information retention and retrieval, yet give limited account of how accumulated interaction evidenc…

  97. arXiv cs.LG TIER_1 English(EN) · Yankai Lin ·

    Rethinking Continual Experience Internalization for Self-Evolving LLM Agents

    Experience internalization converts contextual experience from past interactions into reusable parametric capability, offering a promising path toward continual learning in large language models (LLMs). While prior work has predominantly focused on single-iteration transfer, we d…

  98. arXiv cs.CL TIER_1 English(EN) · Scott Sanner ·

    Temporal Order Matters for Agentic Memory: Segment Trees for Long-Horizon Agents

    Long-horizon conversational agents need to interact with users through evolving events, tasks, and goals. Such histories are naturally temporal, yet many existing memory systems organize information primarily by topical similarity and may ignore the order in which events occur. W…

  99. arXiv cs.AI TIER_1 English(EN) · Yijie Peng ·

    Scaling Self-Evolving Agents via Parametric Memory

    Existing memory-augmented LLM agents store past experience exclusively in prompt space, as textual summaries or retrieved passages, while keeping model parameters frozen throughout a rollout. Such agents can look up what they have seen but cannot learn from it: thei…

  100. arXiv cs.CL TIER_1 English(EN) · Weng-Fai Wong ·

    SePO: Self-Evolving Prompt Agent for System Prompt Optimization

    System prompt optimization improves agent behavior without modifying the underlying model, yielding human-readable, model-agnostic instructions. Existing methods build a prompt agent that refines task agents' system prompts, yet leave the prompt agent's own system prompt hand-eng…

  101. arXiv cs.AI TIER_1 English(EN) · Sarah Barrington, Maty Bohacek, Hany Farid ·

    The DeepSpeak-Agentic Dataset

    arXiv:2606.03686v1 Announce Type: new Abstract: We present DeepSpeak-Agentic, a dataset of videos comprising over 37 hours of semi-structured conversations between a human and an embodied AI agent. We use this dataset to evaluate the automatic forensic identification (audio, vide…

  102. arXiv cs.AI TIER_1 English(EN) · Tiancheng Han, Yong Li, Wuzhou Yu, Qiaosheng Zhang, Wenqi Shao ·

    InfoMem: Training Long-Context Memory Agents with Answer-Conditioned Information Gain

    arXiv:2606.03329v1 Announce Type: new Abstract: Long-context tasks require LLMs to identify and preserve answer-relevant information from large contexts. Chunk-wise memory agents address this issue by sequentially reading document chunks, updating a compact memory, and generating…

  103. arXiv cs.AI TIER_1 English(EN) · Haoran Tan, Zeyu Zhang, Zhicheng Cao, Rui Li, Xu Chen ·

    DELTAMEM: Incremental Experience Memory for LLM Agents via Residual Trees

    arXiv:2606.03083v1 Announce Type: new Abstract: Large Language Model (LLM)-based agents increasingly rely on memory to learn from experiences over continual interactions. However, storing experiences as independent, flat units leads to substantial redundancy and retrieval conflic…

  104. arXiv cs.AI TIER_1 English(EN) · Junming Liu, Yifei Sun, Weihua Cheng, Haodong Lei, Yirong Chen, Licheng Wen, Xuemeng Yang, Daocheng Fu, Pinlong Cai, Nianchen Deng, Yi Yu, Shuyue Hu, Botian Shi, Ding Wang ·

    MemVerse: Multimodal Memory for Lifelong Learning Agents

    arXiv:2512.03627v2 Announce Type: replace Abstract: Despite rapid progress in large-scale language and vision models, AI agents still suffer from a fundamental limitation: they cannot remember. Without reliable memory, agents catastrophically forget past experiences, struggle wit…

  105. arXiv cs.AI TIER_1 English(EN) · Ao Tian, Yunfeng Lu, Xinxin Fan, Changhao Wang, Lanzhi Zhou, Yeyao Zhang, Yanfang Liu ·

    RGMem: Renormalization Group-inspired Memory Evolution for Language Agents

    arXiv:2510.16392v3 Announce Type: replace Abstract: Personalized and continuous interactions are critical for LLM-based conversational agents, yet finite context windows and static parametric memory hinder the modeling of long-term, cross-session user states. Existing approaches,…

  106. arXiv cs.AI TIER_1 English(EN) · Kailin Lyu, Zhiqiang Yuan, Jianwei He, Qiwei Yan, Xuanbo Su, Nanxing Hu, Yang Liu, Ce Hao, Shengqian Qin, Lianyu Hu, Jinchao Zhang, Jie Zhou ·

    PhotoCraft: Agentic Reasoning with Hierarchical Self-Evolving Memory for Deep Image Search

    arXiv:2606.03099v1 Announce Type: cross Abstract: Deep Image Search requires multi-step reasoning over rich contextual cues, such as time, location, and event relations. However, most existing LLM-based agents are stateless and reactive, lacking persistent memory to maintain long…

  107. arXiv cs.AI TIER_1 English(EN) · Kaiwen Chen, Xin Tan, Jingzong Li, Hong Xu ·

    Libra: Efficient Resource Management for Agentic RL Post-Training

    arXiv:2606.03077v1 Announce Type: cross Abstract: Reinforcement learning (RL) has become a standard post-training paradigm for large language models (LLMs), extending beyond preference alignment to complex reasoning and multi-turn agentic behaviors. In agentic RL, the rollout sta…

  108. arXiv cs.AI TIER_1 English(EN) · Yuan Xiong, Ziqi Miao, Qian Chen, Lijun Li, Yequan Wang, Shizhu He, Jun Zhao, Kang Liu ·

    SkillPyramid: A Hierarchical Skill Consolidation Framework for Self-Evolving Agents

    arXiv:2606.03692v1 Announce Type: new Abstract: Recent AI agents can flexibly invoke skills to solve complex tasks, but their long-term improvement is fundamentally constrained by a lack of systematic skill construction, accumulation, and transfer. In particular, without a unifie…

  109. arXiv cs.AI TIER_1 English(EN) · Matteo Stabile, Enrico Zimuel ·

    DMF: A Deterministic Memory Framework for Conversational AI Agents

    arXiv:2606.03463v1 Announce Type: new Abstract: Conversational AI agents require memory systems that are both scalable and semantically coherent across long interaction horizons. Existing approaches rely predominantly on large language model (LLM)-based summarisation at write tim…

  110. arXiv cs.CL TIER_1 English(EN) · Xinyu Zhang, Yuchen Wan, Boxuan Zhang, Zesheng Yang, Lingling Zhang, Bifan Wei, Jun Liu ·

    Dual-Cluster Memory Agent: Resolving Multi-Paradigm Ambiguity in Optimization Problem Solving

    arXiv:2604.20183v2 Announce Type: replace Abstract: Large Language Models (LLMs) often struggle with structural ambiguity in optimization problems, where a single problem admits multiple related but conflicting modeling paradigms, hindering effective solution generation. To addre…

  111. arXiv cs.CL TIER_1 English(EN) · Jingbo Yang, Guanyu Yao, Yang Zhang, Ramana Rao Kompella, Gaowen Liu, Shiyu Chang ·

    FederatedSkill: Federated Learning for Agentic Skill Evolution

    arXiv:2606.03143v1 Announce Type: cross Abstract: Modern LLM agents increasingly rely on skill libraries to handle complex tasks, making skill evolution a primary driver of self-improvement. However, isolated single-user task streams lack the diversity required to build comprehen…

  112. arXiv cs.AI TIER_1 English(EN) · Renjun Xu, Yang Yan ·

    Agent Skills for Large Language Models: Architecture, Acquisition, Security, and the Path Forward

    arXiv:2602.12430v4 Announce Type: replace-cross Abstract: The transition from monolithic language models to modular, skill-equipped agents marks a defining shift in how large language models (LLMs) are deployed in practice. Rather than encoding all procedural knowledge within mod…

  113. Hugging Face Daily Papers TIER_1 English(EN) ·

    Rethinking Continual Experience Internalization for Self-Evolving LLM Agents

    Experience internalization enables continual learning in large language models by converting past interactions into reusable capabilities, with key findings on experience granularity, injection patterns, and internalization regimes for stable learning.

  114. Hugging Face Daily Papers TIER_1 English(EN) ·

    SePO: Self-Evolving Prompt Agent for System Prompt Optimization

    Self-Evolving Prompt Optimization (SePO) enhances agent performance by jointly optimizing both task and prompt agent system prompts through evolutionary search, demonstrating superior accuracy across diverse benchmarks.

  115. Hugging Face Daily Papers TIER_1 English(EN) ·

    Online Skill Learning for Web Agents via State-Grounded Dynamic Retrieval

    State-Grounded Dynamic Retrieval enables web agents to dynamically reuse skills based on current webpage state rather than fixed task-level strategies, improving automation performance across multiple domains.

  116. arXiv cs.AI TIER_1 English(EN) · Kang Liu ·

    SkillPyramid: A Hierarchical Skill Consolidation Framework for Self-Evolving Agents

    Recent AI agents can flexibly invoke skills to solve complex tasks, but their long-term improvement is fundamentally constrained by a lack of systematic skill construction, accumulation, and transfer. In particular, without a unified framework for skill consolidation, agents tend…

  117. arXiv cs.AI TIER_1 English(EN) · Hany Farid ·

    The DeepSpeak-Agentic Dataset

    We present DeepSpeak-Agentic, a dataset of videos comprising over 37 hours of semi-structured conversations between a human and an embodied AI agent. We use this dataset to evaluate the automatic forensic identification (audio, video, or text) of AI agents, study the nature of hu…

  118. arXiv cs.CL TIER_1 English(EN) · Enrico Zimuel ·

    DMF: A Deterministic Memory Framework for Conversational AI Agents

    Conversational AI agents require memory systems that are both scalable and semantically coherent across long interaction horizons. Existing approaches rely predominantly on large language model (LLM)-based summarisation at write time, which introduces non-determinism, escalating …

  119. arXiv cs.AI TIER_1 English(EN) · Chishui Chen, Jiaye Lin, Te Sun, Junxi Wang, Yi Yang, Cong Qin, Yangen Hu, Lu Pan, Ke Zeng ·

    Skill or Skip? Learning Selective Skill Invocation in Agentic Tasks via Dual-Granularity Preference Learning

    arXiv:2606.00510v1 Announce Type: cross Abstract: Agent skills are callable procedural modules that provide reusable knowledge and execution policies for complex agentic tasks. However, existing methods mainly focus on selecting relevant skills or improving the skills themselves,…

  120. arXiv cs.AI TIER_1 English(EN) · Yannan Wang, Longli Yang, Zhen Liu, Abhishek Kumar, Carsten Maple ·

    CoMIC: Collaborative Memory and Insights Circulation for Long-Horizon LLM Agents in Cloud-Edge Systems

    arXiv:2606.00756v1 Announce Type: new Abstract: Deploying lightweight Large Language Model (LLM) agents on edge servers can reduce latency and move agentic services closer to users, but resource-constrained edge models often struggle with long-horizon tasks that require persisten…

  121. arXiv cs.AI TIER_1 English(EN) · Yuxuan Liu, Zhaochen Su, Lingyun Xie, Yuhao Zhang, Qing Zong, Jiahe Guo, Zhongwei Xie, Yiyan Ji, Yauwai Yim, Hongyu Luo, Xiyu Ren, Ruan Chenyu, Haoran Li, Yangqiu Song ·

    SkillRevise: Improving LLM-Authored Agent Skills via Trace-Conditioned Skill Revision

    arXiv:2606.01139v1 Announce Type: new Abstract: Agent skills are procedural artifacts that enable LLM agents to execute workflows, verify constraints, and recover from failures. Existing self-evolving methods refine skills using accumulated trajectories. However, they struggle in…

  122. arXiv cs.AI TIER_1 English(EN) · Zhuoyun Yu, Xin Xie, Wuguannan Yao, Chenxi Wang, Lei Liang, Xiang Qi, Shumin Deng ·

    SkillAdaptor: Self-Adapting Skills for LLM Agents from Trajectories

    arXiv:2606.01311v1 Announce Type: cross Abstract: Large language model (LLM) agents increasingly rely on reusable external skills to solve long-horizon interactive tasks. Existing training-free skill adaptation pipelines usually update skills from full trajectories or session-lev…

  123. arXiv cs.AI TIER_1 English(EN) · Thamilvendhan Munirathinam ·

    AMP: A Vendor-Neutral Wire Format for Agent Memory Operations

    arXiv:2606.01138v1 Announce Type: cross Abstract: Agent-memory frameworks - mem0, Letta/MemGPT, Cognee, Zep/Graphiti, MemoryOS, MemTensor - each ship their own SDK, storage layout, and operational vocabulary. There is no shared wire format: every integration is bespoke, every mig…

  124. arXiv cs.AI TIER_1 English(EN) · Bole Ma, Jan Eitzinger, Harald Koestler ·

    Leyline: KV Cache Directives for Agentic Inference

    arXiv:2606.01065v1 Announce Type: cross Abstract: Modern KV cache management assumes the chatbot workload: prompts arrive once and the cache grows append-only, so prefix caching and forward-only eviction are correct by construction. Agentic LLMs break this assumption. Their conve…

  125. arXiv cs.AI TIER_1 English(EN) · Qingshan Liu, Guoqing Wang, Wen Wu, Jingqi Huang, Xinqi Tao, Dejia Song, Jie Zhou, Liang He ·

    MemPro: Agentic Memory Systems as Evolvable Programs

    arXiv:2606.00619v1 Announce Type: cross Abstract: Long-horizon autonomous agents require memory systems to retain historical information, track evolving states, and reuse relevant knowledge beyond finite context windows. Existing agentic memory systems typically follow a memory c…

  126. arXiv cs.AI TIER_1 English(EN) · Md Zarif Ul Alam, Alireza Salemi, Hamed Zamani ·

    Critic-R: Improving Agentic Search using Instruction-tuned Retrievers with Natural Language Introspective Feedback

    arXiv:2606.00590v1 Announce Type: cross Abstract: Agentic search systems iteratively interact with retrieval models to answer complex queries. Despite substantial progress, optimizing retrievers for agentic search remains challenging, often requiring heavy co-training or gold-sta…

  127. arXiv cs.AI TIER_1 English(EN) · Yiheng Shu, Bernal Jiménez Gutiérrez, Saisri Padmaja Jonnalagedda, Yuguang Yao, Huan Sun, Yu Su ·

    AGENTCL: Toward Rigorous Evaluation of Continual Learning in Language Agents

    arXiv:2606.02461v1 Announce Type: new Abstract: Language agents spend substantial inference time solving individual tasks, yet the experience acquired in one episode is often underutilized in future episodes. Continual learning expects an agent to accumulate reusable experience a…

  128. arXiv cs.AI TIER_1 English(EN) · Shizuo Tian, Xiaohong Weng, Rui Kong, Yuxuan Chen, Guohong Liu, Yuebing Song, Jiacheng Liu, Yuchen Li, Dawei Yin, Ting Cao, Yunxin Liu, Yuanchun Li ·

    Joint Agent Memory and Exploration Learning via Novelty Signals

    arXiv:2606.01528v1 Announce Type: new Abstract: In open-ended environments, exploration is fundamental for autonomous agents, yet current language model agents struggle with this. Effective exploration requires memory, but retaining raw interaction histories is computationally ex…

  129. arXiv cs.AI TIER_1 English(EN) · Jiaming Wang, Ziteng Feng, Jiangtao Wu, Ruihao Li, Qianqian Xie, Yuxiang Ren, He Zhu, Xueming Han, Fanyu Meng, Junlan Feng, Jiaheng Liu ·

    Where Do Deep-Research Agents Go Wrong? Span-Level Error Localization in Agent Trajectories

    arXiv:2606.02060v1 Announce Type: new Abstract: Deep-research agents solve tasks through long trajectories of search, tool use, evidence inspection, and answer synthesis. Evaluation based on final answers shows whether an agent succeeds, but not which parts of the trajectory make…

  130. arXiv cs.LG TIER_1 English(EN) · Peijia Qin, Qi Cao, Pengtao Xie ·

    ATLAS: Agentic Test-time Learning-to-Allocate Scaling

    arXiv:2606.01667v1 Announce Type: new Abstract: Test-time scaling has become a major way to improve large language model reasoning, but its orchestration has remained designer-engineered: a fixed sample budget, a fixed refinement loop, a fixed scoring rule, or a fixed search poli…

  131. arXiv cs.LG TIER_1 English(EN) · Xu Yang, Lunyiu Nie, Ethan Chandra, Stanislav Gannutin, Fangru Lin, Swarat Chaudhuri ·

    When Parallelism Pays Off: Cohesion-Aware Task Partitioning for Multi-Agent Coding

    arXiv:2606.00953v1 Announce Type: new Abstract: Multi-agent Large Language Model (LLM) systems offer a way to decompose complex tasks, such as coding, through parallelization and context isolation. However, adding agents in practice introduces inter-agent communication overhead, …

  132. arXiv cs.CL TIER_1 English(EN) · Jiajun Hou, Hexuan Deng, Wenxiang Jiao, Xuebo Liu, Xiaopeng Ke, Derek F. Wong, Min Zhang ·

    MemoNoveltyAgent: A Historical Research Memory-Aware Agent Workflow for Paper Novelty Assessment

    arXiv:2603.20884v2 Announce Type: replace Abstract: To alleviate the heavy burden of paper screening, researchers increasingly rely on existing AI agents, such as AI reviewers or DeepResearch, for paper evaluation and novelty assessment. However, lacking specialized mechanisms fo…

  133. arXiv cs.CL TIER_1 English(EN) · Tao Feng, Tianyang Luo, Jingjun Xu, Zhigang Hua, Yan Xie, Shuang Yang, Ge Liu, Jiaxuan You ·

    ExpWeaver: LLM Agents Learn from Experience via Latent RAG

    arXiv:2606.01041v1 Announce Type: new Abstract: Experience learning has achieved promising results in enhancing LLM agent planning and reasoning by integrating past interactions as reusable knowledge. However, existing methods remain confined to explicit text space, retrieving ex…

  134. arXiv cs.CL TIER_1 English(EN) · Adril Putra Merin, David Anugraha, Ayu Purwarianti, Genta Indra Winata ·

    Momento: Evaluating Persistent Memory and Reasoning with Multi-Session Agentic Conversations

    arXiv:2606.00832v1 Announce Type: new Abstract: Recent advances in agentic AI have enabled agents to complete complex tasks through tool use, reasoning, and multi-step planning. Yet existing benchmarks evaluate agents within a single session, ignoring past actions, stated prefere…

  135. arXiv cs.CL TIER_1 English(EN) · Yibo Wang, Nikki Lijing Kuang, Philip S. Yu, Zhewei Yao, Yuxiong He ·

    Learning to Retrieve: Dual-Level Long-Term Memory for Text-to-SQL Agents

    arXiv:2606.00547v1 Announce Type: new Abstract: Interactive text-to-SQL agents solve database tasks through multi-turn interactions involving schema exploration, query execution, feedback interpretation, and decision revision. Long-term memory helps agents reuse past experiences,…

  136. arXiv cs.AI TIER_1 English(EN) · Albert Sadowski, Jarosław A. Chudziak ·

    Rashomon Memory: Towards Argumentation-Driven Retrieval for Multi-Perspective Agent Memory

    arXiv:2604.03588v3 Announce Type: replace Abstract: AI agents operating over extended time horizons accumulate experiences that serve multiple concurrent goals, and must often maintain conflicting interpretations of the same events. A concession during a client negotiation encode…

  137. arXiv cs.AI TIER_1 English(EN) · Minki Kang, Wei-Ning Chen, Dongge Han, Huseyin A. Inan, Lukas Wutschitz, Yanzhi Chen, Robert Sim, Saravan Rajmohan ·

    ACON: Optimizing Context Compression for Long-horizon LLM Agents

    arXiv:2510.00615v3 Announce Type: replace Abstract: Large language models (LLMs) are increasingly deployed as agents in dynamic real-world environments, where success depends on maintaining precise records of actions and observations. However, the resulting unbounded context grow…

  138. arXiv cs.AI TIER_1 English(EN) · Xinyu Che, Junqi Xiong, Yunfei Ge, Xinping Lei, Shihao Li, Hang Yan, Han Li, Yuanxing Zhang, Zhiqi Bai, Jinhua Hao, Ming Sun, Han Li, Jiaheng Liu ·

    MMG2Skill: Can Agents Distill In-the-Wild Guides into Self-Evolving Skills?

    arXiv:2606.01993v1 Announce Type: cross Abstract: Abundant procedural knowledge on the Web holds great potential for helping agents solve long-horizon tasks. However, such knowledge is often multimodal, heterogeneous, noisy, and implicitly assumes human executors, making it diffi…

  139. arXiv cs.AI TIER_1 English(EN) · Prateek Kumar Sikdar ·

    LayerRoute: Input-Conditioned Adaptive Layer Skipping via LoRA Fine-Tuning for Agentic Language Models

    arXiv:2606.01838v1 Announce Type: cross Abstract: Agentic language model systems alternate between two structurally distinct step types: structured tool calls (short, deterministic, low perplexity) and open-ended planning/reasoning steps (long, complex, high perplexity). Despite …

  140. Hugging Face Daily Papers TIER_1 English(EN) ·

    AgentCL: Toward Rigorous Evaluation of Continual Learning in Language Agents

    A comprehensive evaluation framework for continual learning in language agents is introduced, emphasizing controlled task streams and memory design analysis to better assess reusable experience and learning stability.

  141. arXiv cs.AI TIER_1 English(EN) · Yu Su ·

    AGENTCL: Toward Rigorous Evaluation of Continual Learning in Language Agents

    Language agents spend substantial inference time solving individual tasks, yet the experience acquired in one episode is often underutilized in future episodes. Continual learning expects an agent to accumulate reusable experience across a stream of tasks, improve over time, and …

  142. Hugging Face Daily Papers TIER_1 English(EN) ·

    MMG2Skill: Can Agents Distill In-the-Wild Guides into Self-Evolving Skills?

    MMG2Skill framework converts web-based procedural guides into executable skills through closed-loop learning, improving agent performance across GUI control, gameplay, and card play tasks.

  143. arXiv cs.CL TIER_1 English(EN) · Jiaheng Liu ·

    MMG2Skill: Can Agents Distill In-the-Wild Guides into Self-Evolving Skills?

    Abundant procedural knowledge on the Web holds great potential for helping agents solve long-horizon tasks. However, such knowledge is often multimodal, heterogeneous, noisy, and implicitly assumes human executors, making it difficult to use directly as the skills required by age…

  144. arXiv cs.CL TIER_1 English(EN) · Prateek Kumar Sikdar ·

    LayerRoute: Input-Conditioned Adaptive Layer Skipping via LoRA Fine-Tuning for Agentic Language Models

    Agentic language model systems alternate between two structurally distinct step types: structured tool calls (short, deterministic, low perplexity) and open-ended planning/reasoning steps (long, complex, high perplexity). Despite this heterogeneity, current inference systems appl…

  145. arXiv cs.MA (Multiagent) TIER_1 English(EN) · Min Zhang ·

    MetaForge: A Self-Evolving Multimodal Agent that Retrieves, Adapts, and Forges Tools On Demand

    Multimodal agents have achieved notable progress on complex reasoning tasks through tool use, yet remain limited by two issues: statically predefined tool inventories fail to generalize to unseen scenarios, and indiscriminate tool invocation incurs redundant cost and noise-induce…

  146. Hugging Face Daily Papers TIER_1 English(EN) ·

    ATLAS: Agentic Test-time Learning-to-Allocate Scaling

    Test-time scaling has become a major way to improve large language model reasoning, but its orchestration has remained designer-engineered: a fixed sample budget, a fixed refinement loop, a fixed scoring rule, or a fixed search policy decides how compute is spent, leaving the mod…

  147. arXiv cs.CL TIER_1 English(EN) · Tao Feng, Chongrui Ye, Tianyang Luo, Jingjun Xu, Xueqiang Xu, Haozhen Zhang, Ge Liu, Jiaxuan You ·

    ElasticMem: Latent Memory as a Learnable Resource for LLM Agents

    arXiv:2605.30690v1 Announce Type: new Abstract: Long-term memory is essential for LLM agents to reason coherently across extended interactions, personalize responses, and reuse past experience. However, existing memory-augmented methods typically treat memory as a fixed resource:…

  148. arXiv cs.CL TIER_1 English(EN) · Resham Joshi ·

    Eywa: Provenance-Grounded Long-Term Memory for AI Agents

    arXiv:2605.30771v1 Announce Type: new Abstract: AI agents that persist across sessions need memory they can retrieve, audit, update, and erase. Existing memory systems often collapse source evidence, extracted facts, retrieved context, and answer policy into one opaque prompt pat…

  149. arXiv cs.AI TIER_1 English(EN) · Lu Yi, Runlin Lei, Liuyi Yao, Yuexiang Xie, Yuyang Li, Wenhao Zhang, Zhewei Wei, Yaliang Li, Jian-Yun Nie ·

    Learning Agent-Compatible Context Management for Long-Horizon Tasks

    arXiv:2605.30785v1 Announce Type: new Abstract: LLM agents increasingly face long-horizon tasks such as web search and deep research in real-world applications, where accumulated context can cause long-context degradation and reasoning failures. Prior work mitigates this through …

  150. arXiv cs.AI TIER_1 English(EN) · Weile Chen, Bingchen Miao, Qifan Yu, Wendong Bu, Guoming Wang, Wenqiao Zhang, Shengyu Zhang, Juncheng Li, Siliang Tang ·

    Learning to Adapt: Self-Improving Web Agent via Cognitive-Aware Exploration

    arXiv:2605.31365v1 Announce Type: new Abstract: Recent advances in Multimodal Large Language Models (MLLMs) have led to promising progress in web agents. However, existing web agents often rely on handcrafted execution pipelines or expensive expert trajectories, limiting their ad…

  151. arXiv cs.AI TIER_1 English(EN) · Weitong Qian, Beicheng Xu, Zhongao Xie, Bowen Fan, Guozheng Tang, Jiale Chen, Xinzhe Wu, Mingtian Yang, Chenyang Di, Jiajun Li, Lingching Tung, Peichao Lai, Yifei Xia, Ziyi Guo, Yanwei Xu, Yanzhao Qin, Shaoduo Gan, Xupeng Miao, Bin Cui ·

    AutoSci: A Memory-Centric Agentic System for the Full Scientific Research Lifecycle

    arXiv:2605.31468v1 Announce Type: new Abstract: Scientific research has traditionally been human-intensive, requiring researchers to coordinate literature, ideas, experiments, manuscripts, and review responses across long project cycles. The rise of LLM-based scientific agents cr…

  152. arXiv cs.AI TIER_1 English(EN) · Xiaonan Xu, Wenjing Wu ·

    Skill Availability and Presentation Granularity in Large-Language-Model Agents: A Controlled SkillsBench Study

    arXiv:2605.31408v1 Announce Type: cross Abstract: Skill documents provide procedural knowledge to large-language-model agents at inference time. This article studies whether the presentation granularity of controlled skill knowledge changes downstream task success. The experiment…

  153. arXiv cs.LG TIER_1 English(EN) · Yurui Chang, Yongkang Du, Yuanpu Cao, Jinghui Chen, Lu Lin ·

    ForecastCompass: Guiding Agentic Forecasting with Adaptive Factor Memory

    arXiv:2605.30858v1 Announce Type: new Abstract: Agentic forecasting is important for decision-making in dynamic environments, but it remains challenging because agents must reason from incomplete, time-limited evidence and produce calibrated probabilities before outcomes are reso…

  154. arXiv cs.CL TIER_1 English(EN) · Han Zhang, Zihao Tang, Xin Yu, Xiao Liu, Yeyun Gong, Haizhen Huang, Yan Lu, Weiwei Deng, Feng Sun, Qi Zhang, Hanfang Yang ·

    Beyond Static Dialogues: Benchmarking Realistic, Heterogeneous, and Evolving Long-Term Memory

    arXiv:2605.31086v1 Announce Type: new Abstract: In existing memory benchmarks for Large Language Models (LLMs), the evaluated dialogue sessions often lack long-term semantic consistency, and the underlying personas tend to be flat and static. Furthermore, in real-world scenarios,…

  155. arXiv cs.AI TIER_1 English(EN) · Ruihang Lai, Hao Kang, Haozhan Tang, Akaash R. Parthasarathy, Zichun Yu, Junru Shao, Todd C. Mowry, Chenyan Xiong, Tianqi Chen ·

    PithTrain: A Compact and Agent-Native MoE Training System

    arXiv:2605.31463v1 Announce Type: cross Abstract: Mixture-of-Experts (MoE) has become the dominant architecture for frontier language models. To meet this demand, production frameworks have built optimized MoE training stacks over years of engineering effort. Yet evolving these s…

  156. arXiv cs.AI TIER_1 English(EN) · Zhikun Xu, Yu Feng, Jacob Dineen, Taiwei Shi, Jieyu Zhao, Ben Zhou ·

    Skill Reuse as Compression in Agentic RL

    arXiv:2605.31509v1 Announce Type: cross Abstract: Large language model agents trained with reinforcement learning (RL) often learn brittle, task-specific shortcuts. We hypothesize that agents generalize better when their successful trajectories are structurally compressible, deco…

  157. arXiv cs.AI TIER_1 English(EN) · Nianyi Lin, Jiajie Zhang, Lei Hou, Juanzi Li ·

    LongTraceRL: Learning Long-Context Reasoning from Search Agent Trajectories with Rubric Rewards

    arXiv:2605.31584v1 Announce Type: cross Abstract: Long-context reasoning remains a central challenge for large language models, which often fail to locate and integrate key information in extensive distracting content. Reinforcement learning with verifiable rewards (RLVR) has sho…

  158. arXiv cs.AI TIER_1 English(EN) · Benjamin Schneider, Xavier Schneider, Victor Zhong, Sun Sun ·

    ASH: Agents that Self-Hone via Embodied Learning

    arXiv:2605.14211v2 Announce Type: replace Abstract: Long-horizon embodied tasks remain a fundamental challenge in AI, as current methods rely on hand-engineered rewards or action-labeled demonstrations, neither of which scales. We introduce ASH, an agentic system that learns an e…

  159. arXiv cs.CL TIER_1 English(EN) · Tao Feng, Chongrui Ye, Tianyang Luo, Jingjun Xu, Xueqiang Xu, Haozhen Zhang, Zhigang Hua, Yan Xie, Shuang Yang, Ge Liu, Jiaxuan You ·

    ExpGraph: Model-Agnostic Experience Learning with Graph-Structured Memory for LLM Agents

    arXiv:2605.30712v1 Announce Type: new Abstract: Large language model (LLM) agents have shown strong capabilities in reasoning, tool use, and multi-step interaction, but they often solve tasks from scratch and fail to reuse successful strategies or failure lessons from prior exper…

  160. Hugging Face Daily Papers TIER_1 English(EN) ·

    Where Do Deep-Research Agents Go Wrong? Span-Level Error Localization in Agent Trajectories

    Deep-research agents can be audited using a claim-centric framework that identifies error spans in their reasoning trajectories, improving reliability assessment beyond just final answer evaluation.

  161. Hugging Face Daily Papers TIER_1 English(EN) ·

    Joint Agent Memory and Exploration Learning via Novelty Signals

    Joint Agent Memory and Exploration Learning (JAMEL) framework trains memory and exploration policies together through novelty-driven interaction, enabling effective exploration in open-ended environments with reduced computational costs.

  162. arXiv cs.MA (Multiagent) TIER_1 English(EN) · Shumin Deng ·

    SkillAdaptor: Self-Adapting Skills for LLM Agents from Trajectories

    Large language model (LLM) agents increasingly rely on reusable external skills to solve long-horizon interactive tasks. Existing training-free skill adaptation pipelines usually update skills from full trajectories or session-level feedback, which makes failure attribution coars…

  163. arXiv cs.MA (Multiagent) TIER_1 English(EN) · Swarat Chaudhuri ·

    When Parallelism Pays Off: Cohesion-Aware Task Partitioning for Multi-Agent Coding

    Multi-agent Large Language Model (LLM) systems offer a way to decompose complex tasks, such as coding, through parallelization and context isolation. However, adding agents in practice introduces inter-agent communication overhead, which incurs extra cost and can sometimes offset…

  164. Hugging Face Daily Papers TIER_1 English(EN) ·

    SkillAdaptor: Self-Adapting Skills for LLM Agents from Trajectories

    Step-level skill adaptation framework with explicit failure attribution improves training-free skill maintenance for LLM agents in interactive tasks.

  165. arXiv cs.IR (Information Retrieval) TIER_1 English(EN) · Hamed Zamani ·

    Critic-R: Improving Agentic Search using Instruction-tuned Retrievers with Natural Language Introspective Feedback

    Agentic search systems iteratively interact with retrieval models to answer complex queries. Despite substantial progress, optimizing retrievers for agentic search remains challenging, often requiring heavy co-training or gold-standard annotations that limit real-world applicabil…

  166. Hugging Face Daily Papers TIER_1 English(EN) ·

    Critic-R: Improving Agentic Search using Instruction-tuned Retrievers with Natural Language Introspective Feedback

    Critic-R framework enhances agentic search by closing the feedback loop between reasoning agents and retrieval models through critic evaluation and dual optimization mechanisms.

  167. arXiv cs.AI TIER_1 English(EN) · Juanzi Li ·

    LongTraceRL: Learning Long-Context Reasoning from Search Agent Trajectories with Rubric Rewards

    Long-context reasoning remains a central challenge for large language models, which often fail to locate and integrate key information in extensive distracting content. Reinforcement learning with verifiable rewards (RLVR) has shown promise for this task, yet existing methods are…

  168. arXiv cs.AI TIER_1 English(EN) · Ben Zhou ·

    Skill Reuse as Compression in Agentic RL

    Large language model agents trained with reinforcement learning (RL) often learn brittle, task-specific shortcuts. We hypothesize that agents generalize better when their successful trajectories are structurally compressible, decomposed into a small set of reusable abstract patte…

  169. arXiv cs.AI TIER_1 English(EN) · Bin Cui ·

    AutoSci: A Memory-Centric Agentic System for the Full Scientific Research Lifecycle

    Scientific research has traditionally been human-intensive, requiring researchers to coordinate literature, ideas, experiments, manuscripts, and review responses across long project cycles. The rise of LLM-based scientific agents creates an opportunity to automate this process. S…

  170. arXiv cs.AI TIER_1 English(EN) · Tianqi Chen ·

    PithTrain: A Compact and Agent-Native MoE Training System

    Mixture-of-Experts (MoE) has become the dominant architecture for frontier language models. To meet this demand, production frameworks have built optimized MoE training stacks over years of engineering effort. Yet evolving these stacks for new architectures and system optimizatio…

  171. arXiv cs.AI TIER_1 English(EN) · Wenjing Wu ·

    Skill Availability and Presentation Granularity in Large-Language-Model Agents: A Controlled SkillsBench Study

    Skill documents provide procedural knowledge to large-language-model agents at inference time. This article studies whether the presentation granularity of controlled skill knowledge changes downstream task success. The experiment uses a pinned SkillsBench version, a 30-task doma…

  172. arXiv cs.AI TIER_1 English(EN) · Siliang Tang ·

    Learning to Adapt: Self-Improving Web Agent via Cognitive-Aware Exploration

    Recent advances in Multimodal Large Language Models (MLLMs) have led to promising progress in web agents. However, existing web agents often rely on handcrafted execution pipelines or expensive expert trajectories, limiting their adaptability to complex, dynamic environments. To …

  173. arXiv cs.IR (Information Retrieval) TIER_1 English(EN) · Hanfang Yang ·

    Beyond Static Dialogues: Benchmarking Realistic, Heterogeneous, and Evolving Long-Term Memory

    In existing memory benchmarks for Large Language Models (LLMs), the evaluated dialogue sessions often lack long-term semantic consistency, and the underlying personas tend to be flat and static. Furthermore, in real-world scenarios, interactions between users and assistants invol…

  175. arXiv cs.AI TIER_1 English(EN) · Johannes Moll, Jean-Philippe Corbeil, Jiazhen Pan, Martin Hadamitzky, Daniel Rueckert, Lisa Adams, Keno Bressem ·

    GRASP: Gated Regression-Aware Skill Proposer for Self-Improving LLM Agents

    arXiv:2605.29668v1 Announce Type: new Abstract: LLM agents acting in structured environments fail in operational rather than conversational ways, and reliability depends on procedural knowledge of the environment. Prior self-improvement methods accumulate natural-language guidanc…

  176. arXiv cs.CL TIER_1 English(EN) · Chengzhi Liu, Yuzhe Yang, Sophia Xiao Pu, Yepeng Liu, Lin Long, Yichen Guo, Nuo Chen, Zhaotian Weng, Elena Kochkina, Simerjot Kaur, Charese Smiley, Xiaomo Liu, James Zou, Sheng Liu, Yuheng Bu, Songyou Peng, Xin Eric Wang ·

    WorldMemArena: Evaluating Multimodal Agent Memory Through Action-World Interaction

    arXiv:2605.29341v1 Announce Type: cross Abstract: Multimodal large language models are increasingly deployed as long-horizon agents, where memory must do more than recall: it must track an evolving world, revise what has gone stale, and surface the right evidence at decision time…

  177. arXiv cs.CL TIER_1 English(EN) · Xiaoxuan Peng, Kaiqi Zhang, Xinyu Lu, Boxi Cao, Yaojie Lu, Hongyu Lin, Xianpei Han, Le Sun ·

    LiteCoder-Terminal: Scaling Long-Horizon Terminal Environments for Learning Language Agents

    arXiv:2605.29559v1 Announce Type: new Abstract: Mastering terminal environments requires language agents capable of multi-step planning, feedback-grounded execution, and dynamic state adaptation. However, training such agents is currently bottlenecked by a reliance on scraped ext…

  178. arXiv cs.AI TIER_1 English(EN) · Qirui Mi, Zhijian Ma, Mengyue Yang, Haoxuan Li, Yisen Wang, Haifeng Zhang, Jun Wang ·

    Skill-Pro: Learning Reusable Skills from Experience via Non-Parametric PPO for LLM Agents

    arXiv:2602.01869v3 Announce Type: replace Abstract: LLM-driven agents excel at sequential decision-making but often rely on on-the-fly reasoning, re-deriving solutions even in recurring scenarios. This insufficient experience reuse leads to computational redundancy and instabilit…

  179. arXiv cs.AI TIER_1 English(EN) · Youwang Deng ·

    Entity-Collision: A Stratified Protocol for Attributing Retrieval Lift in Agent Memory

    arXiv:2605.29630v1 Announce Type: cross Abstract: End-to-end agent-memory benchmarks report a single hit@k per retriever, confounding lexical leakage (uncontrolled query/gold/distractor entity overlap) with tag-mixing (preferences, services, tools averaged together). We propose e…

  180. arXiv cs.AI TIER_1 English(EN) · Ziyan Liu, Zhezheng Hao, Yeqiu Chen, Hong Wang, Jingren Hou, Ruiyi Ding, Yongkang Yang, Wence Ji, Wei Xia, Feng Liu ·

    Meta-Cognitive Memory Policy Optimization for Long-Horizon LLM Agents

    arXiv:2605.30159v1 Announce Type: new Abstract: Memory-augmented LLM agents tackle complex long-horizon tasks by recursively summarizing interaction trajectories into compact memory. However, existing approaches typically train these memory policies using outcome-based reinforcem…

  181. Hugging Face Daily Papers TIER_1 English(EN) ·

    Task-Focused Memorization for Multimodal Agents

    A reinforcement-learning-based framework called TaskMem is introduced to dynamically determine what information to store in long-term memory for multimodal agents, improving performance on streaming video benchmarks.

  182. Hugging Face Daily Papers TIER_1 English(EN) ·

    LongTraceRL: Learning Long-Context Reasoning from Search Agent Trajectories with Rubric Rewards

    LongTraceRL addresses long-context reasoning challenges in large language models through tiered distractor construction and rubric reward design for improved reasoning quality.

  183. Latent Space (swyx) TIER_1 English(EN) ·

    The Age of Async Agents — Cognition's Walden Yan & OpenInspect's Cole Murray

    80% Devin Commits, Spec-to-PR Workflows, Full VMs, Agent Memory, and PMs Shipping Code

  184. arXiv cs.AI TIER_1 English(EN) · Feng Liu ·

    Meta-Cognitive Memory Policy Optimization for Long-Horizon LLM Agents

    Memory-augmented LLM agents tackle complex long-horizon tasks by recursively summarizing interaction trajectories into compact memory. However, existing approaches typically train these memory policies using outcome-based reinforcement learning, failing to localize where intermed…

  185. Hugging Face Daily Papers TIER_1 English(EN) ·

    Entity-Collision: A Stratified Protocol for Attributing Retrieval Lift in Agent Memory

    End-to-end agent-memory benchmarks report a single hit@k per retriever, confounding lexical leakage (uncontrolled query/gold/distractor entity overlap) with tag-mixing (preferences, services, tools averaged together). We propose entity-collision, a system-agnostic protocol that p…

  186. arXiv cs.IR (Information Retrieval) TIER_1 English(EN) · Youwang Deng ·

    Entity-Collision: A Stratified Protocol for Attributing Retrieval Lift in Agent Memory

    End-to-end agent-memory benchmarks report a single hit@k per retriever, confounding lexical leakage (uncontrolled query/gold/distractor entity overlap) with tag-mixing (preferences, services, tools averaged together). We propose entity-collision, a system-agnostic protocol that p…

  187. Hugging Face Daily Papers TIER_1 English(EN) ·

    WorldMemArena: Evaluating Multimodal Agent Memory Through Action-World Interaction

    Multimodal large language models are increasingly deployed as long-horizon agents, where memory must do more than recall: it must track an evolving world, revise what has gone stale, and surface the right evidence at decision time. Existing benchmarks measure recall over static d…

  188. arXiv cs.LG TIER_1 English(EN) · Rui Bao, Yaping Sun, Zhiyong Chen, Feng Yang, Meixia Tao, Nan Li, Wenjun Zhang ·

    E³-Agent: An Executable and Evolving Agent for Resource Management of Edge Generative Inference

    arXiv:2605.27428v1 Announce Type: new Abstract: Edge deployments of generative inference increasingly face two practical realities: per-device per-model performance is often unknown at deployment time, and it is non-stationary due to user-driven semantic events, background load, …

  189. arXiv cs.AI TIER_1 English(EN) · Guanyu Cui, Zhewei Wei, Kun He ·

    Position: The Turing-Completeness of Autoregressive Transformers Relies Heavily on Context Management

    arXiv:2605.19514v2 Announce Type: replace Abstract: Many works make the eye-catching claim that Transformers are Turing-complete. However, the literature often conflates two distinct settings: (i) a fixed Transformer system setting, in which a fixed autoregressive Transformer is …

  190. arXiv cs.AI TIER_1 English(EN) · Hanyu Wang, Yifan Lan, Bochuan Cao, Lu Lin, Jinghui Chen ·

    SkillGrad: Optimizing Agent Skills Like Gradient Descent

    arXiv:2605.27760v1 Announce Type: new Abstract: Agent skills provide a lightweight way to adapt LLM agents to specialized domains by storing reusable procedural knowledge in structured files. However, whether downloaded from third parties or self-generated, these skills are often…

  191. arXiv cs.AI TIER_1 English(EN) · Xinzhe Li, Yaguang Tao ·

    When Does Memory Help Multi-Trajectory Inference for Tool-Use LLM Agents?

    arXiv:2605.28224v1 Announce Type: new Abstract: Multi-trajectory inference for tool-use LLM agents - generating multiple reasoning attempts and selecting among them - benefits from transferring knowledge across attempts so that later ones avoid the pitfalls of earlier ones. Exist…

  192. arXiv cs.AI TIER_1 English(EN) · Taojie Zhu, Wentao Zhao, Rui Sun, Beidi Luan, Jiacheng Lu, Sinuo Wang, Jing Li, Daxin Jiang, Yonghong He, Zuo Bai ·

    From Knowing to Doing: A Memory-Controlled Benchmark for LLM Trading Agents on Stock Markets

    arXiv:2605.28359v1 Announce Type: new Abstract: Evaluating whether large language model (LLM) agents can profit in capital markets is increasingly framed as end-to-end trading: place an agent in a historical market, let it trade, and measure portfolio returns. This setup is vulne…

  193. arXiv cs.AI TIER_1 English(EN) · Yonatan Vernik, Alexander Tuisov, Alexander Shleyfman ·

    GONDOR to the Rescue: Satisficing Planning with Low Memory

    arXiv:2605.28454v1 Announce Type: new Abstract: Greedy Best-First Search (GBFS) is the dominant approach for solving search problems where the goal can be estimated with a heuristic, such as planning, route finding, navigation, and pathfinding. This is especially true when the me…

  194. arXiv cs.AI TIER_1 English(EN) · Shang Wu, Saatvik Kher, Padhraic Smyth ·

    Learning to Assign Prediction Tasks to Agents with Capacity Constraints

    arXiv:2605.27999v1 Announce Type: cross Abstract: We address the problem of learning to assign prediction tasks to one agent from a set of available human or AI agents. In particular, we focus on the sequential learning of agent expertise and assignment policies where each agent …

  195. arXiv cs.AI TIER_1 English(EN) · Dawei Liu, Zongxia Li, Hongyang Du, Xiyang Wu, Shihang Gui, Yongbei Kuang, Lichao Sun ·

    Graph-of-Skills: Dependency-Aware Structural Retrieval for Massive Agent Skills

    arXiv:2604.05333v3 Announce Type: replace Abstract: Modern LLM agents increasingly rely on reusable skills, and as they interact with personal applications, web browsers, and other interfaces, skill libraries can scale to thousands of skills. Scaling to larger skill sets introduc…

  196. arXiv cs.AI TIER_1 English(EN) · Zihan Li, Xingyu Fan, Feifei Li, Wenhui Que ·

    MemCog: From Memory-as-Tool to Memory-as-Cognition in Conversational Agents

    arXiv:2605.28046v1 Announce Type: new Abstract: Existing agent memory systems universally follow what we term a Memory-as-Tool paradigm where a single query triggers one-shot retrieval of flat passage lists, suffering from passive invocation, reasoning-retrieval decoupling, and s…

  197. Hugging Face Daily Papers TIER_1 English(EN) ·

    Meta-Cognitive Memory Policy Optimization for Long-Horizon LLM Agents

    Memory-augmented language models struggle with long-horizon tasks due to information loss in recursive summaries, but a new method using belief entropy and metacognitive policy optimization improves performance by focusing on memory quality rather than just outcome success.

  198. Hugging Face Daily Papers TIER_1 English(EN) ·

    WorldMemArena: Evaluating Multimodal Agent Memory Through Action-World Interaction

    Multimodal large language models require sophisticated memory systems that can track evolving environments and manage information dynamically across multiple sessions, with new benchmarks revealing limitations in current approaches.

  199. Hugging Face Daily Papers TIER_1 English(EN) ·

    LiteCoder-Terminal: Scaling Long-Horizon Terminal Environments for Learning Language Agents

    LiteCoder-Terminal-Gen enables scalable training of language agents for terminal environments through synthetic, executable environments that outperform traditional methods.

  200. Hugging Face Daily Papers TIER_1 English(EN) ·

    GONDOR to the Rescue: Satisficing Planning with Low Memory

    Greedy Best-First Search (GBFS) is the dominant approach for solving search problems where the goal can be estimated with a heuristic, such as planning, route finding, navigation, and pathfinding. This is especially true when the memory is tightly constrained, such as planning on…

  201. Hugging Face Daily Papers TIER_1 English(EN) ·

    When Does Memory Help Multi-Trajectory Inference for Tool-Use LLM Agents?

    Multi-trajectory inference for tool-use LLM agents - generating multiple reasoning attempts and selecting among them - benefits from transferring knowledge across attempts so that later ones avoid the pitfalls of earlier ones. Existing cross-trajectory memory methods (trajectory-…

  202. arXiv cs.AI TIER_1 English(EN) · Huawei Lin, Peng Li, Jie Song, Fuxin Jiang, Tieying Zhang ·

    MUSE-Autoskill: Self-Evolving Agents via Skill Creation, Memory, Management, and Evaluation

    arXiv:2605.27366v1 Announce Type: new Abstract: Large language model (LLM) agents rely on reusable skills to solve complex tasks. However, existing skill creation approaches treat skills as isolated and static artifacts, limiting their reusability, reliability, and long-term impr…

  203. arXiv cs.AI TIER_1 English(EN) · Haoran Zhang, Zhaohua Sun ·

    AGORA: Adapter-Grounded Observation-Action Retention for Inference-Free Prompt Compression in LLM Agents

    arXiv:2605.26596v1 Announce Type: new Abstract: The token-level extractive compressors widely used for general LM context are structurally inappropriate for LLM agents: across 17 (env, backbone, method) cells spanning two independent token-level method families, every cell collap…

  204. arXiv cs.AI TIER_1 English(EN) · Abdelghny Orogat, Essam Mansour ·

    Is Agent Memory a Database? Rethinking Data Foundations for Long-Term AI Agent Memory

    arXiv:2605.26252v1 Announce Type: new Abstract: Long-running AI agents need persistent memory. Memory supports learning across sessions, reduces repeated context injection, and enables auditing of past decisions. Current agent memory systems and database paradigms treat memory as…

  205. arXiv cs.CL TIER_1 English(EN) · Han Xiao ·

    Test-Time Compute for Dense Retrieval: Agentic Program Generation with Frozen Embedding Models

    arXiv:2605.11374v3 Announce Type: replace-cross Abstract: Test-time compute is widely believed to benefit only large reasoning models. We show it also helps small embedding models. Since modern embedding models are distilled from LLM backbones, a frozen encoder should benefit fro…

  206. arXiv cs.CL TIER_1 English(EN) · Zijian Yu, Kejun Xiao, Huaipeng Zhao, Tao Luo, Xiaoyi Zeng ·

    Shopping Companion: A Memory-Augmented LLM Agent for Real-World E-Commerce Tasks

    arXiv:2603.14864v2 Announce Type: replace Abstract: In e-commerce, LLM agents show promise for shopping tasks such as recommendations, budget management, and bundle deals, where accurately capturing user preferences from long-horizon conversations is critical. However, progress i…

  207. arXiv cs.CL TIER_1 English(EN) · Mengyin Lu, Cong Feng, Huimin Han, Guangming Lu, Yu Sun, Xiaonan Ding, Shihui Long, Fengyi Li, Tanvi Motwani ·

    SPEAR: Code-Augmented Agentic Prompt Optimization

    arXiv:2605.26275v1 Announce Type: new Abstract: Automatic prompt engineering (APE) rewrites prompts to improve downstream task performance, but existing APE loops treat the optimizer itself as a fixed pipeline. We port the code-as-action paradigm of CodeAct (Wang et al., 2024a) t…

  208. arXiv cs.AI TIER_1 English(EN) · Yinpei Dai, Hongze Fu, Jayjun Lee, Yuejiang Liu, Haoran Zhang, Jianing Yang, Chelsea Finn, Nima Fazeli, Joyce Chai ·

    RoboMME: Benchmarking and Understanding Memory for Robotic Generalist Policies

    arXiv:2603.04639v3 Announce Type: replace-cross Abstract: Memory is critical for long-horizon and history-dependent robotic manipulation. Such tasks often involve counting repeated actions or manipulating objects that become temporarily occluded. Recent vision-language-action (VL…

  209. arXiv cs.AI TIER_1 English(EN) · Furkan Sakizli ·

    Tool-Schema Compression Enables Agentic RAG Under Constrained Context Budgets

    arXiv:2605.26165v1 Announce Type: cross Abstract: Agentic RAG systems that equip language models with dozens to hundreds of tool definitions face a critical resource conflict: tool schemas consume the same context window needed for retrieval-augmented generation. We present the f…

  210. arXiv cs.MA (Multiagent) TIER_1 English(EN) · Tieying Zhang ·

    MUSE-Autoskill: Self-Evolving Agents via Skill Creation, Memory, Management, and Evaluation

    Large language model (LLM) agents rely on reusable skills to solve complex tasks. However, existing skill creation approaches treat skills as isolated and static artifacts, limiting their reusability, reliability, and long-term improvement. We propose MUSE-Autoskill Agent (Memory…

  211. Hugging Face Daily Papers TIER_1 English(EN) ·

    AGORA: Adapter-Grounded Observation-Action Retention for Inference-Free Prompt Compression in LLM Agents

    The token-level extractive compressors widely used for general LM context are structurally inappropriate for LLM agents: across 17 (env, backbone, method) cells spanning two independent token-level method families, every cell collapses to mean reward <= 0.05 despite 1.3-13.3x rea…

  212. arXiv cs.AI TIER_1 English(EN) · Yujie Zhao, Boqin Yuan, Junbo Huang, Haocheng Yuan, Zhongming Yu, Haozhou Xu, Lanxiang Hu, Abhilash Shankarampeta, Zimeng Huang, Wentao Ni, Yuandong Tian, Jishen Zhao ·

    AMA-Bench: Evaluating Long-Horizon Memory for Agentic Applications

    arXiv:2602.22769v3 Announce Type: replace Abstract: Large Language Models (LLMs) are deployed as autonomous agents in increasingly complex applications, where enabling long-horizon memory is critical for achieving strong performance. However, a significant gap exists between appl…

  213. arXiv cs.CL TIER_1 English(EN) · Xianzhong Ding, Yangyang Yu, Changwei Liu, Bill Zhao ·

    ContextEcho: A Benchmark for Persona Drift in Long Agentic-Coding Sessions

    arXiv:2605.24279v1 Announce Type: new Abstract: A frontier language model's acknowledged "helpful programming assistant" persona does not survive long agentic-coding sessions in the deployment regime that production products actually run. After hours of tool-using debugging, a mo…

  214. arXiv cs.CL TIER_1 English(EN) · Moshe Hazoom, Gal Patel, Alon Talmor, Tom Hope ·

    Iterate Until Retrieved: Factual Nugget Optimization for Discoverable Continual Corrections in Agentic RAG

    arXiv:2605.25641v1 Announce Type: new Abstract: Agentic retrieval-augmented generation (RAG) systems in complex B2B (business-to-business) settings may often receive free-form response feedback. Rather than generic feedback signals such as style, preference, or overall response q…

  215. arXiv cs.CL TIER_1 English(EN) · Zhengda Jin, Bingbing Wang, Jing Li, Ruifeng Xu, Min Zhang ·

    Mitigating Provenance-Role Collapse in Long-Term Agents via Typed Memory Representation

    arXiv:2605.25869v1 Announce Type: new Abstract: Long-term memory is essential for persistent LLM agents, yet prevailing architectures store historical interactions as unstructured, flat text. This unconstrained storage induces provenance-role collapse, a critical failure mode whe…

  216. arXiv cs.CL TIER_1 English(EN) · Haoyi Hu, Qirong Lyu, Xianghan Kong, Weiwen Liu, Jianghao Lin, Zixuan Guo, Yan Xu, Yasheng Wang, Weinan Zhang, Yong Yu ·

    Anticipate and Learn: Unleashing Idle-Time Compute in Proactive Agents

    arXiv:2605.25971v1 Announce Type: new Abstract: While AI agents demonstrate remarkable capabilities in reasoning and tool use, they remain fundamentally reactive: they compute responses only after explicit user prompts. This paradigm ignores a critical opportunity: the idle time …

  217. arXiv cs.CL TIER_1 English(EN) · Wentao Qiu, Haotian Hu, Fanyi Wang, Jinwei Kong, Yu Zhang ·

    DimMem: Dimensional Structuring for Efficient Long-Term Agent Memory

    arXiv:2605.15759v3 Announce Type: replace Abstract: Large language model (LLM) agents require long-term memory to leverage information from past interactions. However, existing memory systems often face a fidelity–efficiency trade-off: raw dialogue histories are expensive, while…

  218. arXiv cs.LG TIER_1 English(EN) · Mahavir Dabas, Jihyun Jeong, Ming Jin, Ruoxi Jia ·

    Memory-Induced Tool-Drift in LLM Agents

    arXiv:2605.24941v1 Announce Type: cross Abstract: Modern LLM agents combine long-term memory for personalization with tool-calling interfaces for taking actions in the world, a combination underpinning contemporary production systems. We study a previously unexamined failure of…

  219. arXiv cs.AI TIER_1 English(EN) · Haiyang Shen, Xuanzhong Chen, Wendong Xu, Yun Ma, Liang Chen, Kuan Li ·

    EvoCode-Bench: Evaluating Coding Agents in Multi-Turn Iterative Interactions

    arXiv:2605.24110v1 Announce Type: new Abstract: Coding agents are increasingly used as iterative development partners, but most benchmarks still evaluate one specification followed by one final assessment. This leaves out a basic question: can an agent keep its own codebase worki…

  220. arXiv cs.AI TIER_1 English(EN) · Yuyang Hu, Hongjin Qian, Shuting Wang, Jiongnan Liu, Ziliang Zhao, Jiejun Tan, Zheng Liu, Zhicheng Dou ·

    SAM: State-Adaptive Memory for Long-Horizon Reasoning Agent

    arXiv:2605.24468v1 Announce Type: new Abstract: Long-horizon agentic reasoning requires large language models to act over long interaction histories containing thoughts, tool calls, observations, and partial conclusions. The challenge is not merely that these histories grow long,…

  221. arXiv cs.AI TIER_1 English(EN) · Yanzhou Li, Yiran Zhang, Xiaoyu Zhang, Xiaoxia Liu, Yang Liu ·

    CODESKILL: Learning Self-Evolving Skills for Coding Agents

    arXiv:2605.25430v1 Announce Type: new Abstract: Coding agents produce rich trajectories while solving software-engineering tasks. To enable agent self-evolution, these trajectories can be distilled into reusable procedural skills that compactly encode experience to guide future b…

  222. arXiv cs.AI TIER_1 English(EN) · Yeonjun In, Wonjoong Kim, Sangwu Park, Kanghoon Yoon, Chanyoung Park ·

    Personalize-then-Store: Benchmarking and Learning Personalized Memory for Long-horizon Agents

    arXiv:2605.25535v1 Announce Type: new Abstract: Existing large language model (LLM) based memory systems apply universal, static policies that overlook a fundamental reality: the contexts that are worth storing in memory are different across users. This misalignment wastes limite…

  223. arXiv cs.AI TIER_1 English(EN) · Han Chen, Zining Zhang, Wenqi Pei, Bingsheng He, Ming Wu, Jason Zeng, Michael Heinrich, Wei Wu, Hongbao Zhang ·

    MemForest: An Efficient Agent Memory System with Hierarchical Temporal Indexing

    arXiv:2605.23986v1 Announce Type: cross Abstract: Memory is a fundamental component for enabling long-context LLM agents, supporting persistent state across interactions through a continuous serve-and-update lifecycle. Despite substantial prior work, existing systems suffer from …

  224. arXiv cs.AI TIER_1 English(EN) · Haozhen Zhang, Quanyu Long, Jianzhu Bao, Tao Feng, Weizhi Zhang, Haodong Yue, Wenya Wang ·

    MemSkill: Learning and Evolving Memory Skills for Self-Evolving Agents

    arXiv:2602.02474v2 Announce Type: replace-cross Abstract: Most Large Language Model (LLM) agent memory systems rely on a small set of static, hand-designed operations for extracting memory. These fixed procedures hard-code human priors about what to store and how to revise memory…

  225. Hugging Face Daily Papers TIER_1 English(EN) ·

    SkillGrad: Optimizing Agent Skills Like Gradient Descent

    SkillGrad is a gradient-descent-inspired framework that optimizes agent skills through trajectory-level loss evidence and text-based gradients, enhancing skill reliability and performance in specialized domains.

  226. Hugging Face Daily Papers TIER_1 English(EN) ·

    MUSE-Autoskill: Self-Evolving Agents via Skill Creation, Memory, Management, and Evaluation

    A skill-centric agent framework enables continuous improvement of task-solving capabilities through a unified lifecycle of skill creation, memory, management, evaluation, and refinement.

  227. arXiv cs.IR (Information Retrieval) TIER_1 English(EN) · Yong Yu ·

    Anticipate and Learn: Unleashing Idle-Time Compute in Proactive Agents

    While AI agents demonstrate remarkable capabilities in reasoning and tool use, they remain fundamentally reactive: they compute responses only after explicit user prompts. This paradigm ignores a critical opportunity: the idle time between interactions is largely wasted, leaving …

  229. arXiv cs.CL TIER_1 English(EN) · Min Zhang ·

    Mitigating Provenance-Role Collapse in Long-Term Agents via Typed Memory Representation

    Long-term memory is essential for persistent LLM agents, yet prevailing architectures store historical interactions as unstructured, flat text. This unconstrained storage induces provenance-role collapse, a critical failure mode where agents suffer from source-monitoring errors. …

  230. arXiv cs.CL TIER_1 English(EN) · Tom Hope ·

    Iterate Until Retrieved: Factual Nugget Optimization for Discoverable Continual Corrections in Agentic RAG

    Agentic retrieval-augmented generation (RAG) systems in complex B2B (business-to-business) settings may often receive free-form response feedback. Rather than generic feedback signals such as style, preference, or overall response quality, we focus on actionable factual correctio…

  231. arXiv cs.CL TIER_1 English(EN) · Jingyi Peng, Zhongwei Wan, Weiting Liu, Qiuzhuang Sun ·

    PRISM: Pareto-Efficient Retrieval over Intent-Aware Structured Memory for Long-Horizon Agents

    arXiv:2605.12260v2 Announce Type: replace Abstract: Long-horizon language agents accumulate conversation history far faster than any fixed context window can hold, making memory management critical to both answer accuracy and serving cost. Existing approaches either expand the co…

  232. arXiv cs.CL TIER_1 English(EN) · Alina Shutova, Alexandra Olenina, Ivan Vinogradov, Anton Sinitsin ·

    Evaluating Memory Structure in LLM Agents

    arXiv:2602.11243v2 Announce Type: replace-cross Abstract: Modern LLM-based agents and chat assistants rely on long-term memory frameworks to store reusable knowledge, recall user preferences, and augment reasoning. As researchers create more complex memory architectures, it becom…

  233. Hugging Face Daily Papers TIER_1 English(EN) ·

    Anticipate and Learn: Unleashing Idle-Time Compute in Proactive Agents

    ProAct is a proactive agent architecture that uses idle-time computation to anticipate user needs and improve task completion efficiency and accuracy.

  234. Hugging Face Daily Papers TIER_1 English(EN) ·

    Personalize-then-Store: Benchmarking and Learning Personalized Memory for Long-horizon Agents

    Large language model-based memory systems can benefit from personalized policies that adapt to individual user contexts, though accurate implementation remains challenging.

  235. Hugging Face Daily Papers TIER_1 English(EN) ·

    SAM: State-Adaptive Memory for Long-Horizon Reasoning Agent

    Long-horizon agentic reasoning is enhanced through a state-adaptive memory framework that dynamically manages interaction histories by creating compact memory cues while preserving detailed trajectories for targeted retrieval.

  236. arXiv cs.AI TIER_1 English(EN) · Dongming Jiang, Yi Li, Songtao Wei, Jinxin Yang, Ayushi Kishore, Alysa Zhao, Dingyi Kang, Xu Hu, Feng Chen, Qiannan Li, Bingzhe Li ·

    Anatomy of Agentic Memory: Taxonomy and Empirical Analysis of Evaluation and System Limitations

    arXiv:2602.19320v2 Announce Type: replace-cross Abstract: Agentic memory systems enable large language model (LLM) agents to maintain state across long interactions, supporting long-horizon reasoning and personalization beyond fixed context windows. Despite rapid architectural de…

  237. arXiv cs.AI TIER_1 English(EN) · Jiawei He, Jie Jia, Chenbo Liu, Chaoyi Xue, Yapeng Song, Xikai Yang, Dong Sun ·

    ProcBench: Evaluating Process-Level Defects and Control Preservation in LLM Coding Agents

    arXiv:2605.20251v2 Announce Type: cross Abstract: Existing benchmarks for LLM coding agents primarily evaluate final outcomes. While useful for measuring overall capability, these metrics provide limited visibility and often miss defects that arise during execution. We present Pr…

  238. arXiv cs.AI TIER_1 English(EN) · Haozhen Zhang, Haodong Yue, Tao Feng, Quanyu Long, Jianzhu Bao, Bowen Jin, Weizhi Zhang, Xiao Li, Jiaxuan You, Chengwei Qin, Wenya Wang ·

    Learning Query-Aware Budget-Tier Routing for Runtime Agent Memory

    arXiv:2602.06025v2 Announce Type: replace-cross Abstract: Memory is increasingly central to Large Language Model (LLM) agents operating beyond a single context window, yet most existing systems rely on offline, query-agnostic memory construction that can be inefficient and may di…

  239. arXiv cs.CL TIER_1 English(EN) · Jingru Lin, Chen Zhang, Stephen Y. Liu, Haizhou Li ·

    RAGCap-Bench: Benchmarking Capabilities of LLMs in Agentic Retrieval Augmented Generation Systems

    arXiv:2510.13910v2 Announce Type: replace Abstract: Retrieval-Augmented Generation (RAG) mitigates key limitations of Large Language Models (LLMs), such as factual errors, outdated knowledge, and hallucinations, by dynamically retrieving external information. Recent work extends th…

  240. arXiv cs.CL TIER_1 English(EN) · Weiwei Xie, Shaoxiong Guo, Fan Zhang, Tian Xia, Xue Yang, Lizhuang Ma, Junchi Yan, Qibing Ren ·

    MemEvoBench: Benchmarking Safety Risks from Memory Misevolution in LLM Agents

    arXiv:2604.15774v2 Announce Type: replace Abstract: Equipping Large Language Models (LLMs) with persistent memory enhances interaction continuity and personalization but introduces new safety risks. Specifically, contaminated or biased memory accumulation can trigger abnormal age…

  241. arXiv cs.LG TIER_1 English(EN) · Sikuan Yan, Ahmed Bahloul, Ercong Nie, Susanna Schwarzmann, Riccardo Trivisonno, Volker Tresp, Yunpu Ma ·

    Memory-R2: Fair Credit Assignment for Long-Horizon Memory-Augmented LLM Agents

    arXiv:2605.21768v1 Announce Type: new Abstract: Memory-augmented LLM agents enable interactions that extend beyond finite context windows by storing, updating, and reusing information across sessions. However, training such agents with reinforcement learning in multi-session envi…

  242. arXiv cs.LG TIER_1 English(EN) · Dianzhi Yu, Vireo Zhang, Hongru Wang, Yanyu Chen, Minda Hu, Wanghan Xu, Siki Chen, Philip Torr, Zhenfei Yin, Irwin King ·

    Dynamic Mixture of Latent Memories for Self-Evolving Agents

    arXiv:2605.21951v1 Announce Type: new Abstract: Achieving self-evolution in intelligent agents requires the continual accumulation of new knowledge across changing task sequences without forgetting previously acquired abilities. Existing approaches either internalize knowledge by…

  243. arXiv cs.MA (Multiagent) TIER_1 English(EN) · Zhuokai Zhao ·

    Self-Evolving Multi-Agent Systems via Decentralized Memory

    Self-evolving multi-agent systems (MAS) have emerged as a promising route to LLM agents that continually improve from experience, with persistent memory at their foundation. However, existing designs almost exclusively adopt a centralized repository shared across agents, incurrin…

  244. arXiv cs.MA (Multiagent) TIER_1 English(EN) · Yunpu Ma ·

    Memory-R2: Fair Credit Assignment for Long-Horizon Memory-Augmented LLM Agents

    Memory-augmented LLM agents enable interactions that extend beyond finite context windows by storing, updating, and reusing information across sessions. However, training such agents with reinforcement learning in multi-session environments is challenging because memory turns the…

  245. arXiv cs.CL TIER_1 English(EN) · Dimitris N. Metaxas ·

    MemGym: a Long-Horizon Memory Environment for LLM Agents

    Memory is a central capability for LLM agents operating across long-horizon tasks. Existing memory benchmarks predominantly evaluate retention of personalized information in multi-turn chat scenarios, overlooking the dynamic memory formation that occurs during extended agent exec…

  246. arXiv cs.CL TIER_1 English(EN) · Jiaxuan You ·

    Auto-Dreamer: Learning Offline Memory Consolidation for Language Agents

    Language agents increasingly operate over streams of related tasks, yet existing memory systems struggle to convert accumulated experience into reusable knowledge. Retrieval-augmented and structured memory methods record per-session observations effectively, but often couple acqu…

  247. arXiv cs.CL TIER_1 English(EN) · Bo Han ·

    Rethinking How to Remember: Beyond Atomic Facts in Lifelong LLM Agent Memory

    To enable reliable long-term interaction, LLM agents require a memory system that can faithfully store, efficiently retrieve, and deeply reason over accumulated dialogue history. Most existing methods adopt an extracted-fact-based paradigm: handcrafted static prompts compress raw…

  248. arXiv cs.AI TIER_1 English(EN) · Samuel Madden ·

    PEEK: Context Map as an Orientation Cache for Long-Context LLM Agents

    Large language model (LLM) agents increasingly operate over long and recurring external contexts, like document corpora and code repositories. Across invocations, existing approaches preserve either the agent's trajectory, passive access to raw material, or task-level strategies.…

  249. Hugging Face Daily Papers TIER_1 English(EN) ·

    Rethinking How to Remember: Beyond Atomic Facts in Lifelong LLM Agent Memory

    TriMem enables reliable long-term interaction for LLM agents by maintaining multiple memory representation granularities and using TextGrad-based prompt optimization for continuous improvement.

  250. arXiv cs.CL TIER_1 English(EN) · Rui Chu ·

    MMoA: An AI-Agent framework with recurrence for Memoried Mixture-of-Agent

    The Mixture-of-Agents (MoA) framework has shown promise in improving large language model (LLM) performance by aggregating outputs from multiple agents. However, existing MoA systems often rely on static routers that do not fully capture temporal and contextual dependencies acros…

  251. arXiv cs.AI TIER_1 English(EN) · Mohit Bansal ·

    LongMINT: Evaluating Memory under Multi-Target Interference in Long-Horizon Agent Systems

    Real-world agents operate over long and evolving horizons, where information is repeatedly updated and may interfere across memories, requiring accurate recall and aggregated reasoning over multiple pieces of information. However, existing benchmarks focus on static, independent …

  252. arXiv cs.AI TIER_1 English(EN) · Jia Li ·

    EvoMemBench: Benchmarking Agent Memory from a Self-Evolving Perspective

    Recent benchmarks for Large Language Model (LLM) agents mainly evaluate reasoning, planning, and execution. However, memory is also essential for agents, as it enables them to store, update, and retrieve information over time. This ability remains under-evaluated, largely because…

  253. Hugging Face Daily Papers TIER_1 English(EN) ·

    Remembering More, Risking More: Longitudinal Safety Risks in Memory-Equipped LLM Agents

    Safety evaluations of memory-equipped LLM agents typically measure within-task safety: whether an agent completes a single scenario safely, often under adversarial conditions such as prompt injection or memory poisoning. In deployment, however, a single agent serves many independ…

  254. arXiv cs.CL TIER_1 English(EN) · Ming Jin ·

    Remembering More, Risking More: Longitudinal Safety Risks in Memory-Equipped LLM Agents

    Safety evaluations of memory-equipped LLM agents typically measure within-task safety: whether an agent completes a single scenario safely, often under adversarial conditions such as prompt injection or memory poisoning. In deployment, however, a single agent serves many independ…

  255. arXiv cs.CL TIER_1 English(EN) · Olukunle Owolabi ·

    SocialMemBench: Are AI Memory Systems Ready for Social Group Settings?

    Memory systems for AI assistants were built for single-user dialogue and fail characteristically when applied to multi-party social group settings. This gap matters for the social assistants being built today: group-acting agents embedded in chat platforms, and proactive personal…

  256. Hugging Face Daily Papers TIER_1 English(EN) ·

    MemForest: An Efficient Agent Memory System with Hierarchical Temporal Indexing

    MemForest presents a memory framework for long-context LLM agents that improves scalability and reduces latency through parallel chunk extraction and hierarchical temporal indexing.

  257. arXiv cs.CL TIER_1 English(EN) · Marzia Zaman ·

    FORGE: Self-Evolving Agent Memory With No Weight Updates via Population Broadcast

    Can LLM agents improve decision-making through self-generated memory without gradient updates? We propose FORGE (Failure-Optimized Reflective Graduation and Evolution), a staged, population-based protocol that evolves prompt-injected natural-language memory for hierarchical ReAct…

  258. arXiv cs.CL TIER_1 English(EN) · James Cheng ·

    RecMem: Recurrence-based Memory Consolidation for Efficient and Effective Long-Running LLM Agents

    Memory systems often organize user-agent interactions as retrievable external memory and are crucial for long-running agents by overcoming the limited context windows of LLMs. However, existing memory systems invoke LLMs to process every incoming interaction for memory extraction…

  259. arXiv cs.CL TIER_1 English(EN) · Yu Zhang ·

    DimMem: Dimensional Structuring for Efficient Long-Term Agent Memory

    Large language model (LLM) agents require long-term memory to leverage information from past interactions. However, existing memory systems often face a fidelity–efficiency trade-off: raw dialogue histories are expensive, while flat facts or summaries may discard the structure n…

  260. arXiv cs.CL TIER_1 English(EN) · Weinan Zhang ·

    SMMBench: A Benchmark for Source-Distributed Multimodal Agent Memory

    Existing benchmarks for multimodal memory reasoning largely evaluate systems within pre-assembled contexts, but under-evaluate whether agents can use evidence distributed across independently originated sources. We argue that source-distributed memory composition is an important …

  261. arXiv cs.CL TIER_1 English(EN) · Yuchi Ma ·

    H-Mem: A Novel Memory Mechanism for Evolving and Retrieving Agent Memory via a Hybrid Structure

    Memory data are ubiquitous in Large Language Model (LLM)-based agents (e.g., OpenClaw and Manus). A few recent works have attempted to exploit agents' memory for improving their performance on the question-answering (QA) task, but they lack a principled mechanism for effectively m…

  262. arXiv cs.AI TIER_1 English(EN) · Armando Solar-Lezama ·

    MeMo: Memory as a Model

    Large language models (LLMs) achieve strong performance across a wide range of tasks, but remain frozen after pretraining until subsequent updates. Many real-world applications require timely, domain-specific information, motivating the need for efficient mechanisms to incorporat…

  263. arXiv cs.AI TIER_1 English(EN) · Jorge Alberto Hidalgo Toledo ·

    AI Knows When It's Being Watched: Functional Strategic Action and Contextual Register Modulation in Large Language Models

    Large language models (LLMs) have been extensively studied from computational and cognitive perspectives, yet their behavior as communicative actors in socially structured contexts remains underexplored. This study examines whether LLM-based multi-agent systems exhibit systematic…

  264. arXiv cs.CL TIER_1 English(EN) · Evgeniy Gabrilovich ·

    GroupMemBench: Benchmarking LLM Agent Memory in Multi-Party Conversations

    Large Language Model (LLM) agents increasingly serve as personal assistants and workplace collaborators, where their utility depends on memory systems that extract, retrieve, and apply information across long-running conversations. However, both existing memory systems and benchm…

  265. arXiv cs.CL TIER_1 English(EN) · Hong Yan ·

    Agentic Recommender System with Hierarchical Belief-State Memory

    Memory-augmented LLM agents have advanced personalized recommendation, yet existing approaches universally adopt flat memory representations that conflate ephemeral signals with stable preferences, and none provides a complete lifecycle governing how memory should evolve. We prop…

  266. Hugging Face Daily Papers TIER_1 English(EN) ·

    Agentic Recommender System with Hierarchical Belief-State Memory

    Memory-augmented LLM agents have advanced personalized recommendation, yet existing approaches universally adopt flat memory representations that conflate ephemeral signals with stable preferences, and none provides a complete lifecycle governing how memory should evolve. We prop…

  267. arXiv cs.CL TIER_1 English(EN) · Kai-Wei Chang ·

    LongMemEval-V2: Evaluating Long-Term Agent Memory Toward Experienced Colleagues

    Long-term memory is crucial for agents in specialized web environments, where success depends on recalling interface affordances, state dynamics, workflows, and recurring failure modes. However, existing memory benchmarks for agents mostly focus on user histories, short traces, o…

  268. arXiv cs.AI TIER_1 English(EN) · William Parris ·

    Semantic Reward Collapse and the Preservation of Epistemic Integrity in Adaptive AI Systems

    Recent advances in reinforcement learning from human feedback (RLHF) and preference optimization have substantially improved the usability, coherence, and safety of large language models. However, recurring behaviors such as performative certainty, hallucinated continuity, calibr…

  269. Hugging Face Daily Papers TIER_1 English(EN) ·

    Executable Agentic Memory for GUI Agent

    Modern GUI agents typically rely on a model-centric and step-wise interaction paradigm, where LLMs must re-interpret the UI and re-decide actions at every screen, which is fragile in long-horizon tasks. In this paper, we propose Executable Agentic Memory (EAM), a structured Knowl…

  270. arXiv cs.CL TIER_1 English(EN) · Qiuzhuang Sun ·

    PRISM: Pareto-Efficient Retrieval over Intent-Aware Structured Memory for Long-Horizon Agents

    Long-horizon language agents accumulate conversation history far faster than any fixed context window can hold, making memory management critical to both answer accuracy and serving cost. Existing approaches either expand the context window without addressing what is retrieved, p…

  271. arXiv cs.AI TIER_1 English(EN) · Scott Sanner ·

    Goal-Oriented Reasoning for RAG-based Memory in Conversational Agentic LLM Systems

    LLM-based conversational AI agents struggle to maintain coherent behavior over long horizons due to limited context. While RAG-based approaches are increasingly adopted to overcome this limitation by storing interactions in external memory modules and performing retrieval from th…

  272. arXiv cs.AI TIER_1 English(EN) · Zenglin Xu ·

    Remember the Decision, Not the Description: A Rate-Distortion Framework for Agent Memory

    Long-horizon language agents must operate under limited runtime memory, yet existing memory mechanisms often organize experience around descriptive criteria such as relevance, salience, or summary quality. For an agent, however, memory is valuable not because it faithfully descri…

  273. arXiv cs.AI TIER_1 English(EN) · Jimmy Lin ·

    Rethinking Agentic Search with Pi-Serini: Is Lexical Retrieval Sufficient?

    Does a lexical retriever suffice as large language models (LLMs) become more capable in an agentic loop? This question naturally arises when building deep research systems. We revisit it by pairing BM25 with frontier LLMs that have better reasoning and tool-use abilities. To supp…

  274. arXiv cs.AI TIER_1 English(EN) · Min Zhang ·

    MemReread: Enhancing Agentic Long-Context Reasoning via Memory-Guided Rereading

    To tackle long-context reasoning tasks without the quadratic complexity of standard attention mechanisms, approaches based on agent memory have emerged, which typically maintain a dynamically updated memory when linearly processing document chunks. To mitigate the potential loss …

  275. arXiv cs.AI TIER_1 English(EN) · Tony Q. S. Quek ·

    Bridging the Cognitive Gap: A Unified Memory Paradigm for 6G Agentic AI-RAN

    As 6G evolves, the radio access network must transcend traditional automation to embrace agentic AI capable of perception, reasoning, and evolution. A fundamental cognitive gap persists in current disaggregated architectures, where interfaces force the physical layer to compress …

  276. arXiv cs.CL TIER_1 English(EN) · Jianfei Yang ·

    InterLV-Search: Benchmarking Interleaved Multimodal Agentic Search

    Existing benchmarks for multimodal agentic search evaluate multimodal search and visual browsing, but visual evidence is either confined to the input or treated as an answer endpoint rather than part of an interleaved search trajectory. We introduce InterLV-Search, a ben…

  277. arXiv cs.AI TIER_1 English(EN) · Huyu Wu, Jun Liu, Xiaochi Wei, Yan Gao, Yi Wu, Yao Hu ·

    Knowledge-Graph Paths as Intermediate Supervision for Self-Evolving Search Agents

    arXiv:2605.05702v1 Announce Type: new Abstract: Self-evolving search agents reduce reliance on human-written training questions by generating and solving their own search tasks. We build on Search Self-Play (SSP), a representative Proposer and Solver framework in which questions …

  278. arXiv cs.AI TIER_1 English(EN) · Spyros Galanis ·

    Information Aggregation with AI Agents

    arXiv:2604.20050v2 Announce Type: replace-cross Abstract: Can Large Language Models (AI agents) aggregate dispersed private information through trading and reason about the knowledge of others by observing price movements? We conduct a controlled experiment where AI agents trade …

  279. arXiv cs.AI TIER_1 English(EN) · Yuxiang Zhang, Jiangming Shu, Ye Ma, Xueyuan Lin, Shangxi Wu, Jitao Sang ·

    Memory as Action: Autonomous Context Curation for Long-Horizon Agentic Tasks

    arXiv:2510.12635v3 Announce Type: replace Abstract: Long-context Large Language Models, despite their expanded capacity, require careful working memory management to mitigate attention dilution during long-horizon tasks. Yet existing approaches rely on external mechanisms that la…

  280. arXiv cs.AI TIER_1 English(EN) · Zhuofeng Li, Haoxiang Zhang, Cong Wei, Pan Lu, Ping Nie, Yi Lu, Yuyang Bai, Shangbin Feng, Hangxiao Zhu, Ming Zhong, Yuyu Zhang, Jianwen Xie, Yejin Choi, James Zou, Jiawei Han, Wenhu Chen, Jimmy Lin, Dongfu Jiang, Yu Zhang ·

    Beyond Semantic Similarity: Rethinking Retrieval for Agentic Search via Direct Corpus Interaction

    arXiv:2605.05242v1 Announce Type: cross Abstract: Modern retrieval systems, whether lexical or semantic, expose a corpus through a fixed similarity interface that compresses access into a single top-k retrieval step before reasoning. This abstraction is efficient, but for agentic…

  281. arXiv cs.LG TIER_1 English(EN) · Yijia Zheng, Marcel Worring ·

    LatentRAG: Latent Reasoning and Retrieval for Efficient Agentic RAG

    arXiv:2605.06285v1 Announce Type: cross Abstract: Single-step retrieval-augmented generation (RAG) provides an efficient way to incorporate external information for simple question answering tasks but struggles with complex questions. Agentic RAG extends this paradigm by replacin…

  282. arXiv cs.AI TIER_1 English(EN) · Susheel Suresh, Hazel Mak, Shangpo Chou, Fred Kroon, Sahil Bhatnagar ·

    AgenticRAG: Agentic Retrieval for Enterprise Knowledge Bases

    arXiv:2605.05538v1 Announce Type: new Abstract: We present AgenticRAG, a practical agentic harness for retrieval and analysis over enterprise knowledge bases. Standard RAG pipelines place significant burden of grounding on the search stack, constraining the language model to a fi…

  283. arXiv cs.CL TIER_1 English(EN) · Junfeng Liao, Qizhou Wang, Jianing Zhu, Bo Du, Rui Yan, Xiuying Chen ·

    Belief Memory: Agent Memory Under Partial Observability

    arXiv:2605.05583v1 Announce Type: cross Abstract: LLM agents that operate over long context depend on external memory to accumulate knowledge over time. However, existing methods typically store each observation as a single deterministic conclusion (e.g., inferring "API X failed"…

  284. arXiv cs.CL TIER_1 English(EN) · Chunyu Li, Jingyi Kang, Ding Chen, Mengyuan Zhang, Jiajun Shen, Bo Tang, Xuanhe Zhou, Feiyu Xiong, Zhiyu Li ·

    MemReranker: Reasoning-Aware Reranking for Agent Memory Retrieval

    arXiv:2605.06132v1 Announce Type: new Abstract: In agent memory systems, the reranking model serves as the critical bridge connecting user queries with long-term memory. Most systems adopt the "retrieve-then-rerank" two-stage paradigm, but generic reranking models rely on semanti…

  285. arXiv cs.LG TIER_1 English(EN) · Zeyu Yang, Qi Ma, Jason Chen, Anshumali Shrivastava ·

    Superintelligent Retrieval Agent: The Next Frontier of Information Retrieval

    arXiv:2605.06647v1 Announce Type: cross Abstract: Retrieval-augmented agents are increasingly the interface to large organizational knowledge bases, yet most still treat retrieval as a black box: they issue exploratory queries, inspect returned snippets, and iteratively reformula…

  286. arXiv cs.AI TIER_1 English(EN) · Anshumali Shrivastava ·

    Superintelligent Retrieval Agent: The Next Frontier of Information Retrieval

    Retrieval-augmented agents are increasingly the interface to large organizational knowledge bases, yet most still treat retrieval as a black box: they issue exploratory queries, inspect returned snippets, and iteratively reformulate until useful evidence emerges. This approach re…

  287. arXiv cs.CL TIER_1 English(EN) · Marcel Worring ·

    LatentRAG: Latent Reasoning and Retrieval for Efficient Agentic RAG

    Single-step retrieval-augmented generation (RAG) provides an efficient way to incorporate external information for simple question answering tasks but struggles with complex questions. Agentic RAG extends this paradigm by replacing single-step retrieval with a multi-step process,…

  288. Hugging Face Daily Papers TIER_1 English(EN) ·

    LatentRAG: Latent Reasoning and Retrieval for Efficient Agentic RAG

    Single-step retrieval-augmented generation (RAG) provides an efficient way to incorporate external information for simple question answering tasks but struggles with complex questions. Agentic RAG extends this paradigm by replacing single-step retrieval with a multi-step process,…

  289. arXiv cs.CL TIER_1 English(EN) · Zhiyu Li ·

    MemReranker: Reasoning-Aware Reranking for Agent Memory Retrieval

    In agent memory systems, the reranking model serves as the critical bridge connecting user queries with long-term memory. Most systems adopt the "retrieve-then-rerank" two-stage paradigm, but generic reranking models rely on semantic similarity matching and lack genuine reasoning…

  290. arXiv cs.CL TIER_1 English(EN) · Joshua Adler, Guy Zehavi ·

    Storage Is Not Memory: A Retrieval-Centered Architecture for Agent Recall

    arXiv:2605.04897v1 Announce Type: new Abstract: Extraction at ingestion is the wrong primitive for agent memory: content discarded before the query is known cannot be recovered at retrieval time. We propose True Memory, a six-layer architecture that shifts the center of the syste…

  291. arXiv cs.AI TIER_1 English(EN) · Siheng Chen ·

    LongSeeker: Elastic Context Orchestration for Long-Horizon Search Agents

    Long-horizon search agents must manage a rapidly growing working context as they reason, call tools, and observe information. Naively accumulating all intermediate content can overwhelm the agent, increasing costs and the risk of errors. We propose that effective context manageme…

  292. arXiv cs.CL TIER_1 English(EN) · Guy Zehavi ·

    Storage Is Not Memory: A Retrieval-Centered Architecture for Agent Recall

    Extraction at ingestion is the wrong primitive for agent memory: content discarded before the query is known cannot be recovered at retrieval time. We propose True Memory, a six-layer architecture that shifts the center of the system from a storage schema to a multi-stage retriev…

  293. arXiv cs.AI TIER_1 English(EN) · Altan Cakir, Ayca Yerlikaya ·

    From Experimental Limits to Physical Insight: A Retrieval-Augmented Multi-Agent Framework for Interpreting Searches Beyond the Standard Model

    arXiv:2605.02491v1 Announce Type: cross Abstract: Modern searches for physics beyond the Standard Model produce rapidly expanding literature containing heterogeneous information, including textual analyses, numerical datasets, and graphical exclusion limits. Integrating these dis…

  294. arXiv cs.CL TIER_1 English(EN) · Yilun Zhao, Jinbiao Wei, Tingyu Song, Siyue Zhang, Chen Zhao, Arman Cohan ·

    Rethinking Reasoning-Intensive Retrieval: Evaluating and Advancing Retrievers in Agentic Search Systems

    arXiv:2605.04018v1 Announce Type: new Abstract: Reasoning-intensive retrieval aims to surface evidence that supports downstream reasoning rather than merely matching topical similarity. This capability is increasingly important for agentic search systems, where retrievers must pr…

  295. arXiv cs.CL TIER_1 English(EN) · Arman Cohan ·

    Rethinking Reasoning-Intensive Retrieval: Evaluating and Advancing Retrievers in Agentic Search Systems

    Reasoning-intensive retrieval aims to surface evidence that supports downstream reasoning rather than merely matching topical similarity. This capability is increasingly important for agentic search systems, where retrievers must provide complementary evidence across iterative se…

  296. arXiv cs.AI TIER_1 English(EN) · Ayca Yerlikaya ·

    From Experimental Limits to Physical Insight: A Retrieval-Augmented Multi-Agent Framework for Interpreting Searches Beyond the Standard Model

    Modern searches for physics beyond the Standard Model produce rapidly expanding literature containing heterogeneous information, including textual analyses, numerical datasets, and graphical exclusion limits. Integrating these distributed sources remains a time-consuming and manu…

  297. Hugging Face Daily Papers TIER_1 English(EN) ·

    From Experimental Limits to Physical Insight: A Retrieval-Augmented Multi-Agent Framework for Interpreting Searches Beyond the Standard Model

    Modern searches for physics beyond the Standard Model produce rapidly expanding literature containing heterogeneous information, including textual analyses, numerical datasets, and graphical exclusion limits. Integrating these distributed sources remains a time-consuming and manu…

  298. arXiv cs.CV TIER_1 English(EN) · Can Lin, Tao Feng, Hangjie Yuan, Dan Zhang, Yifan Zhu, Zhonghong Ou ·

    GUI-AC: Enhancing Continual Learning in GUI Agents

    arXiv:2606.10522v1 Announce Type: new Abstract: Graphical User Interfaces (GUIs) serve as the dominant medium for human-computer interaction, yet building GUI agents that generalize across the vast diversity of real-world interface environments, with the same flexibility and robu…

  299. arXiv cs.CV TIER_1 English(EN) · Zhonghong Ou ·

    GUI-AC: Enhancing Continual Learning in GUI Agents

    Graphical User Interfaces (GUIs) serve as the dominant medium for human-computer interaction, yet building GUI agents that generalize across the vast diversity of real-world interface environments, with the same flexibility and robustness that humans naturally exhibit, remains un…

  300. arXiv stat.ML TIER_1 English(EN) · Tong Yang, Yu Huang, Yingbin Liang, Yuejie Chi ·

    Agentic Transformers Provably Learn to Search via Reinforcement Learning

    arXiv:2606.00183v1 Announce Type: cross Abstract: Tree search is a central abstraction behind many language-agent reasoning and decision-making tasks: agents must explore actions, remember failures, and backtrack toward promising alternatives. Yet, we lack a theoretical understan…

  301. arXiv stat.ML TIER_1 English(EN) · Sijia Wang, Dhanajit Brahma, Ricardo Henao ·

    SAGE: A Novelty Gate for Efficient Memory Evolution in Agentic LLMs

    arXiv:2605.30711v1 Announce Type: cross Abstract: Agentic LLMs must continuously decide whether newly extracted facts should be added, merged with existing memories, or ignored, yet prior work has focused more on retrieval and storage than on principled write-side control. We fra…

  302. arXiv cs.CV TIER_1 English(EN) · Tao Zou, Yichen He, Tian Qiu, Yuan Lin, Hang Li ·

    Task-Focused Memorization for Multimodal Agents

    arXiv:2605.31075v1 Announce Type: new Abstract: Long-term memory is essential for multimodal agents to build coherent experience, accumulate world knowledge, and achieve continual learning. However, constructing effective memory goes beyond memory module design and basic requirem…

  303. arXiv stat.ML TIER_1 English(EN) · Yuejie Chi ·

    Agentic Transformers Provably Learn to Search via Reinforcement Learning

    Tree search is a central abstraction behind many language-agent reasoning and decision-making tasks: agents must explore actions, remember failures, and backtrack toward promising alternatives. Yet, we lack a theoretical understanding of how transformer-based policies acquire suc…

  304. arXiv cs.CV TIER_1 English(EN) · Hang Li ·

    Task-Focused Memorization for Multimodal Agents

    Long-term memory is essential for multimodal agents to build coherent experience, accumulate world knowledge, and achieve continual learning. However, constructing effective memory goes beyond memory module design and basic requirements such as accuracy and fidelity; the key chal…

  305. arXiv cs.CV TIER_1 English(EN) · Yihang Tao, Yu Guo, Senkang Hu, Yanan Ma, Zihan Fang, Sam Kwong, Yuguang Fang ·

    V2XCrafter: Learning to Generate Driving Scene Across Agents

    arXiv:2605.29471v1 Announce Type: new Abstract: Collaborative driving systems leverage vehicle-to-everything (V2X) communication for multi-agent collaborative perception to enhance driving safety, yet they remain constrained by scarce annotated real-world V2X driving datasets and…

  306. arXiv stat.ML TIER_1 English(EN) · Ricardo Henao ·

    SAGE: A Novelty Gate for Efficient Memory Evolution in Agentic LLMs

    Agentic LLMs must continuously decide whether newly extracted facts should be added, merged with existing memories, or ignored, yet prior work has focused more on retrieval and storage than on principled write-side control. We frame memory evolution as a novelty-detection problem…

  307. arXiv cs.CV TIER_1 English(EN) · Xiaozhu Ju ·

    Robo-Cortex: A Self-Evolving Embodied Agent via Dual-Grain Cognitive Memory and Autonomous Knowledge Induction

    The ability to navigate and interact with complex environments is central to real-world embodied agents, yet navigation in unseen environments remains challenging due to "experiential amnesia," where existing trajectory-driven or reactive policies fail to synthesize generalizable…

  308. arXiv cs.CV TIER_1 English(EN) · Jiebo Luo ·

    MementoGUI: Learning Agentic Multimodal Memory Control for Long-Horizon GUI Agents

    Recent GUI agents have made substantial progress in visual grounding and action prediction, yet they remain brittle in long-horizon tasks that require maintaining task state across many interface transitions. Existing agents typically rely on raw history replay or text-only memor…

  309. arXiv cs.CV TIER_1 English(EN) · Ruixiang Tang ·

    MemEye: A Visual-Centric Evaluation Framework for Multimodal Agent Memory

    Long-term agent memory is increasingly multimodal, yet existing evaluations rarely test whether agents preserve the visual evidence needed for later reasoning. In prior work, many visually grounded questions can be answered using only captions or textual traces, allowing answers …

  310. Together AI blog TIER_1 English(EN) ·

    Benchmarking inference at scale: coding agents

    Real-world inference benchmarks for coding agents: 31% more TPS than TensorRT-LLM, 2× better TTFT at saturation, and 76% lower cost than Claude Opus 4.6.

  311. Together AI blog TIER_1 English(EN) ·

    CoderForge-Preview: SOTA open dataset for training efficient coding agents

  312. Together AI blog TIER_1 English(EN) ·

    DeepSWE: Training a Fully Open-sourced, State-of-the-Art Coding Agent by Scaling RL

  313. Forbes — Innovation TIER_1 English(EN) · Liran Zvibel, Forbes Councils Member ·

    AI’s Memory Crisis Is Here: Don’t Hoard, Optimize

    The AI industry has been papering over architectural inefficiency with raw capacity.

  314. dev.to — Claude Code tag TIER_1 English(EN) · Pandit ·

    Teaching an AI to Never Forget: How the Memory System Works

    Part 3 of the series: Building Your AI Developer Handbook (https://dev.to/panditabhis/how-i-turned-claude-into-a-disciplined-senior-developer-not-just-a-fast-one-1a59). The Goldfish Problem: By default, every Claude session starts completel…

  315. MarkTechPost TIER_1 English(EN) · Asif Razzaq ·

    Meet Harness-1: A 20B Retrieval Subagent Trained With Reinforcement Learning Inside a Stateful Search Harness on gpt-oss-20b

    UIUC and Chroma's Harness-1 is a 20B retrieval subagent trained with reinforcement learning inside a stateful search harness. The harness maintains the bookkeeping (candidate pool, importance-tagged curated set, evidence graph, verification records) while the policy decides …

  316. Pandaily TIER_1 English(EN) · Pandaily ·

    USTC Open-Sources Agent-Driven Long-Context Training Paradigm: 30B Matches Qwen3-235B

    Researchers at the University of Science and Technology of China (USTC) have open-sourced a novel agent-driven long-context training paradigm that achieves breakthrough efficiency — a 30-billion-parameter model matching the performance of Alibaba'...

  317. dev.to — Claude Code tag TIER_1 English(EN) · Harrison Guo ·

    Agent Memory Is a Cache Coherence Problem

    This post is one half of a pair. The other half, Agent Retrieval Is a Cost Curve Problem (https://harrisonsec.com/blog/agent-retrieval-cost-curve-claude-code-grep-vs-rag/), argues that Claude Code's within-session code retrieval…

  318. dev.to — Claude Code tag TIER_1 English(EN) · Odilon HUGONNOT ·

    The AI That Improves Itself: Autonomous Prompt Iteration Loop

    Each roast was taking 50 seconds per upload. Quality was unknown: we had a feeling, not data. The prompt had been written "by instinct" and never seriously evaluated. The question was simple: how do you know if a prompt is good, and how do you improve it without spending the …

  319. dev.to — Claude Code tag TIER_1 English(EN) · Harrison Guo ·

    Agent Retrieval Is a Cost Curve Problem: Why Claude Code Doesn't Use RAG

    There's a popular interview question making the rounds: "Why doesn't Claude Code use RAG to retrieve code? Why grep?" The popular answer goes: chunking breaks code structure, vectors approximate when code demands exact, indexes go stale, cold-start is slow, ret…

  320. MarkTechPost TIER_1 English(EN) · Michal Sutter ·

    Tencent Open-Sources TencentDB Agent Memory: A 4-Tier Local Memory Pipeline for AI Agents

    Tencent has open-sourced TencentDB Agent Memory, a fully local memory system for AI agents released under the MIT license. The project pairs symbolic short-term memory, which offloads verbose tool logs into a compact Mermaid task canvas, with a 4-tier long-term memory pyramid …

  321. dev.to — Claude Code tag TIER_1 English(EN) · Toni Antunovic ·

    Transitive Prompt Injection in Multi-Agent Coding Pipelines: One Poisoned Tool, Every Downstream Agent

    This article was originally published on LucidShark Blog (https://lucidshark.com/blog/multi-agent-transitive-prompt-injection-coding-pipelines-2026). The upgrade from single-agent to multi-agent coding workflows felt like a…

  322. MarkTechPost TIER_1 English(EN) · Asif Razzaq ·

    A Step-by-Step Coding Tutorial to Implement GBrain: The Self-Wiring Memory Layer Built by Y Combinator’s Garry Tan for AI Agents

    AI agents start every session from zero: no memory of meetings, notes, or decisions. GBrain, the open-source memory layer Y Combinator's Garry Tan built to power his own OpenClaw and Hermes deployments, fixes that with a markdown-first knowledge graph that wires itself throug…

  323. dev.to — Claude Code tag TIER_1 English(EN) · Michael Tuszynski ·

    The Coding Agent Stack Has Two Layers

    The current "Hermes Agent vs Claude Code" framing is the wrong comparison. The two tools live at different layers of the coding agent stack, and most of the YouTube…

  324. dev.to — Claude Code tag TIER_1 English(EN) · The Hive Collective ·

    Give every Claude Code agent a shared, growing memory with one hook

    Run Claude Code on real work for a while and you notice the same thing. Your agent figures out a non-obvious thing (a Postgres VACUUM quirk, a Tailwind v4 + shadcn collision, a Next.js caching gotcha), and that knowledge dies with the conversation. The next agent…

  325. dev.to — Claude Code tag TIER_1 English(EN) · Theo Valmis ·

    Long-running agents need more than memory

    Anthropic's managed-agent harness solves one hard problem: continuity. Progress logs, feature lists, git checkpoints, and startup scripts give each new session a map of what happened. But continuity is not governance. As agents work across more sessions, the quest…

  326. dev.to — Claude Code tag TIER_1 English(EN) · Andrew ·

    agentmemory Review: Persistent Memory for AI Coding Agents

    Originally published on andrew.ooo (https://andrew.ooo/posts/agentmemory-persistent-memory-ai-coding-agents-review/). Visit the original for any updates, code snippets that aged out, or follow-up posts.

  327. dev.to — Claude Code tag TIER_1 French(FR) · Michel Faure ·

    Six days, six seconds: a CI test against the semantic drift of an AI agent

  328. dev.to — Claude Code tag TIER_1 English(EN) · Michel Faure ·

    Six days, six seconds: a CI test against semantic-layer drift on an AI agent

  329. dev.to — MCP tag TIER_1 English(EN) · EvanLin | Contorium ·

    Building an AI Memory Layer: A Problem I Didn’t Expect

  330. dev.to — MCP tag TIER_1 English(EN) · Red Fox Code ·

    Long-Term Memory for LLM Agents That Works

    A support agent tells a customer their plan is still Enterprise, even though finance downgraded it last week. A coding copilot forgets a repo convention it learned yesterday. A personal assistant remembers your old home address and uses it to book a service call. These are not…

  331. dev.to — MCP tag TIER_1 English(EN) · h-wata ·

    kioku-mesh: Why I put Zenoh under my AI's long-term memory

    This article was written with help from Claude (an AI). I reviewed and edited it before publishing. The gap between Claude Code and the web app: If you've lived in Claude Code for a while and then go back to the web version of an AI …

  332. dev.to — MCP tag TIER_1 English(EN) · h-wata ·

    Show DEV: kioku-mesh — shared long-term memory for AI coding agents across PCs

    I made kioku-mesh, which shares long-term memory for AI agents across multiple PCs and across multiple agents. kioku (記憶) means memory in Japanese.

  333. Medium — MCP tag TIER_1 Portuguese(PT) · Flavio Santos ·

    MCP beyond tool calling: shared memory for multi-agent systems

  334. Towards AI TIER_1 English(EN) · Michael Neuberger ·

    Constant-Cost Persistent Semantic State Memory Engine for LLM Agents

    Why your agent’s input footprint doesn’t have to grow with conversation length, and what changes when it stops. If you’ve shipped anything with an LLM in the loop,…

  335. Medium — AI coding tag TIER_1 English(EN) · Amin Tazifor ·

    Engineering memory for AI agents: closing the read-side loop

    I shipped a discipline three weeks ago. The write half worked; the read half didn’t. Three Python scripts, two hooks, and one honest…

  336. Towards AI TIER_1 English(EN) · Anna Jey ·

    AI Agent Memory Architecture: How to Build Long-Term Memory That Does Not Rot

    Most AI agent memory failures do not look dramatic. The agent simply remembers the wrong thing with confidence, forg…

  337. Towards AI TIER_1 English(EN) · Raj kumar ·

    Building AI Agents Part 2B: Memory Systems That Make AI Agents Smarter Over Time

    How short-term memory, long-term memory, vector recall, and user context help agents learn, adapt, and personalize decisions. In …

  338. dev.to — MCP tag TIER_1 English(EN) · Nicolas Primeau ·

    Building a CRDT-replicated memory mesh for AI agents

    Every multi-agent setup I tried ran into the same wall: the agents couldn't remember anything together. Each Claude Code session started cold. Two agents working the same repo had no idea what the other had done. The "shared context" I kept building turned into a gravey…

  339. Towards AI TIER_1 English(EN) · Arijit Dutta ·

    The “Stale /Plan” Problem in Coding Agents

    If you are using any coding agent for long-running implementation/debugging tasks, you might have already run into this problem: The agent writes a plan. You agree on the plan. Implementation starts. Then reality changes in the implementation/testing phase. …

  340. Medium — AI coding tag TIER_1 English(EN) · IAKH Studio ·

    Advanced Prompt Engineering for AI Coding Agents: The Skill That Separates Good Output from Great…

  341. Medium — MCP tag TIER_1 English(EN) · Rosetta Guo ·

    Why AI Agent Memory is Often in a Form of MCP (A Discussion Inspired by Contextberg)

  342. Mastodon — sigmoid.social TIER_1 Korean(KO) ·

    Mnemosyne – Local-first memory for Hermes AI agents, sub-millisecond recalls

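    Mnemosyne – Memory for AI Hermes Agents, Sub-Millisecond Recalls, Local First. Mnemosyne is a local-first memory system for Hermes AI agents that provides SQLite-based sub-millisecond response times and 100% privacy. It runs entirely offline, with no cloud service or external API, and supports vector search and hybrid ranking for fast, accurate memory recall. Through the BEAM architecture, working memory, episodic memory, scratch…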

  343. Medium — AI coding tag TIER_1 English(EN) · Dr. Fadi Shaar ·

    Semble: The Semantic Code Search Library That Gives AI Coding Agents 94% Recall at 2,000 Tokens…

  344. Medium — AI coding tag TIER_1 English(EN) · Dr. Fadi Shaar ·

    Semble: The Semantic Code Search Library That Gives AI Coding Agents 94% Recall at 2,000 Tokens…

  345. Towards AI TIER_1 English(EN) · Armin Norouzi, Ph.D ·

    Agent Memory with Vector Stores: HNSW, Forgetting, and Budgets

  346. Medium — AI coding tag TIER_1 English(EN) · Muhammad Rizwan ·

    AI Coding Agents Should Not Hide Memory - Why NanoAgent Stores It in Repo Files

  347. Mastodon — sigmoid.social TIER_1 English(EN) ·

    GBrain is a new open-source memory layer for AI agents built by Y Combinator's Garry Tan

    GBrain is a new open-source memory layer for AI agents built by Y Combinator's Garry Tan. It uses a markdown-first knowledge graph that auto-wires itself through regex inference, requiring zero LLM calls. His production brain already holds 146,646 pages, 24,585 people and 5,339 c…

  348. Mastodon — sigmoid.social TIER_1 English(EN) ·

    CLI vs MCP: Which Tool Interface Actually Works for AI Coding Agents?

    CLI vs MCP: Which Tool Interface Actually Works for AI Coding Agents? A technical comparison of CLI tools and Model Context Protocol for AI coding agents. Covers token cost, reliability, composability, and setup friction so you can pick the right interface. https://pickuma.com/p…

  349. Mastodon — sigmoid.social TIER_1 English(EN) ·

    Automate Python Code Reviews with Free Local LLMs and GitHub Actions

    Automate Python Code Reviews with Free Local LLMs and GitHub Actions. Wire an open-weight model running in Ollama into a GitHub Actions workflow to get automated first-pass code-review comments on Python pull requests, no API bill required. https://pickuma.com/posts/automate-pyt…

  350. Mastodon — sigmoid.social TIER_1 English(EN) ·

    Why AI Agents Forget: Memory Decay and Context Contamination Explained

    Why AI Agents Forget: Memory Decay and Context Contamination Explained. How context-window limits, the lost-in-the-middle effect, and stale data cause long-running AI coding agents to lose track, and what you can do about it. https://pickuma.com/posts/why-ai-agents-forget-memor…

  351. Medium — Claude tag TIER_1 English(EN) · Rahil Pirani ·

    I built persistent AI memory for Claude on Cloudflare’s free tier

  352. dev.to — MCP tag TIER_1 English(EN) · Enrique B. ·

    Your AI Agent is Stuck in a Loop. Here's the Memory Layer That Breaks It and Saves You Money

    Every time you open a new chat in Cursor, VS Code, Antigravity and even Claude Desktop, you paste your codebase back in. Or you let the IDE do it automatically, same result. You're burning context tokens on files the agent already "knew" ten minutes ago in a different window. …

  353. dev.to — MCP tag TIER_1 English(EN) · Ryan Ras ·

    The Hidden Problem with Multi-Agent AI Systems: Shared Memory

    The problem nobody talks about: When you run multiple AI agents, each one starts completely fresh. Zero knowledge of what other agents learned, decided, or remembered. Agent A spends an hour learning your codebase structure. Agent B starts tomorr…

  354. dev.to — MCP tag TIER_1 English(EN) · Ruslan Manov ·

    Reviewable Memory Consolidation for Local AI Agents

    AI memory is usually sold as recall. That is only the first problem. A serious agent does not merely need to remember more. It needs a way to keep its memory from decaying into duplicates, stale facts…

  355. dev.to — MCP tag TIER_1 English(EN) · KUSHAL BARAL ·

    devmcp-context: A Simple AI Memory Layer for Your Agent

    AI assistants are useful, but they often forget important details between sessions. That makes it hard to keep track of decisions, project notes, bugs, and tasks. devmcp-context solves that by giving your agent a simple memory layer that lives in your proje…

  356. Towards AI TIER_1 English(EN) · Ampatishan Sivalingam ·

    Under the Hood of Meko: How Distributed Infrastructure Solves the Multiagent Memory Crisis

  357. Medium — Claude tag TIER_1 English(EN) · Amin Tazifor ·

    Engineering Memory for AI Coding Agents: A Discipline and a 200-line Implementation

  358. Medium — Claude tag TIER_1 English(EN) · Rick Hightower ·

    The Memory Leak in Your AI Strategy: Architecting for LLM Reliability at Scale

  359. Medium — AI coding tag TIER_1 English(EN) · Dilawar Abbas ·

    The four-memory model that makes AI coding agents finally remember

    Every AI coding agent (Claude Code, Cursor, GitHub Copilot, OpenCode) reads its own config file. I was maintaining the same project…

  360. Towards AI TIER_1 English(EN) · Subrat Pati ·

    Building the AI Memory Stack: Layered Storage, Async Extraction and Atomic Persistence

    Every AI agent you build today can hold a conversation. It can reason, use tools, and chain together complex workflows. But the moment a session ends, everything disappears. The agent forgets who you are, what you were working on, and every preference it learned during the con…

  361. dev.to — MCP tag TIER_1 English(EN) · Rumblingb ·

    Why Every AI Agent Needs Persistent Memory: Introducing Agent Memory MCP

    The Memory Problem in AI Agents. Modern LLMs are incredibly powerful, but they have a fundamental limitation: they forget everything between conversations. Every time you start a new session with an AI agent, it's like talking to someone with amnesia…

  362. dev.to — MCP tag TIER_1 English(EN) · Gowtham S ·

    Building a Local Markdown Memory Layer for AI Agents

    I kept running into the same problem with AI coding agents. The agents were getting better, but every new session still felt like starting from zero. I would explain the repo again. Then my preferences again. Then the decisions we already made. Then w…

  363. dev.to — MCP tag TIER_1 English(EN) · Gowtham ·

    Building a Local Markdown Memory Layer for AI Agents

    I kept running into the same problem with AI coding agents. The agents were getting better, but every new session still felt like starting from zero. I would explain the repo again. Then my preferences again. Then the decisions we already made. Then w…

  364. dev.to — LLM tag TIER_1 English(EN) · Jack M ·

    AI Agent Memory Store: Stop Long-Running Agents From Forgetting the Job

    An AI agent can look brilliant for ten minutes and lost after ten steps. It starts with a clean plan. Then the agent reads docs, calls tools, rewrites files, summarizes a customer ticket, checks a policy, and tries to continue. Somewhere in that loop, it forgets why a d…

  365. dev.to — LLM tag TIER_1 English(EN) · Baran Özdemir ·

    Why agents need memory that improves itself

    "Agent memory" usually means a vector database: embed everything the user said, query by similarity, paste the top matches into the prompt. It's a useful trick, but it isn't memory. It's a lookup table that never learns, never forgets correctly, and can't tell you what was tru…
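
    The vector-database pattern described above (embed, query by similarity, paste the top matches into the prompt) comes down to a nearest-neighbour lookup. A minimal sketch, assuming embeddings have already been produced by some model; the three-dimensional vectors are made-up toy values, and `top_k` and `build_prompt` are illustrative names, not any library's API:

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, store, k=2):
    """Return the k stored texts most similar to the query vector.

    store is a list of (text, vector) pairs. Nothing is learned or updated
    here: the same query always returns the same matches.
    """
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question, memories):
    """Paste the retrieved memories above the user's question."""
    context = "\n".join(f"- {m}" for m in memories)
    return f"Relevant notes:\n{context}\n\nQuestion: {question}"


store = [
    ("User prefers tabs over spaces", [0.9, 0.1, 0.0]),
    ("Deploys go out on Fridays", [0.0, 0.8, 0.6]),
    ("User's editor is Neovim", [0.7, 0.3, 0.1]),
]
print(build_prompt("How should I indent?", top_k([1.0, 0.2, 0.0], store)))
```

    Nothing in this loop records which memories were useful or retires outdated ones, which is the gap the article's title ("memory that improves itself") points at.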

  366. dev.to — LLM tag TIER_1 English(EN) · ankush chadha ·

    Same Lever, Opposite Intent: When Shared Agent Memory Backfires

    The same thing that makes a helpful habit stick in an AI agent is exactly what lets an attacker reprogram it. I know because I almost shipped the attack myself, with the best intentions. I'd given my agents a harmless efficiency rule: prefer the cheap, narrow tools, an…

  367. dev.to — LLM tag TIER_1 English(EN) · Debbie Shapiro ·

    Honest Memory: What Production Accuracy Data Actually Shows About AI Agent Memory

    A major AI memory provider published their own research this spring measuring how well their system actually works in production. The controlled benchmark result was impressive: over ninety percent accuracy on standard evaluation corpora. The production result at thirty days w…

  368. Mastodon — fosstodon.org TIER_1 English(EN) · [email protected] ·

    Skill Is Not Document: A Query-Conditional Benchmark and Two-Stage Retriever for LLM Agent Skill Routing

    Skills used by LLMs require a different retrieval pipeline than the one used for document retrieval, since skills may sometimes conflict with each other. https://arxiv.org/abs…

  369. dev.to — LLM tag TIER_1 English(EN) · WonderLab ·

    Agent Series (15): Advanced Agent Memory — Short-term, Long-term, Compression

    Memory Isn't Just "Store the Chat Log". Dumping conversation history into the prompt is the crudest form of memory. Real systems have more complex needs: the user mentioned their city in turn 3; the agent should know where to look when they ask about wea…

  370. dev.to — LLM tag TIER_1 English(EN) · Mudassir Khan ·

    AI agent memory management: beyond the context window

    Your agent answered correctly five minutes ago. Now it's asking for the same information again. The context window filled up, the early messages got evicted, and all that history is gone. This is not a hal…
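
    The eviction described in the snippet is easy to reproduce with a naive sliding window. A minimal sketch, assuming token cost is approximated by word count (a real system would use the model's tokenizer) and that the system prompt is always kept; `fit_to_window` is a hypothetical helper:

```python
def fit_to_window(system, messages, budget):
    """Keep the pinned system prompt plus the newest messages that fit.

    Token cost is approximated by whitespace-separated words (an
    assumption for illustration). Older messages are dropped first, so
    facts stated early in the conversation silently disappear.
    """
    def cost(text):
        return len(text.split())

    remaining = budget - cost(system)
    kept = []
    # Walk backwards from the newest message, keeping whatever still fits.
    for msg in reversed(messages):
        if cost(msg) > remaining:
            break
        kept.append(msg)
        remaining -= cost(msg)
    return [system] + list(reversed(kept))


history = [
    "We deploy staging from the release branch",
    "Please refactor the billing module",
    "Now add tests for the refactor",
]
window = fit_to_window("You are a coding agent", history, budget=16)
print(window)
```

    With a 16-word budget, the oldest message (where staging is deployed from) is the first to go, even though it may be the one that matters later.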

  371. dev.to — LLM tag TIER_1 English(EN) · Mohit Yadav ·

    Why Memory Matters More Than Model Size in LLM Agents

  372. Mastodon — fosstodon.org TIER_1 English(EN) · [email protected] ·

    Rylan Talerico on Zep: A Temporal Knowledge Graph Architecture for Agent Memory [PWL NYC] (2025)

    https://www.youtube.com/watch?v=TPGlkaHXu0A #llm #ai

  373. dev.to — LLM tag TIER_1 English(EN) · Red Fox Code ·

    Give your AI agent long-term memory with MCP (no code)

    Your agent forgets everything when the context window ends. The usual fix is to wire a vector DB, write ingest/retrieve glue, and babysit it. There's a faster path: plug a memory API into the agent over MCP and let the model call add_memory …

  374. r/LocalLLaMA TIER_1 English(EN) · /u/dryadofelysium ·

    MiniMax M3 - Coding & Agentic Frontier, 1M Context, Multimodal

  375. Mastodon — fosstodon.org TIER_1 English(EN) · [email protected] ·

    🧠 ARN provides a local semantic memory server designed for AI agents that runs on Raspberry Pi 5 hardware with 22-millisecond recall times.

    The system passes 10 out of 10 tests in its evaluation framework. 💬 Hacker News 🔗 https://github.com/tuuhe99-del/ARN-Adaptive-Reasoning-Ne…

  376. Mastodon — fosstodon.org TIER_1 English(EN) · [email protected] ·

    DeepSWE: A contamination-free benchmark for long-horizon coding agents

    https://deepswe.datacurve.ai/blog #ai

  377. dev.to — LLM tag TIER_1 English(EN) · Yaohua Chen ·

    The Representation Problem: Why RAG vs. Agentic Search Is the Wrong Debate

    The industry has been asking the wrong question. When Boris Cherny (the creator and Head of Claude Code) revealed on the Latent Space podcast that Anthropic's flagship coding agent had abandoned RAG entirely and switched to what he called "Agentic Search," the discour…

  378. dev.to — LLM tag TIER_1 English(EN) · Shilpa Mitra ·

    How Claude Code Achieves a 92% Cache Hit Rate: A Deep Dive Into Prompt Caching for AI Agents

    If you're running AI agents in production, there's a cost you're probably not thinking about. Every turn in an agentic conversation sends the full prompt to the model. That includes the system instructions, all the tool definitions, any project context that was loaded e…
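
    Why a stable prompt prefix matters: if every turn resends the system instructions, tool definitions, and project context in the same order, everything before the first changed block can in principle be served from a cache. A minimal sketch of that accounting, assuming the prompt is a list of text blocks and that a hit covers the longest unchanged prefix of blocks; real provider caches operate on tokens and have their own rules, and this is not Claude Code's implementation:

```python
def cached_prefix_len(previous, current):
    """Number of leading blocks identical between two consecutive requests."""
    n = 0
    for a, b in zip(previous, current):
        if a != b:
            break
        n += 1
    return n


def hit_rate(requests):
    """Fraction of blocks, over every turn after the first, reusable from cache."""
    reused = total = 0
    for prev, cur in zip(requests, requests[1:]):
        reused += cached_prefix_len(prev, cur)
        total += len(cur)
    return reused / total if total else 0.0


base = ["system instructions", "tool definitions", "project context"]
turns = [
    base + ["user: fix the bug"],
    base + ["user: fix the bug", "assistant: done", "user: add a test"],
]
print(f"{hit_rate(turns):.0%}")
```

    Putting a per-turn value such as a timestamp in the first block drops the reusable prefix to zero, which is why static content generally goes first and per-turn content last.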

  379. Mastodon — fosstodon.org TIER_1 English(EN) · [email protected] ·

    Tencent Open-Sources TencentDB Agent Memory: A 4-Tier Local Memory Pipeline for AI Agents

    Tencent has open-sourced TencentDB Agent Memory, a fully local memory system for AI agents released under t... #Agentic #AI #AI #Infrastructure #Applications #Artificial #Intelligence #Edito…

  380. Mastodon — fosstodon.org TIER_1 English(EN) · [email protected] ·

    Tencent Open-Sources TencentDB Agent Memory: A 4-Tier Local Memory Pipeline for AI Agents

    Tencent has open-sourced TencentDB Agent Memory, a fully local memory system for AI agents released under t... #Agentic #AI #AI #Infrastructure #Applications #Artificial #Intelligence #Edito…

  381. dev.to — LLM tag TIER_1 English(EN) · Mahmoud Zalt ·

    The 7-Layer Memory Architecture Behind Modern AI Agents

    How do you make an AI agent actually remember?

  382. dev.to — LLM tag TIER_1 English(EN) · Abuzar Gore ·

    LLM-Wiki: Multi-Agent Memory Without RAG

    How three AI agents can collaborate on a complex task by sharing a folder of markdown files, and nothing else.
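
    The pattern in the summary, agents sharing nothing but a folder of markdown files, needs only the standard library to sketch. This is a hypothetical illustration, not the LLM-Wiki code: `MarkdownWiki` is an invented helper in which each topic is one page, agents append attributed notes, and any agent can read the whole page back.

```python
import tempfile
from pathlib import Path


class MarkdownWiki:
    """A shared folder where each topic is one markdown page (illustrative)."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _page(self, topic):
        # Predictable file names: lowercase, spaces turned into hyphens.
        return self.root / (topic.lower().replace(" ", "-") + ".md")

    def append(self, agent, topic, note):
        """Append a note to the topic page, attributed to the writing agent."""
        page = self._page(topic)
        if not page.exists():
            page.write_text(f"# {topic}\n\n", encoding="utf-8")
        with page.open("a", encoding="utf-8") as f:
            f.write(f"- ({agent}) {note}\n")

    def read(self, topic):
        """Return the whole page, or an empty string if nobody has written it."""
        page = self._page(topic)
        return page.read_text(encoding="utf-8") if page.exists() else ""


with tempfile.TemporaryDirectory() as tmp:
    wiki = MarkdownWiki(tmp)
    wiki.append("researcher", "API design", "Endpoints are versioned under /v2")
    wiki.append("coder", "API design", "Added the /v2/orders handler")
    print(wiki.read("API design"))
```

    No locking is shown; agents writing to the same page concurrently would need file locks or a one-writer-per-page convention.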

  383. dev.to — LLM tag TIER_1 English(EN) · Vaishnavi Gudur ·

    Your No-Code AI Agent Has a Memory Problem

    If you're building AI agents with Flowise, Dify, n8n, or similar no-code/low-code platforms, there's a security threat you probably haven't thought about: memory poisoning. And it's not theoretical. It's in the … (https://owasp.org/www-project-top…)

  384. dev.to — LLM tag TIER_1 English(EN) · Agdex AI ·

    Best AI Agent Memory Tools in 2026: Mem0 vs Zep vs Letta vs MemGPT

    Ask a stateless AI agent about something you told it last week, and it remembers nothing. That's the core problem memory tools solve. In 2026, long-term memory for AI agents has become one of the hottest areas in the ecosystem, with dedicated tools like…

  385. dev.to — LLM tag TIER_1 English(EN) · Vaishnavi Gudur ·

    Securing LangGraph Multi-Agent Workflows Against Memory Poisoning (ASI06)

    LangGraph has become the de facto standard for building complex, multi-agent workflows. Its core abstraction, the state graph, allows developers to build cyclic, stateful applications where agen…

  386. Mastodon — fosstodon.org TIER_1 English(EN) · [email protected] ·

    MemSkill reframes LLM-agent memory operations as a learnable skill bank

    An RL controller selects Top-K skills per span, and an LLM designer periodically rewrites them from hard cases. But "self-evolving" overstates the test-time story: both controller and bank are trained offline a…
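
    The selection step described for MemSkill, a controller choosing the Top-K skills for each span, amounts to scoring every skill in the bank and keeping the K best. A minimal sketch, assuming the scores are already given: in MemSkill they come from a trained RL controller, whereas here they are made-up numbers and the skill names are hypothetical.

```python
def select_skills(scores, k):
    """Return the k skill names with the highest scores for one span.

    scores maps skill name -> controller score. Ties are broken
    alphabetically so the selection is deterministic.
    """
    ranked = sorted(scores, key=lambda name: (-scores[name], name))
    return ranked[:k]


span_scores = {
    "extract_preference": 0.82,
    "update_fact": 0.64,
    "merge_duplicates": 0.64,
    "discard_chitchat": 0.10,
}
print(select_skills(span_scores, k=2))
```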

  387. dev.to — LLM tag TIER_1 English(EN) · Vaishnavi Gudur ·

    Your AI Agent's Memory is a Security Hole — Here's the Fix

    I've been working on AI agent security for the past few months as part of the OWASP Top 10 for … (https://owasp.org/www-project-top-10-for-large-language-model-applications/)

  388. dev.to — LLM tag TIER_1 English(EN) · R Hiroshini ·

    The Bug That Forced Us to Add Agent Memory

    Project: Nexus Core AI OS. Stack: Hindsight (persistent memory) · cascadeflow (runtime intelligence & routing). 1. Introduction: I didn't plan to build a memory sys…

  389. Mastodon — fosstodon.org TIER_1 Italiano(IT) · [email protected] ·

    Android and AI: Is 128 GB of Storage Becoming Insufficient?

    As AI features advance on Android, smartphone storage space risks becoming an increasingly critical bottleneck. At the heart of the problem is AICore…

  390. Mastodon — fosstodon.org TIER_1 English(EN) · [email protected] ·

    🤖 Which project/framework has actually nailed persistent memory for AI agents?

    Not talking about the LLM itself, but about the memory layer on top. There are quite a few out there now, open-source ones and proprietary frameworks. Curious what people have actually tried and stu…

  391. Mastodon — fosstodon.org TIER_1 English(EN) · [email protected] ·

    Hermes Memory Installer Review: One-Command Persistent Memory for Local AI Agents

    Nous Research's Hermes Memory Installer adds local persistent memory to AI agents with one shell command. We compare its file-based approach to Mem0 and Letta. https://pickuma.com/posts/hermes-memo…

  392. dev.to — LLM tag TIER_1 English(EN) · Ken W Alger ·

    Engineering Agent Memory

    From Stateless Prompts to Persistent Intelligence. Where this fits: this article bridges two series. It closes out the themes introduced in The Backyard Quarry (a data engineering exploration using physical objects as a teaching domain) and…

  393. Mastodon — fosstodon.org TIER_1 English(EN) · [email protected] ·

    🧠 Graft provides a semantic memory system for AI agents that operates independently of large language models.

    The tool allows agents to store and retrieve information based on meaning rather than exact text matching. 💬 Hacker News 🔗 https://github.com/AEndrix03/Graft #AI #Mach…

  394. dev.to — LLM tag TIER_1 English(EN) · vishalmysore ·

    ReasoningBank: Building AI Agents that Actually Learn from Experience

    In the world of Large Language Models (LLMs), we often face a frustrating paradox: LLMs are incredibly capable at "reasoning" in the moment, but they are fundamentally stateless. Every time you start a new session, the agent has total amnesia. It doesn't remem…

  395. dev.to — LLM tag TIER_1 English(EN) · Poniak Labs ·

    SubQ Model: Can Subquadratic Make Long-Context AI More Efficient?

    Originally published on Poniak Times (https://www.poniaktimes.com/subq-model-efficient-long-context-ai/). Reposted here for the developer and AI engineering community. Subquadratic's SubQ model claims to make long-context A…

  396. dev.to — LLM tag TIER_1 English(EN) · Jonathanfarrow ·

    The 10 Best AI Memory Layers for Agents in 2026

    If you are building agents in 2026, you have already hit the wall. Bigger models do not fix forgetfulness. Context windows can grow forever, and the agent still cannot remember what a user told it last Tuesday, that the customer's address changed three months ago, or that a re…

  397. dev.to — LLM tag TIER_1 English(EN) · 丁久 ·

    AI Agents Memory Patterns: Working, Episodic, Semantic, and Reflective Memory

    This article was originally published on AI Study Room (https://dingjiu1989-hue.github.io/en/ai/ai-agents-memory-patterns.html). For the full version with working code examples and related articles, visit the original post…

  398. dev.to — LLM tag TIER_1 Español(ES) · Tirso García ·

    Building Kernel Memory Protocol: Navigable Memory for AI Agents

    English version: Building Kernel Memory Protocol: Navigable Memory for AI Agents (https://dev.to/tirsogarcia/building-kernel-memory-protocol-navigable-memory-for-ai-agents-315j). The problem with many AI agents is not that they lack…

  399. dev.to — LLM tag TIER_1 English(EN) · Tirso García ·

    Building Kernel Memory Protocol: Navigable Memory for AI Agents

    Spanish version: Construyendo Kernel Memory Protocol: memoria navegable para agentes de IA (https://dev.to/tirsogarcia/construyendo-kernel-memory-protocol-memoria-navegable-para-agentes-de-ia-24lc). The hard part with many AI age…

  400. dev.to — LLM tag TIER_1 English(EN) · tokozen ·

    How Agentic Search Actually Works: The Research Loop Link-Fetching Agents Miss

    Most agent tutorials show you the same pattern: take a user query, call a search API, grab the top result, stuff the text into your prompt. Done. Ship it. That works fine for trivi…

  401. Mastodon — mastodon.social TIER_1 English(EN) · [email protected] ·

    How to design short-term, long-term, and structured memory for AI assistants, with retrieval mechanics, tradeoffs, failure modes, and real patterns from OpenAI, LangGraph, Hermes, and OpenClaw

    #Hermes #OpenClaw #Architecture #LLM #AI #RAG #SelfHosting https://www.glukhov…

  402. Mastodon — mastodon.social TIER_1 English(EN) · [email protected] ·

    Universal Memory Protocol proposes a shared format for agent memory across AI systems.

    Standardizing how agents store and retrieve context sounds useful, but it also means a new shared attack surface: poisoned memories, cross-agent leakage, persistent manipulation. Worth watchin…

  403. Mastodon — mastodon.social TIER_1 English(EN) · [email protected] ·

    Δ-Mem: Efficient Online Memory for Large Language Models

    https://arxiv.org/abs/2605.12357 #HackerNews #Tech #AI

  404. r/cursor TIER_2 English(EN) · /u/EvanBuilds2026 ·

    Why most AI coding agent memory systems fail — and what actually works

    I've been building my own persistent memory layer for coding agents, and along the way I realized something surprising: most memory systems out there are basically just session-based retrieval. They don't forget, they don't manage life…
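
    The post's complaint that most memory systems don't forget or manage lifecycle suggests some form of decay. One common way to sketch that, assumed here rather than taken from the post, is to weight each memory by exponential decay since its last use and prune anything below a threshold; the half-life and threshold are arbitrary illustrative values.

```python
import math


def decayed_weight(last_used, now, half_life):
    """Weight that halves every half_life time units since last use."""
    age = max(0.0, now - last_used)
    return math.pow(0.5, age / half_life)


def prune(memories, now, half_life=7.0, threshold=0.25):
    """Keep memories whose decayed weight is at least threshold.

    memories maps text -> time of last use (days, in this example).
    Updating a memory's last-use time resets its decay.
    """
    return {
        text: last_used
        for text, last_used in memories.items()
        if decayed_weight(last_used, now, half_life) >= threshold
    }


memories = {
    "Project uses pytest": 29.0,      # last used 1 day ago
    "Temporary branch name": 10.0,    # unused for 20 days
    "Prefers small commits": 16.0,    # unused for 14 days
}
print(sorted(prune(memories, now=30.0)))
```

    Refreshing the last-use time whenever a memory is retrieved means facts the agent keeps using survive, while one-off details fade out.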

  405. r/ClaudeAI TIER_2 English(EN) · /u/papoode ·

    Advanced memory + project continuity for AI coding agents, from a biologist’s view.

    I'm a biologist and software developer. PhD in genetics, and ~20 years building software products. So I think I have a different view on things like memory. My thoughts on how memory with a coding agent should work: Tuesday morning. New se…