PulseAugur

Contextual Agentic Memory is a Memo, Not True Memory

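Researchers are exploring advanced memory systems for LLM agents to improve their reasoning and learning. One approach, E-mem, uses a hierarchical architecture with multiple agents to reconstruct episodic context without losing critical information. Another, ViLoMem, is a dual-stream memory framework that encodes visual and logical information separately, letting agents learn from both successes and failures. A further paper argues that current agentic memory systems implement lookup rather than true memory, and proposes a neuroscience-inspired approach aimed at better generalization and safety.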

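Impact: These research papers explore ways to strengthen reasoning, learning, and memory in LLM agents, which could lead to more powerful and capable AI systems.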

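Ranking rationale: Multiple arXiv papers present new research on improving LLM agent capabilities through advanced memory systems and learning techniques.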

Read on arXiv cs.CL →

AI-generated summary · Google Gemini · from 22 sources.

Sources [22]

  1. arXiv cs.CL TIER_1 English(EN) · Jieping Ye ·

    Prefix Teach, Suffix Fade: Local Teachability Collapse in Strong-to-Weak On-Policy Distillation

    On-policy distillation (OPD) trains a student model on its own rollouts using dense feedback from a stronger teacher. Prior literature suggests that, provided teacher feedback is available, supervising the full sequence of response tokens should monotonically improve performance.…

  2. arXiv cs.LG TIER_1 English(EN) · Weng-Fai Wong ·

    Reward-Weighted On-Policy Distillation with an Open Property-Equivalence Verifier for NL-to-SVA Generation

    LLM-based generation of SystemVerilog Assertions (SVA) is often reported as nearing saturation, with the strongest specialized model reaching ~76% accuracy on NL2SVA-Human. We show that this aggregate hides a temporal gap: models that appear strong overall still collapse …

  3. arXiv cs.CL TIER_1 English(EN) · Junfeng Fang ·

    Learning to Foresee: Unveiling the Unlocking Efficiency of On-Policy Distillation

    On-policy distillation (OPD) has emerged as an efficient post-training paradigm for large language models. However, existing studies largely attribute this advantage to denser and more stable supervision, while the parameter-level mechanisms underlying OPD's efficiency remain poo…

  4. arXiv cs.AI TIER_1 English(EN) · Mehrdad Farajtabar ·

    Unmasking On-Policy Distillation: Where It Helps, Where It Hurts, and Why

    On-policy distillation offers dense, per-token supervision for training reasoning models; however, it remains unclear under which conditions this signal is beneficial and under which it is detrimental. Which teacher model should be used, and in the case of self-distillation, whic…

  5. arXiv cs.AI TIER_1 English(EN) · Lan-Zhe Guo ·

    TRACE: Distilling Where It Matters via Token-Routed Self On-Policy Alignment

    On-policy self-distillation (self-OPD) densifies reinforcement learning with verifiable rewards (RLVR) by letting a policy teach itself under privileged context. We find that when this guidance spans the full response, all-token KL spends gradients on mostly redundant positions a…

  6. Hugging Face Daily Papers TIER_1 English(EN) ·

    SimCT: Recovering Lost Supervision for Cross-Tokenizer On-Policy Distillation

    On-policy distillation (OPD) is a standard tool for transferring teacher behavior to a smaller student, but it implicitly assumes that teacher and student predictions are comparable token by token, an assumption that fails whenever the two models tokenize the same text differentl…

  7. arXiv cs.CL TIER_1 English(EN) · Xiang Wang ·

    SimCT: Recovering Lost Supervision for Cross-Tokenizer On-Policy Distillation

    On-policy distillation (OPD) is a standard tool for transferring teacher behavior to a smaller student, but it implicitly assumes that teacher and student predictions are comparable token by token, an assumption that fails whenever the two models tokenize the same text differentl…

  8. arXiv cs.LG TIER_1 English(EN) · Nan Jia, Haojin Yang, Xing Ma, Jiesong Lian, Shuailiang Zhang, Weipeng Zhang, Ke Zeng, Xunliang Cai, Zequn Sun ·

    Asymmetric On-Policy Distillation: Bridging Exploitation and Imitation at the Token Level

    arXiv:2605.06387v1. On-policy distillation (OPD) trains a student on its own trajectories with token-level teacher feedback and often outperforms off-policy distillation and standard reinforcement learning. However, we find that its standard advantage …

  9. arXiv cs.AI TIER_1 English(EN) · Zequn Sun ·

    Asymmetric On-Policy Distillation: Bridging Exploitation and Imitation at the Token Level

    On-policy distillation (OPD) trains a student on its own trajectories with token-level teacher feedback and often outperforms off-policy distillation and standard reinforcement learning. However, we find that its standard advantage weighted policy gradient suffers from three stru…

  10. arXiv cs.LG TIER_1 English(EN) · Anastasis Kratsios, A. Martina Neuman, Philipp Petersen ·

    Adaptivity Under Realizability Constraints: Comparing In-Context and Agentic Learning

    arXiv:2605.04995v1. We compare in-context learning with fixed queries and agentic learning with adaptive queries for uniform approximation of task families. We consider two settings: an unrestricted regime, where querying and approximation are arbitrar…

  11. arXiv cs.LG TIER_1 English(EN) · Wenjin Hou, Shangpin Peng, Weinong Wang, Zheng Ruan, Yue Zhang, Zhenglin Zhou, Mingqi Gao, Yifei Chen, Kaiqi Wang, Hongming Yang, Chengquan Zhang, Zhuotao Tian, Han Hu, Yi Yang, Fei Wu, Hehe Fan ·

    Uni-OPD: Unifying On-Policy Distillation with a Dual-Perspective Recipe

    arXiv:2605.03677v1. On-policy distillation (OPD) has recently emerged as an effective post-training paradigm for consolidating the capabilities of specialized expert models into a single student model. Despite its empirical success, the conditions unde…

  12. arXiv cs.LG TIER_1 English(EN) · Hehe Fan ·

    Uni-OPD: Unifying On-Policy Distillation with a Dual-Perspective Recipe

    On-policy distillation (OPD) has recently emerged as an effective post-training paradigm for consolidating the capabilities of specialized expert models into a single student model. Despite its empirical success, the conditions under which OPD yields reliable improvement remain p…

  13. arXiv cs.AI TIER_1 English(EN) · Kaixiang Wang, Yidan Lin, Jiong Lou, Zhaojiacheng Zhou, Bunyod Suvonov, Jie Li ·

    E-mem: Multi-agent based Episodic Context Reconstruction for LLM Agent Memory

    arXiv:2601.21714v2. The evolution of Large Language Model (LLM) agents towards System 2 reasoning, characterized by deliberative, high-precision problem-solving, requires maintaining rigorous logical integrity over extended horizons. However, preva…

  14. arXiv cs.LG TIER_1 English(EN) · Weihao Bo, Shan Zhang, Yanpeng Sun, Jingjing Wu, Qunyi Xie, Xiao Tan, Kunbin Chen, Wei He, Xiaofan Li, Na Zhao, Jingdong Wang, Zechao Li ·

    Agentic Learner with Grow-and-Refine Multimodal Semantic Memory

    arXiv:2511.21678v2. MLLMs exhibit strong reasoning on isolated queries, yet they operate de novo -- solving each problem independently and often repeating the same mistakes. Existing memory-augmented agents mainly store past trajectories for …

  15. arXiv cs.LG TIER_1 English(EN) · Jackson Hassell, Dan Zhang, Hannah Kim, Tom Mitchell, Estevam Hruschka ·

    Learning from Supervision with Semantic and Episodic Memory: A Reflective Approach to Agent Adaptation

    arXiv:2510.19897v2. We investigate how agents built on pretrained large language models (LLMs) can learn target classification functions from labeled examples without parameter updates. While conventional approaches like fine-tuning are often…

  16. arXiv cs.AI TIER_1 English(EN) · Ruozhen Yang, Yucheng Jiang, Yueqi Jiang, Priyanka Kargupta, Yunyi Zhang, Jiawei Han ·

    Grounding Agent Memory in Contextual Intent

    arXiv:2601.10702v2. Deploying large language models in long-horizon, goal-oriented interactions remains challenging because similar entities and facts recur under different latent goals and constraints, causing memory systems to retrieve cont…

  17. arXiv cs.AI TIER_1 English(EN) · Binyan Xu, Xilin Dai, Kehuan Zhang ·

    Contextual Agentic Memory is a Memo, Not True Memory

    arXiv:2604.27707v1. Current agentic memory systems (vector stores, retrieval-augmented generation, scratchpads, and context-window management) do not implement memory: they implement lookup. We argue that treating lookup as memory is a category error w…

  18. arXiv cs.CL TIER_1 English(EN) · Kehuan Zhang ·

    Contextual Agentic Memory is a Memo, Not True Memory

    Current agentic memory systems (vector stores, retrieval-augmented generation, scratchpads, and context-window management) do not implement memory: they implement lookup. We argue that treating lookup as memory is a category error with provable consequences for agent capability, …

  19. arXiv cs.CV TIER_1 English(EN) · Zuxuan Wu ·

    DiffusionOPD: A Unified Perspective of On-Policy Distillation in Diffusion Models

    Reinforcement learning has emerged as a powerful tool for improving diffusion-based text-to-image models, but existing methods are largely limited to single-task optimization. Extending RL to multiple tasks is challenging: joint optimization suffers from cross-task interference a…

  20. arXiv cs.CV TIER_1 English(EN) · Dengyang Jiang, Xin Jin, Dongyang Liu, Zanyi Wang, Mingzhe Zheng, Ruoyi Du, Xiangpeng Yang, Qilong Wu, Zhen Li, Peng Gao, Harry Yang, Steven Hoi ·

    D-OPSD: On-Policy Self-Distillation for Continuously Tuning Step-Distilled Diffusion Models

    arXiv:2605.05204v1. The landscape of high-performance image generation models is currently shifting from the inefficient multi-step ones to the efficient few-step counterparts (e.g., Z-Image-Turbo and FLUX.2-klein). However, these models present signifi…

  21. arXiv cs.CV TIER_1 English(EN) · Steven Hoi ·

    D-OPSD: On-Policy Self-Distillation for Continuously Tuning Step-Distilled Diffusion Models

    The landscape of high-performance image generation models is currently shifting from the inefficient multi-step ones to the efficient few-step counterparts (e.g., Z-Image-Turbo and FLUX.2-klein). However, these models present significant challenges for directly continuous supervis…

  22. arXiv stat.ML TIER_1 English(EN) · Philipp Petersen ·

    Adaptivity Under Realizability Constraints: Comparing In-Context and Agentic Learning

    We compare in-context learning with fixed queries and agentic learning with adaptive queries for uniform approximation of task families. We consider two settings: an unrestricted regime, where querying and approximation are arbitrary functions, and a realizable regime, where we r…