PulseAugur

Contextual Agentic Memory is a Memo, Not True Memory

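Researchers are exploring advanced memory systems for LLM agents to improve their reasoning and learning. One approach, E-mem, uses a hierarchical architecture and multiple agents to reconstruct episodic context without losing critical information. Another, ViLoMem, centers on a dual-stream memory framework that encodes visual and logical information separately, letting agents learn from both successes and failures. In addition, one paper argues that current agentic memory systems are merely lookup rather than true memory, and proposes a neuroscience-inspired approach aimed at better generalization and safety.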

Impact: These research papers explore ways to enhance reasoning, learning, and memory in LLM agents, which could lead to more capable AI systems.

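Ranking rationale: Multiple arXiv papers present new research on improving LLM agent capabilities through advanced memory systems and learning techniques.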
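The summary's claim that current agent memory is lookup rather than memory can be made concrete with a minimal sketch. Everything here is an illustrative assumption and is not taken from any of the listed papers: the `LookupMemory` class, the toy bag-of-words `embed` function, and the cosine ranking stand in for a real vector store with learned embeddings.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (assumption: real systems use learned embeddings)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class LookupMemory:
    """Hypothetical store-and-retrieve 'memory': writes append, reads rank."""

    def __init__(self):
        self.entries = []  # list of (vector, original text)

    def write(self, text):
        self.entries.append((embed(text), text))

    def read(self, query, k=1):
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[0]), reverse=True)
        return [text for _, text in ranked[:k]]


memory = LookupMemory()
memory.write("the deploy script failed because the api key expired")
memory.write("the user prefers answers in metric units")
top = memory.read("why did the deploy script fail", k=1)
print(top)
```

Reading from this store returns the closest stored string and changes nothing else: no entry is consolidated, generalized, or forgotten, and the model that consumes the result is not updated. That is the sense in which such a component behaves like a memo to be consulted.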


AI-generated summary · Google Gemini · from 22 sources.


Sources [22]

  1. arXiv cs.CL TIER_1 English(EN) · Jieping Ye ·

    Prefix Teaches, Suffix Fades: Local Teachability Collapse in Strong-to-Weak On-Policy Distillation

    On-policy distillation (OPD) trains a student model on its own rollouts using dense feedback from a stronger teacher. Prior literature suggests that, provided teacher feedback is available, supervising the full sequence of response tokens should monotonically improve performance.…

  2. arXiv cs.LG TIER_1 English(EN) · Weng-Fai Wong ·

    Reward-Augmented On-Policy Distillation for NL-to-SVA Generation, with an Open Property-Equivalence Verifier

    LLM-based generation of SystemVerilog Assertions (SVA) is often reported as nearing saturation, with the strongest specialized model reaching ~76% accuracy on NL2SVA-Human. We show that this aggregate hides a temporal gap: models that appear strong overall still collapse …

  3. arXiv cs.CL TIER_1 English(EN) · Junfeng Fang ·

    Learning to Foresee: Revealing What Unlocks the Efficiency of On-Policy Distillation

    On-policy distillation (OPD) has emerged as an efficient post-training paradigm for large language models. However, existing studies largely attribute this advantage to denser and more stable supervision, while the parameter-level mechanisms underlying OPD's efficiency remain poo…

  4. arXiv cs.AI TIER_1 English(EN) · Mehrdad Farajtabar ·

    Demystifying On-Policy Distillation: Where It Helps, Where It Hurts, and Why

    On-policy distillation offers dense, per-token supervision for training reasoning models; however, it remains unclear under which conditions this signal is beneficial and under which it is detrimental. Which teacher model should be used, and in the case of self-distillation, whic…

  5. arXiv cs.AI TIER_1 English(EN) · Lan-Zhe Guo ·

    TRACE: Distilling What Matters via Token-Routed Self On-Policy Alignment

    On-policy self-distillation (self-OPD) densifies reinforcement learning with verifiable rewards (RLVR) by letting a policy teach itself under privileged context. We find that when this guidance spans the full response, all-token KL spends gradients on mostly redundant positions a…

  6. Hugging Face Daily Papers TIER_1 English(EN) ·

    SimCT: Recovering Lost Supervision for Cross-Tokenizer On-Policy Distillation

    On-policy distillation (OPD) is a standard tool for transferring teacher behavior to a smaller student, but it implicitly assumes that teacher and student predictions are comparable token by token, an assumption that fails whenever the two models tokenize the same text differentl…

  7. arXiv cs.CL TIER_1 English(EN) · Xiang Wang ·

    SimCT: Recovering Lost Supervision for Cross-Tokenizer On-Policy Distillation

    On-policy distillation (OPD) is a standard tool for transferring teacher behavior to a smaller student, but it implicitly assumes that teacher and student predictions are comparable token by token, an assumption that fails whenever the two models tokenize the same text differentl…

  8. arXiv cs.LG TIER_1 English(EN) · Nan Jia, Haojin Yang, Xing Ma, Jiesong Lian, Shuailiang Zhang, Weipeng Zhang, Ke Zeng, Xunliang Cai, Zequn Sun ·

    Asymmetric On-Policy Distillation: Bridging the Gap Between Exploration and Imitation at the Token Level

    arXiv:2605.06387v1. On-policy distillation (OPD) trains a student on its own trajectories with token-level teacher feedback and often outperforms off-policy distillation and standard reinforcement learning. However, we find that its standard advantage …

  9. arXiv cs.AI TIER_1 English(EN) · Zequn Sun ·

    Asymmetric On-Policy Distillation: Bridging the Gap Between Exploration and Imitation at the Token Level

    On-policy distillation (OPD) trains a student on its own trajectories with token-level teacher feedback and often outperforms off-policy distillation and standard reinforcement learning. However, we find that its standard advantage weighted policy gradient suffers from three stru…

  10. arXiv cs.LG TIER_1 English(EN) · Anastasis Kratsios, A. Martina Neuman, Philipp Petersen ·

    Adaptivity Under Realizability Constraints: Comparing In-Context and Agentic Learning

    arXiv:2605.04995v1. We compare in-context learning with fixed queries and agentic learning with adaptive queries for uniform approximation of task families. We consider two settings: an unrestricted regime, where querying and approximation are arbitrar…

  11. arXiv cs.LG TIER_1 English(EN) · Wenjin Hou, Shangpin Peng, Weinong Wang, Zheng Ruan, Yue Zhang, Zhenglin Zhou, Mingqi Gao, Yifei Chen, Kaiqi Wang, Hongming Yang, Chengquan Zhang, Zhuotao Tian, Han Hu, Yi Yang, Fei Wu, Hehe Fan ·

    Uni-OPD: Unifying On-Policy Distillation with a Dual-Perspective Recipe

    arXiv:2605.03677v1. On-policy distillation (OPD) has recently emerged as an effective post-training paradigm for consolidating the capabilities of specialized expert models into a single student model. Despite its empirical success, the conditions unde…

  12. arXiv cs.LG TIER_1 English(EN) · Hehe Fan ·

    Uni-OPD: Unifying On-Policy Distillation with a Dual-Perspective Recipe

    On-policy distillation (OPD) has recently emerged as an effective post-training paradigm for consolidating the capabilities of specialized expert models into a single student model. Despite its empirical success, the conditions under which OPD yields reliable improvement remain p…

  13. arXiv cs.AI TIER_1 English(EN) · Kaixiang Wang, Yidan Lin, Jiong Lou, Zhaojiacheng Zhou, Bunyod Suvonov, Jie Li ·

    E-mem: Multi-Agent-Based Episodic Context Reconstruction for LLM Agent Memory

    arXiv:2601.21714v2. The evolution of Large Language Model (LLM) agents towards System 2 reasoning, characterized by deliberative, high-precision problem-solving, requires maintaining rigorous logical integrity over extended horizons. However, preva…

  14. arXiv cs.LG TIER_1 English(EN) · Weihao Bo, Shan Zhang, Yanpeng Sun, Jingjing Wu, Qunyi Xie, Xiao Tan, Kunbin Chen, Wei He, Xiaofan Li, Na Zhao, Jingdong Wang, Zechao Li ·

    Agentic Learner with Grow-and-Refine Multimodal Semantic Memory

    arXiv:2511.21678v2. MLLMs exhibit strong reasoning on isolated queries, yet they operate de novo -- solving each problem independently and often repeating the same mistakes. Existing memory-augmented agents mainly store past trajectories for …

  15. arXiv cs.LG TIER_1 English(EN) · Jackson Hassell, Dan Zhang, Hannah Kim, Tom Mitchell, Estevam Hruschka ·

    Learning from Supervision with Semantic and Episodic Memory: A Reflective Approach to Agent Adaptation

    arXiv:2510.19897v2. We investigate how agents built on pretrained large language models (LLMs) can learn target classification functions from labeled examples without parameter updates. While conventional approaches like fine-tuning are often…

  16. arXiv cs.AI TIER_1 English(EN) · Ruozhen Yang, Yucheng Jiang, Yueqi Jiang, Priyanka Kargupta, Yunyi Zhang, Jiawei Han ·

    Grounding Agent Memory in Contextual Intent

    arXiv:2601.10702v2. Deploying large language models in long-horizon, goal-oriented interactions remains challenging because similar entities and facts recur under different latent goals and constraints, causing memory systems to retrieve cont…

  17. arXiv cs.AI TIER_1 English(EN) · Binyan Xu, Xilin Dai, Kehuan Zhang ·

    Contextual Agentic Memory is a Memo, Not True Memory

    arXiv:2604.27707v1. Current agentic memory systems (vector stores, retrieval-augmented generation, scratchpads, and context-window management) do not implement memory: they implement lookup. We argue that treating lookup as memory is a category error w…

  18. arXiv cs.CL TIER_1 English(EN) · Kehuan Zhang ·

    Contextual Agentic Memory is a Memo, Not True Memory

    Current agentic memory systems (vector stores, retrieval-augmented generation, scratchpads, and context-window management) do not implement memory: they implement lookup. We argue that treating lookup as memory is a category error with provable consequences for agent capability, …

  19. arXiv cs.CV TIER_1 English(EN) · Zuxuan Wu ·

    DiffusionOPD: A Unified Perspective on On-Policy Distillation in Diffusion Models

    Reinforcement learning has emerged as a powerful tool for improving diffusion-based text-to-image models, but existing methods are largely limited to single-task optimization. Extending RL to multiple tasks is challenging: joint optimization suffers from cross-task interference a…

  20. arXiv cs.CV TIER_1 English(EN) · Dengyang Jiang, Xin Jin, Dongyang Liu, Zanyi Wang, Mingzhe Zheng, Ruoyi Du, Xiangpeng Yang, Qilong Wu, Zhen Li, Peng Gao, Harry Yang, Steven Hoi ·

    D-OPSD: On-Policy Self-Distillation for Continuously Tuning Step-Distilled Diffusion Models

    arXiv:2605.05204v1. The landscape of high-performance image generation models is currently shifting from the inefficient multi-step ones to the efficient few-step counterparts (e.g., Z-Image-Turbo and FLUX.2-klein). However, these models present signifi…

  21. arXiv cs.CV TIER_1 English(EN) · Steven Hoi ·

    D-OPSD: On-Policy Self-Distillation for Continuously Tuning Step-Distilled Diffusion Models

    The landscape of high-performance image generation models is currently shifting from the inefficient multi-step ones to the efficient few-step counterparts (e.g., Z-Image-Turbo and FLUX.2-klein). However, these models present significant challenges for directly continuous supervis…

  22. arXiv stat.ML TIER_1 English(EN) · Philipp Petersen ·

    Adaptivity Under Realizability Constraints: Comparing In-Context and Agentic Learning

    We compare in-context learning with fixed queries and agentic learning with adaptive queries for uniform approximation of task families. We consider two settings: an unrestricted regime, where querying and approximation are arbitrary functions, and a realizable regime, where we r…