PulseAugur

New Models Improve LLM Reasoning Evaluation and Internal State Control

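Researchers have developed a new framework that aims to minimize "collateral damage" in activation steering of large language models (LLMs), so that model behavior can be controlled without degrading performance on unrelated tasks. A second paper introduces the Schema-aware Cumulative Process Reward Model (SCPRM), which improves knowledge graph question answering by evaluating reasoning paths more accurately and with greater sensitivity to risk. A third method, Data Influence-oriented Tree Search (DITS), enhances the training of multi-agent systems by identifying the data that contributes most to model improvement, and it outperforms traditional approaches that rely only on Q-values.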

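Impact: These papers introduce new techniques that improve LLM control, knowledge graph reasoning accuracy, and the training efficiency of multi-agent systems, and they could lead to more powerful and more capable AI systems.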

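Ranking rationale: This cluster contains three separate academic papers published on arXiv, focused on new research in LLM control, knowledge graph reasoning, and multi-agent system training.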


AI-generated summary · Google Gemini · from 5 sources.

Sources [5]

  1. arXiv cs.AI TIER_1 English(EN) · Jiujiu Chen, Yazheng Liu, Sihong Xie, Hui Xiong ·

    SCPRM: A Schema-aware Cumulative Process Reward Model for Knowledge Graph Question Answering

    arXiv:2605.02819v1 Announce Type: new Abstract: Large language models excel at complex reasoning, yet evaluating their intermediate steps remains challenging. Although process reward models provide step-wise supervision, they often suffer from a risk compensation effect, where in…

  2. arXiv cs.LG TIER_1 English(EN) · Tam Nguyen, Tu Anh Nguyen, Sina Alemohammad, Richard G. Baraniuk ·

    Minimizing Collateral Damage in Activation Steering

    arXiv:2605.01167v1 Announce Type: new Abstract: Activation steering is a method for controlling Large Language Model (LLM) behavior by intervening in its internal representations to increase the alignment with a specific target feature direction. However, standard interventions, …

  3. arXiv cs.AI TIER_1 English(EN) · Hui Xiong ·

    SCPRM: A Schema-aware Cumulative Process Reward Model for Knowledge Graph Question Answering

    Large language models excel at complex reasoning, yet evaluating their intermediate steps remains challenging. Although process reward models provide step-wise supervision, they often suffer from a risk compensation effect, where incorrect steps are offset by later correct ones, …

  4. Hugging Face Daily Papers TIER_1 English(EN) ·

    SCPRM: A Schema-aware Cumulative Process Reward Model for Knowledge Graph Question Answering

    Large language models excel at complex reasoning, yet evaluating their intermediate steps remains challenging. Although process reward models provide step-wise supervision, they often suffer from a risk compensation effect, where incorrect steps are offset by later correct ones, …

  5. arXiv cs.CL TIER_1 English(EN) · Wentao Shi, Zichun Yu, Fuli Feng, Xiangnan He, Chenyan Xiong ·

    Efficient Multi-Agent System Training with Data Influence-Oriented Tree Search

    arXiv:2502.00955v2 Announce Type: replace Abstract: Monte Carlo Tree Search (MCTS) based methods provide promising approaches for generating synthetic data to enhance the self-training of Large Language Model (LLM) based multi-agent systems (MAS). These methods leverage Q-values …
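The "risk compensation effect" named in the SCPRM abstract, where incorrect steps are offset by later correct ones, can be illustrated with a toy comparison. The sketch below is not SCPRM's actual formulation, which the truncated abstract does not give. It assumes hypothetical per-step scores in [0, 1] and contrasts a plain average with a running product as one possible cumulative score; the function names and numbers are invented for illustration.

```python
def mean_score(step_scores):
    """Average the per-step scores of a reasoning path.

    A high score on a later step can offset a low score on an earlier one.
    """
    return sum(step_scores) / len(step_scores)


def cumulative_scores(step_scores):
    """Return the running product of per-step scores.

    Assumption: a product is used here only as a simple example of a
    cumulative score. Once one step scores low, every later prefix stays low.
    """
    running = 1.0
    prefixes = []
    for score in step_scores:
        running *= score
        prefixes.append(running)
    return prefixes


# Hypothetical step scores, not taken from any paper.
sound_path = [0.75, 0.75, 0.75, 0.75, 0.75]   # every step is reasonable
flawed_path = [0.90, 0.10, 0.95, 0.95, 0.95]  # second step is wrong

# Averaging ranks the flawed path above the sound one.
averaging_prefers_flawed = mean_score(flawed_path) > mean_score(sound_path)

# The cumulative score keeps the early error visible at the end of the path.
cumulative_prefers_sound = (
    cumulative_scores(sound_path)[-1] > cumulative_scores(flawed_path)[-1]
)

print("averaging prefers flawed path:", averaging_prefers_flawed)
print("cumulative score prefers sound path:", cumulative_prefers_sound)
```

With these made-up numbers, averaging lets three strong later steps hide the bad second step, while the running product does not recover from it.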