PulseAugur

New models improve LLM reasoning evaluation and control over internal states

Researchers have developed a new framework to minimize "collateral damage" in activation steering for large language models (LLMs), aiming to control model behavior without degrading performance on unrelated tasks. Another paper introduces a Schema-aware Cumulative Process Reward Model (SCPRM) that improves knowledge graph question answering by scoring reasoning paths cumulatively, so that incorrect steps cannot be offset by later correct ones. Additionally, a novel approach called Data Influence-oriented Tree Search (DITS) enhances the training of multi-agent systems by identifying the data with the greatest impact on model improvement, outperforming traditional methods that rely solely on Q-values.

Summary written by gemini-2.5-flash-lite from 5 sources.
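
To make the activation-steering setup concrete, here is a minimal sketch of the standard additive intervention described in the second paper's abstract: a forward hook that shifts a transformer block's hidden states along a target feature direction. The layer, scale, and steering vector are illustrative stand-ins, and the sketch shows only the vanilla intervention, not the paper's collateral-damage mitigation.

```python
import torch

def make_steering_hook(direction: torch.Tensor, scale: float):
    """Forward hook that shifts a block's output along `direction`."""
    unit = direction / direction.norm()

    def hook(module, inputs, output):
        # output: (batch, seq_len, d_model) hidden states from this block;
        # every position is nudged toward the target feature direction.
        return output + scale * unit

    return hook

# Toy stand-in for one transformer block (d_model = 16); a real run would
# hook a layer of an actual LLM instead.
block = torch.nn.Linear(16, 16)
steer_vec = torch.randn(16)  # hypothetical learned feature direction
handle = block.register_forward_hook(make_steering_hook(steer_vec, scale=4.0))

hidden = torch.randn(2, 8, 16)   # (batch, seq_len, d_model)
steered = block(hidden)          # intervention applied during the forward pass
handle.remove()                  # detach the hook to restore default behavior
```

Because the same offset is added at every position, the intervention can also move activations that have nothing to do with the target behavior, which is the "collateral damage" the paper sets out to minimize.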

IMPACT These papers introduce novel techniques for improving LLM control, reasoning accuracy in knowledge graphs, and efficiency in multi-agent system training, potentially leading to more robust and capable AI systems.

RANK_REASON This cluster contains three distinct academic papers published on arXiv, focusing on novel research in LLM control, knowledge graph reasoning, and multi-agent system training.


COVERAGE [5]

  1. arXiv cs.AI TIER_1 · Jiujiu Chen, Yazheng Liu, Sihong Xie, Hui Xiong ·

    SCPRM: A Schema-aware Cumulative Process Reward Model for Knowledge Graph Question Answering

    arXiv:2605.02819v1 Announce Type: new Abstract: Large language models excel at complex reasoning, yet evaluating their intermediate steps remains challenging. Although process reward models provide step-wise supervision, they often suffer from a risk compensation effect, where incorrect steps are offset by later correct ones, …

  2. arXiv cs.LG TIER_1 · Tam Nguyen, Tu Anh Nguyen, Sina Alemohammad, Richard G. Baraniuk ·

    Minimizing Collateral Damage in Activation Steering

    arXiv:2605.01167v1 Announce Type: new Abstract: Activation steering is a method for controlling Large Language Model (LLM) behavior by intervening in its internal representations to increase the alignment with a specific target feature direction. However, standard interventions, …

  3. arXiv cs.AI TIER_1 · Hui Xiong ·

    SCPRM: A Schema-aware Cumulative Process Reward Model for Knowledge Graph Question Answering

    Large language models excel at complex reasoning, yet evaluating their intermediate steps remains challenging. Although process reward models provide step-wise supervision, they often suffer from a risk compensation effect, where incorrect steps are offset by later correct ones, …

  4. Hugging Face Daily Papers TIER_1 ·

    SCPRM: A Schema-aware Cumulative Process Reward Model for Knowledge Graph Question Answering

    Large language models excel at complex reasoning, yet evaluating their intermediate steps remains challenging. Although process reward models provide step-wise supervision, they often suffer from a risk compensation effect, where incorrect steps are offset by later correct ones, …

  5. arXiv cs.CL TIER_1 · Wentao Shi, Zichun Yu, Fuli Feng, Xiangnan He, Chenyan Xiong ·

    Efficient Multi-Agent System Training with Data Influence-Oriented Tree Search

    arXiv:2502.00955v2 Announce Type: replace Abstract: Monte Carlo Tree Search (MCTS) based methods provide promising approaches for generating synthetic data to enhance the self-training of Large Language Model (LLM) based multi-agent systems (MAS). These methods leverage Q-values …
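
The SCPRM excerpts above describe a "risk compensation effect" in which later correct steps offset an earlier incorrect one. The toy sketch below, which assumes nothing about the paper's actual scoring model, contrasts a mean aggregator that exhibits this effect with a cumulative (running-minimum) aggregator that does not; the step scores are invented for illustration.

```python
def mean_reward(step_scores):
    """Averaging lets strong later steps mask an early bad one."""
    return sum(step_scores) / len(step_scores)

def cumulative_reward(step_scores):
    """Path quality is capped by its weakest step seen so far."""
    worst = 1.0
    for s in step_scores:
        worst = min(worst, s)
    return worst

# A KGQA reasoning path with one wrong relation hop (score 0.1)
# followed by confident, correct-looking hops.
path = [0.9, 0.1, 0.95, 0.95]
print(mean_reward(path))        # 0.725 -- the bad hop is "compensated"
print(cumulative_reward(path))  # 0.1   -- the bad hop dooms the path
```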
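
The DITS abstract contrasts Q-value-guided MCTS with selecting nodes by their estimated influence on model improvement. The sketch below illustrates only that change in the selection rule; the `influence` field is a hypothetical stand-in, and the paper's efficient influence estimators are not reproduced here.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    state: str
    q_value: float = 0.0    # classic MCTS signal: expected return
    influence: float = 0.0  # hypothetical estimated training-data influence
    children: list = field(default_factory=list)

def select_by_q(frontier):
    """Plain Q-value selection used by standard MCTS self-training."""
    return max(frontier, key=lambda n: n.q_value)

def select_by_influence(frontier):
    """DITS-style selection: prefer the node whose synthetic trajectory
    is predicted to improve the trained policy the most, even if its
    expected return is lower."""
    return max(frontier, key=lambda n: n.influence)

frontier = [
    Node("easy, already-mastered subtask", q_value=0.9, influence=0.05),
    Node("hard, informative failure case", q_value=0.4, influence=0.80),
]
print(select_by_q(frontier).state)          # picks the high-return node
print(select_by_influence(frontier).state)  # picks the high-influence node
```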