PulseAugur
实时 22:01:58

New Yoked Feature Preference Optimization enhances LLM math reasoning

Researchers have introduced Yoked Feature Preference Optimization (YFPO), a novel framework designed to enhance the mathematical reasoning capabilities of large language models. Unlike existing methods that rely solely on external preference data, YFPO incorporates internal neuron activation patterns to guide the optimization process. By identifying neurons associated with mathematical concepts and logical reasoning, YFPO constructs an auxiliary reward signal that complements external supervision. Preliminary experiments on a small-scale model using the GSM8K benchmark indicate that this neuron-guided approach can potentially improve reasoning performance and offers a more interpretable path for model fine-tuning. AI

影响 Introduces a novel neuron-guided approach to LLM fine-tuning, potentially improving mathematical reasoning and interpretability.

排序理由 Publication of an academic paper detailing a new method for LLM fine-tuning. [lever_c_demoted from research: ic=1 ai=1.0]

在 arXiv cs.CL 阅读 →

AI 生成摘要 · Google Gemini · 来自 1 个来源。 我们如何撰写摘要 →

New Yoked Feature Preference Optimization enhances LLM math reasoning

报道来源 [1]

  1. arXiv cs.CL TIER_1 English(EN) · Yifan Le ·

    YFPO: A Preliminary Study of Yoked Feature Preference Optimization with Neuron-Guided Rewards for Mathematical Reasoning

    Preference optimization has become an important post-training paradigm for improving the reasoning abilities of large language models. Existing methods typically rely on externally constructed preference data, using preferred and dispreferred responses as sample-level supervision…