PulseAugur
research · [4 sources]

New methods enhance LLM reasoning for long-context and multilingual tasks

Researchers have developed new methods for improving large language model reasoning capabilities, particularly for long-context and multilingual tasks. One approach, OGLS-SD, uses outcome-guided logit steering to calibrate teacher model responses during on-policy self-distillation, leading to more stable and effective reasoning. Another method, dGRPO, combines on-policy optimization with distillation to enhance long-context reasoning and introduces a new dataset called LongBlocks. Additionally, COPSD specifically targets low-resource languages by transferring reasoning behavior from high-resource languages through self-distillation, showing significant improvements in multilingual mathematical reasoning.

Summary written by gemini-2.5-flash-lite from 4 sources. How we write summaries →

IMPACT These new techniques offer improved stability and effectiveness for LLM reasoning, particularly in challenging long-context and multilingual scenarios, potentially broadening their applicability.
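
To make the shared recipe concrete, here is a minimal PyTorch sketch of the on-policy self-distillation objective these methods build on: the student samples its own trajectory, and the teacher's next-token distributions along that trajectory are distilled into the student via a KL loss. The outcome_reward gate is our assumption about how outcome guidance could enter an OGLS-SD-style objective, not the paper's exact steering rule.

    import torch
    import torch.nn.functional as F

    def opsd_loss(student_logits, teacher_logits, outcome_reward=1.0, tau=1.0):
        """On-policy self-distillation loss for one rollout.

        student_logits, teacher_logits: [T, V] next-token logits from the
        student and the teacher at each of the T positions of a trajectory
        the student itself generated (this is what makes it on-policy).
        outcome_reward: gate in [0, 1], e.g. 1.0 if the rollout reached a
        verified final answer, smaller otherwise (illustrative assumption).
        """
        p = F.log_softmax(teacher_logits / tau, dim=-1)  # teacher log-probs
        q = F.log_softmax(student_logits / tau, dim=-1)  # student log-probs
        # Forward KL(teacher || student), averaged over trajectory positions.
        kl = (p.exp() * (p - q)).sum(dim=-1).mean()
        # Scale the distillation signal by the trajectory's outcome.
        return outcome_reward * kl

    # Toy usage with random logits standing in for real model outputs.
    T, V = 16, 50_000
    student_logits = torch.randn(T, V, requires_grad=True)
    teacher_logits = torch.randn(T, V)
    loss = opsd_loss(student_logits, teacher_logits, outcome_reward=1.0)
    loss.backward()

Gating the KL by outcome keeps the student from absorbing teacher distributions on failed rollouts, which is one plausible reading of "outcome-guided" calibration; the actual logit-steering mechanism in the paper is more specific than this sketch.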

RANK_REASON Multiple arXiv papers detailing new methods for improving LLM reasoning.

Read on arXiv cs.CL →

COVERAGE [4]

  1. arXiv cs.CL TIER_1 · Bingbing Wen

    STOP: Structured On-Policy Pruning of Long-Form Reasoning in Low-Data Regimes

    Long chain-of-thought (Long CoT) reasoning improves performance on multi-step problems, but it also induces overthinking: models often generate low-yield reasoning that increases inference cost and latency. This inefficiency is especially problematic in low-data fine-tuning regim…

  2. arXiv cs.AI TIER_1 · Weitong Zhang

    OGLS-SD: On-Policy Self-Distillation with Outcome-Guided Logit Steering for LLM Reasoning

We study on-policy self-distillation (OPSD), where a language model improves its reasoning ability by distilling privileged teacher distributions along its own on-policy trajectories. Despite the performance gains of OPSD, we identify a common but often overlooked mismatch betw…

  3. arXiv cs.CL TIER_1 · André F. T. Martins

    Combining On-Policy Optimization and Distillation for Long-Context Reasoning in Large Language Models

    Adapting large language models (LLMs) to long-context tasks requires post-training methods that remain accurate and coherent over thousands of tokens. Existing approaches are limited in several ways: 1) off-policy methods such as supervised fine-tuning (SFT) and knowledge distill…

  4. arXiv cs.CL TIER_1 · Hinrich Schütze

    Crosslingual On-Policy Self-Distillation for Multilingual Reasoning

    Large language models (LLMs) have achieved remarkable progress in mathematical reasoning, but this ability is not equally accessible across languages. Especially low-resource languages exhibit much lower reasoning performance. To address this, we propose Crosslingual On-Policy Se…
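
The crosslingual setup described in item 4 can be pictured as pairing each training problem twice: the teacher distribution comes from the model prompted in a high-resource language, while the student learns on its own rollout in the low-resource language. A hedged sketch of that pairing follows, assuming a generic prompt template and a stand-in translate function; both are hypothetical illustrations, not taken from the paper.

    def build_teacher_student_prompts(problem_lrl, translate):
        """Pair a low-resource-language (LRL) problem, used for the student's
        on-policy rollout, with its high-resource-language (HRL) rendering,
        used to elicit the teacher distribution."""
        template = "Solve the problem step by step, then state the final answer.\n{q}"
        student_prompt = template.format(q=problem_lrl)             # LRL side
        teacher_prompt = template.format(q=translate(problem_lrl))  # HRL side
        return student_prompt, teacher_prompt

    # Stand-in translator for illustration; a real pipeline would use an MT
    # system or existing parallel data.
    toy_translate = {"২ যোগ ২ কত?": "What is 2 plus 2?"}.get
    prompts = build_teacher_student_prompts("২ যোগ ২ কত?", toy_translate)

The distillation step itself would then apply an OPSD-style loss like the sketch above, with the HRL-prompted model acting as teacher and the LRL-prompted model as student over the same student-generated continuation.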