Researchers have developed a novel technique called "Weak-to-Strong Elicitation via Mismatched Wrong Drafts" to improve the capabilities of large language models. The method trains a larger model on mathematically incorrect drafts produced by a smaller, domain-specific model, and it outperforms standard reinforcement learning fine-tuning. It showed significant gains on MATH-500 and on the out-of-distribution AIME 2025 and AIME 2026 benchmarks, setting a new state of the art for Mathstral-7B.
AI IMPACT This research suggests a more efficient way to improve LLM performance on complex tasks such as mathematics, potentially reducing the need for extensive on-policy fine-tuning.
RANK_REASON The cluster describes a new research paper detailing a novel method for improving LLM capabilities.
- AIME 2025
- AIME 2026
- GRPO
- MATH-500
- Mathstral-7B
- Qwen2.5-Math-1.5B
- Weak-to-Strong Elicitation via Mismatched Wrong Drafts
- WizardMath
AI-generated summary · Google Gemini · from 1 source.