Researchers have introduced VOLTBench, a new benchmark designed to systematically measure the length volatility of long-form text generation from large language models. Through analysis of attention traces, they identified internal patterns contributing to this instability. To address the issue, they propose Stable Generation via Logits Boosting (GLoBo), a decoding-stage optimization that significantly improves length accuracy and stability without requiring additional training.
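The summary does not describe GLoBo's actual mechanism, but decoding-stage logit adjustment for length control is commonly illustrated by biasing the end-of-sequence token's logit as generation approaches a target length. The sketch below is purely hypothetical: the function name, bias schedule, and parameters are assumptions for illustration, not the paper's method.

```python
import numpy as np

def boost_eos_logit(logits, eos_id, cur_len, target_len, strength=2.0):
    # Hypothetical decoding-stage adjustment (NOT the paper's GLoBo):
    # add a bias to the EOS token's logit that grows once generation
    # passes the halfway point toward the target length.
    bias = strength * max(0.0, cur_len / target_len - 0.5)
    adjusted = logits.copy()
    adjusted[eos_id] += bias
    return adjusted

# Toy usage: a 5-token vocabulary where token 4 is EOS.
logits = np.array([1.0, 0.5, 0.2, 0.1, -1.0])
early = boost_eos_logit(logits, eos_id=4, cur_len=10, target_len=100)  # no bias yet
late = boost_eos_logit(logits, eos_id=4, cur_len=95, target_len=100)   # EOS pushed up
```

Because the bias is applied per step at decode time, such a scheme needs no additional training, matching the training-free property claimed for GLoBo.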
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT Introduces a new benchmark and method to improve stability and accuracy in long-form LLM generation.