Output Length Constrained Summarization using GRPO on tiny LLMs | smolcluster
A researcher explored output-length-constrained summarization for two small language models, Qwen2.5-0.5B-Instruct and LFM-2.5-350M, trained with GRPO. The project tested whether these models could produce high-quality summaries of Reddit posts within a strict 64-token limit. In the experiments, a staged training curriculum (length penalties first, quality rewards afterward) outperformed training on both objectives jointly, and METEOR combined with ROUGE-L was the most effective quality-reward pairing.
AI IMPACT: Shows that small models can be trained effectively for narrow tasks through careful reward engineering and staged curricula.
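The staged reward idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's code. Whitespace splitting stands in for the model tokenizer. The length-penalty shape and the 50/50 weighting in stage 2 are hypothetical choices. Only ROUGE-L is implemented, because METEOR needs WordNet data; a library scorer such as NLTK's `meteor_score` could be added as a second quality term. The last function shows GRPO's group-relative advantage: each sampled summary's reward is normalized against the other completions for the same prompt.

```python
import statistics

MAX_TOKENS = 64  # summary budget stated in the post


def lcs_length(a, b):
    # Longest common subsequence length via a rolling 1-D DP table.
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b):
            cur.append(prev[j] + 1 if x == y else max(prev[j + 1], cur[j]))
        prev = cur
    return prev[-1]


def rouge_l_f1(candidate, reference):
    # ROUGE-L F1 over whitespace tokens (no stemming, single reference).
    c, r = candidate.split(), reference.split()
    lcs = lcs_length(c, r) if c and r else 0
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)


def length_reward(num_tokens, max_tokens=MAX_TOKENS):
    # Hypothetical shape: full reward within budget, linear decay to 0 at 2x budget.
    if num_tokens <= max_tokens:
        return 1.0
    return max(0.0, 1.0 - (num_tokens - max_tokens) / max_tokens)


def staged_reward(summary, reference, num_tokens, stage):
    # Stage 1 optimizes length only; stage 2 adds a quality term.
    # The 0.5/0.5 weighting is an illustrative assumption.
    length_term = length_reward(num_tokens)
    if stage == 1:
        return length_term
    return 0.5 * length_term + 0.5 * rouge_l_f1(summary, reference)


def group_advantages(rewards, eps=1e-6):
    # GRPO-style advantage: standardize rewards within one prompt's group.
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Toy group of sampled summaries for one (made-up) Reddit post.
reference = "the op asks how to fix a leaking kitchen faucet"
group = [
    "op asks how to fix a leaking kitchen faucet",
    "someone wants help with a faucet",
    " ".join(["word"] * 100),  # over-budget, irrelevant output
]
rewards = [staged_reward(s, reference, len(s.split()), stage=2) for s in group]
advantages = group_advantages(rewards)
```

Here the concise, faithful summary receives a positive advantage and the 100-token output a negative one, so the policy update pushes toward short, on-topic summaries. Running stage 1 first lets the model learn the budget before quality rewards start competing with it, which matches the curriculum ordering described above.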