Brief · PulseAugur

TOOL · r/LocalLLaMA English(EN) · 1w

Output Length Constrained Summarization using GRPO on tiny LLMs | smolcluster

A researcher explored output-length-constrained summarization with two small language models, Qwen2.5-0.5B-Instruct and LFM-2.5-350M, trained with GRPO. The project tested whether these models could produce high-quality summaries of Reddit posts within a strict 64-token limit. In the reported experiments, a staged training curriculum that applies length penalties first and quality rewards second outperformed joint training, and METEOR combined with ROUGE-L was the most effective reward combination.

IMPACT Suggests that sub-1B-parameter models can be trained effectively for a narrow task such as length-constrained summarization when rewards are engineered carefully and introduced in stages.
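
To make the staged curriculum concrete, here is a minimal sketch of a reward function that scores only length in the first stage and adds a quality term in the second. It is an illustration under stated assumptions, not the project's actual code: the names `length_reward`, `staged_reward`, `switch_step`, and `quality_weight` are hypothetical, whitespace-separated words stand in for model tokens, the linear over-length penalty is an arbitrary choice, and only a plain ROUGE-L F1 is implemented (METEOR is left out to keep the example self-contained). Whether the length term stays active in the second stage is also an assumption.

```python
from typing import List

MAX_TOKENS = 64  # length limit stated in the summary


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F1 over whitespace tokens (assumption: no stemming or casing)."""
    cand, ref = candidate.split(), reference.split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def length_reward(candidate: str, max_tokens: int = MAX_TOKENS) -> float:
    """1.0 within the limit, decaying linearly to 0.0 at twice the limit.

    Hypothetical shaping; empty outputs get 0.0 so that the model
    cannot satisfy the constraint by emitting nothing.
    """
    n = len(candidate.split())
    if n == 0:
        return 0.0
    if n <= max_tokens:
        return 1.0
    return max(0.0, 1.0 - (n - max_tokens) / max_tokens)


def staged_reward(
    candidate: str,
    reference: str,
    step: int,
    switch_step: int = 500,       # hypothetical stage boundary
    quality_weight: float = 1.0,  # hypothetical weighting
) -> float:
    """Stage 1 (step < switch_step): length only. Stage 2: length + quality."""
    r_len = length_reward(candidate)
    if step < switch_step:
        return r_len
    return r_len + quality_weight * rouge_l_f1(candidate, reference)
```

In a GRPO setup, a function like `staged_reward` would be called once per sampled completion, and the rewards within each group of samples for the same prompt would then be normalized into advantages. Joint training corresponds to setting `switch_step=0`, so both terms are active from the first step.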

vLLM
GRPO
MLX
Qwen2.5-0.5B-Instruct
LFM-2.5-350M
East-Muffin-6472
smolcluster