PulseAugur

LoRA parameter placement impacts GRPO fine-tuning, not SFT

Researchers have investigated the parameter placement problem in Low-Rank Adaptation (LoRA) for fine-tuning large language models. The study finds that under Supervised Fine-Tuning (SFT), which entries of the LoRA adapter's B matrix are made trainable has little effect on performance. Under Group Relative Policy Optimization (GRPO), however, random placement fails to improve the base model, while gradient-informed placement recovers standard LoRA accuracy. The authors attribute the difference to gradient structure: SFT gradients are stable across training, whereas GRPO gradients are near-orthogonal from step to step, so effective learning under GRPO requires gradient-informed placement.
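
The paper's exact selection rule isn't quoted here, so the following is a minimal PyTorch sketch of the masking idea, assuming gradient magnitude on B as the "informed" signal; select_trainable_entries, the tensor shapes, and the budget k=256 are illustrative assumptions, not the authors' implementation.

    import torch

    def select_trainable_entries(b_grad: torch.Tensor, k: int, informed: bool) -> torch.Tensor:
        # Boolean mask marking k trainable entries of the LoRA B matrix.
        # informed=True keeps the k largest-magnitude gradient entries
        # (gradient-informed placement); informed=False picks k at random.
        flat = b_grad.abs().flatten()
        if informed:
            idx = torch.topk(flat, k).indices
        else:
            idx = torch.randperm(flat.numel())[:k]
        mask = torch.zeros(flat.numel(), dtype=torch.bool)
        mask[idx] = True
        return mask.view_as(b_grad)

    # Hypothetical LoRA B matrix (A stays frozen, per the paper's setup).
    B = torch.zeros(768, 8, requires_grad=True)
    probe_grad = torch.randn_like(B)  # stand-in for a measured gradient on B
    mask = select_trainable_entries(probe_grad, k=256, informed=True)

    # Zero out gradients outside the chosen placement, so the optimizer
    # only ever updates the selected k entries of B.
    B.register_hook(lambda grad: grad * mask)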

Summary written by gemini-2.5-flash-lite from 1 source.

IMPACT Identifies which parameter placements make GRPO fine-tuning effective, potentially reducing the number of trainable parameters needed for LLM adaptation tasks.

RANK_REASON The cluster contains an academic paper detailing a novel finding in model fine-tuning techniques.

Read on arXiv cs.AI →

COVERAGE [1]

  1. arXiv cs.AI TIER_1 · Charles Lovering

    Not How Many, But Which: Parameter Placement in Low-Rank Adaptation

    We study the parameter placement problem: given a fixed budget of k trainable entries within the B matrix of a LoRA adapter (A frozen), does the choice of which k matter? Under supervised fine-tuning, random and informed subsets achieve comparable performance. Under …