PreFT: Prefill-only finetuning for efficient inference
Researchers have developed PreFT, a parameter-efficient finetuning method designed to make serving personalized large language models more efficient. PreFT optimizes for serving throughput by applying adapters only during the prefill stage, when the prompt is processed, and discarding them during the decoding stage, when output tokens are generated. The researchers report that this significantly increases throughput with minimal impact on task performance, giving a more favorable accuracy-throughput tradeoff for personalized LLM serving.
AI IMPACT Enables more efficient serving of personalized LLMs, potentially reducing infrastructure costs and improving user experience.
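The prefill/decode split described in the summary can be illustrated with a toy sketch. This is not the PreFT implementation: the summary does not say what kind of adapter is used, so a LoRA-style low-rank update is assumed here, and every name (`project`, `prefill`, `decode_step`, `W`, `A`, `B`) is hypothetical. A single linear projection stands in for the model, and its per-token outputs stand in for cache entries.

```python
# Toy illustration only. Assumption: the adapter is a LoRA-style low-rank
# update B @ A added to a base weight matrix W. All names are hypothetical.

def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def project(w, v, adapter=None):
    """Base projection W @ v, plus B @ (A @ v) when an adapter is given."""
    out = matvec(w, v)
    if adapter is not None:
        a, b = adapter
        delta = matvec(b, matvec(a, v))
        out = [o + d for o, d in zip(out, delta)]
    return out

def prefill(w, prompt, adapter):
    """Process the whole prompt with the adapter active; return the cache."""
    return [project(w, tok, adapter) for tok in prompt]

def decode_step(w, cache, tok):
    """Process one generated token with base weights only (adapter dropped)."""
    entry = project(w, tok, adapter=None)
    cache.append(entry)
    return entry

# Example values: 2-dimensional hidden size, rank-1 adapter.
W = [[1.0, 0.0], [0.0, 1.0]]   # base weights (identity, for readability)
A = [[1.0, 1.0]]               # rank x dim
B = [[0.5], [0.0]]             # dim x rank

cache = prefill(W, [[1.0, 2.0]], (A, B))      # adapter shapes the prompt's cache entry
new_entry = decode_step(W, cache, [3.0, 4.0])  # decoding ignores the adapter
```

In this sketch the personalization reaches later tokens only through the cache entries written during prefill. Every decode step uses the same base weights regardless of which user's adapter was applied, which is presumably what lets a server batch decoding across differently personalized requests without per-request adapter computation.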