Brief · PulseAugur

TOOL · arXiv cs.AI · English (EN) · 11h

T-POP: Test-Time Personalization with Online Preference Feedback

Researchers have introduced T-POP, a method for personalizing large language models in real time from online preference feedback. It targets the cold-start problem, in which a new user has no interaction history, by learning a reward function from the user's feedback while leaving the LLM's parameters unchanged. T-POP uses dueling bandits to balance exploring the user's preferences against exploiting what it has already learned, and the authors report better data efficiency and faster personalization than existing methods.
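To make the loop described above concrete, here is a minimal sketch in Python. It is not the authors' implementation: the linear reward model, the Bradley-Terry (logistic) update, the count-based exploration bonus, and every name in it (`DuelingPreferenceLearner`, `select_pair`, `simulate`, the made-up feature vectors and `true_w`) are assumptions chosen for illustration. The LLM is not modeled at all; candidate responses are represented by stand-in feature vectors, and only the small reward model is updated.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class DuelingPreferenceLearner:
    """Hypothetical sketch: a linear reward model fit from pairwise feedback."""

    def __init__(self, dim, lr=0.5, bonus=1.0):
        self.w = [0.0] * dim   # reward weights (the only thing that is learned)
        self.lr = lr           # step size (assumed value)
        self.bonus = bonus     # exploration strength (assumed value)
        self.shown = {}        # how often each candidate index was shown

    def reward(self, x):
        return dot(self.w, x)

    def select_pair(self, candidates):
        # First arm: exploit the current reward estimate.
        first = max(range(len(candidates)),
                    key=lambda i: self.reward(candidates[i]))

        # Second arm: estimate plus a bonus that shrinks with exposure,
        # so rarely shown candidates still get compared (exploration).
        def optimistic(i):
            n = self.shown.get(i, 0)
            return self.reward(candidates[i]) + self.bonus / math.sqrt(1 + n)

        rest = [i for i in range(len(candidates)) if i != first]
        second = max(rest, key=optimistic)
        for i in (first, second):
            self.shown[i] = self.shown.get(i, 0) + 1
        return first, second

    def update(self, winner, loser):
        # One gradient-ascent step on the Bradley-Terry log-likelihood
        # log sigmoid(w . (winner - loser)).
        diff = [a - b for a, b in zip(winner, loser)]
        p = sigmoid(dot(self.w, diff))
        self.w = [wi + self.lr * (1.0 - p) * di
                  for wi, di in zip(self.w, diff)]


def simulate(rounds=300, seed=0):
    rng = random.Random(seed)
    true_w = [1.0, -1.0, 0.5]  # hidden user taste, invented for the demo
    candidates = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(8)]
    learner = DuelingPreferenceLearner(dim=3)
    for _ in range(rounds):
        i, j = learner.select_pair(candidates)
        # Simulated user: prefers i with Bradley-Terry probability.
        p_i = sigmoid(dot(true_w, candidates[i]) - dot(true_w, candidates[j]))
        if rng.random() < p_i:
            learner.update(candidates[i], candidates[j])
        else:
            learner.update(candidates[j], candidates[i])
    return learner, candidates, true_w
```

In a real deployment the candidates would be responses generated by the frozen LLM for the current prompt, and the simulated user would be replaced by the actual user's choice between the two responses shown.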

IMPACT Enables rapid, data-efficient LLM personalization for new users without model retraining.

LLMs
Zhongxiang Dai