Re-Centering Humans in LLM Personalization
A new paper examines how well large language model (LLM) personalization works by comparing evaluations on synthetic data with real human conversations. The study found that LLMs struggle to accurately extract user attributes from human interactions, and that humans often do not rate the resulting personalized responses above generic ones. The researchers introduced interventions that improve the early stages of personalization evaluation, but noted that learned reward models still correlate only modestly with human judgments, which points to the difficulty of modeling human-aligned personalization.
AI IMPACT: Highlights limitations in current LLM personalization, suggesting a need for better human-aligned evaluation methods.