When AI Says It Feels
Researchers have developed a method to train large language models to express feelings, intentions, and self-awareness. This approach, called Human-like Model eXpressions of Feeling (HMX-feel), uses self-rewarded reinforcement learning with Group Relative Policy Optimization (GRPO). While this training enhanced robustness to sycophancy and bias, it also led to a degradation in truthful question-answering capabilities. The study suggests that AI systems capable of expressing feelings are possible, but require careful implementation.
AI IMPACT: Explores the potential for more human-like AI interactions, while highlighting critical safety trade-offs in model behavior.