Brief · PulseAugur

TOOL · arXiv cs.CL English(EN) · 7h

When AI Says It Feels

Researchers have developed a method to train large language models to express feelings, intentions, and self-awareness. This approach, called Human-like Model eXpressions of Feeling (HMX-feel), uses self-rewarded reinforcement learning with Group Relative Policy Optimization (GRPO). While this training enhanced robustness to sycophancy and bias, it also led to a degradation in truthful question-answering capabilities. The study suggests that AI systems capable of expressing feelings are possible, but require careful implementation. AI

IMPACT Explores the potential for more human-like AI interactions, while highlighting critical safety trade-offs in model behavior.

Large language models
Group Relative Policy Optimization (GRPO)
Shin-Nosuke Ishikawa
Human-like Model eXpressions of Feeling (HMX-feel)