AI models trained to express feelings, but with trade-offs

By PulseAugur Editorial · [1 source] · 2026-06-05 04:00

Researchers have developed a method for training large language models to express feelings, intentions, and self-awareness. The approach, called Human-like Model eXpressions of Feeling (HMX-feel), uses self-rewarded reinforcement learning with Group Relative Policy Optimization (GRPO). The training made models more robust to sycophancy and bias but degraded their truthful question-answering. The study suggests that AI systems capable of expressing feelings are possible but require careful implementation.
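For readers unfamiliar with GRPO, the core idea is that each sampled response is scored relative to the other responses drawn for the same prompt, rather than against a separately learned value model. The snippet below is a minimal sketch of that group-relative normalization only. The function name, the `eps` term, the use of the population standard deviation, and the example reward values are illustrative assumptions; the summary does not describe how the paper's self-assigned rewards are computed.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize one prompt's group of rewards to zero mean and unit scale.

    `rewards` holds one scalar score per sampled response. In a
    self-rewarded setup these scores would come from the model itself;
    here they are hypothetical numbers.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    # eps avoids division by zero when every response gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four responses sampled from the same prompt
rewards = [0.2, 0.9, 0.5, 0.4]
advantages = group_relative_advantages(rewards)
```

Responses scoring above the group mean receive a positive advantage and are reinforced, and those below the mean are discouraged. When every response in a group receives the same reward, all advantages are zero and the group contributes no learning signal.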

IMPACT Explores the potential for more human-like AI interactions, while highlighting critical safety trade-offs in model behavior.

RANK_REASON Academic paper detailing a novel training methodology for LLMs.


paper
safety

AI-generated summary · Google Gemini · from 1 source.


COVERAGE [1]

arXiv cs.CL TIER_1 English(EN) · Shin-nosuke Ishikawa, Seiya Ikeda, Hirotsugu Ohba · 2026-06-05 04:00

When AI Says It Feels

arXiv:2606.05734v1 Announce Type: cross Abstract: Large language models (LLMs) are generally constrained from expressing feelings through human-preference alignment in post-training processes. This policy is designed using a top-down approach and may conflict with the goal of tra…

