LLMs struggle with fine-grained emotion recognition in zero-shot tests

By PulseAugur Editorial · [2 sources] · 2026-07-01 14:04

A new research paper evaluates the zero-shot emotion recognition capabilities of three leading large language models: Claude Sonnet 4.6, ChatGPT (GPT-5.4), and Gemini 2.5-Flash. Gemini achieved the highest accuracy at 39.9%, closely followed by GPT-5.4 and Claude, but McNemar tests indicated no statistically significant differences among the three. All models struggled with specific emotions such as love, confusion, and shame. The results highlight the current limitations of these frontier systems in classifying fine-grained emotions without task-specific training examples.
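For readers unfamiliar with the McNemar test mentioned in the summary, the sketch below shows how a paired comparison of two classifiers on the same items can be computed. It is a minimal illustration, not the paper's procedure: the function name `mcnemar_exact`, the choice of the exact binomial variant, and the toy correctness flags are all assumptions made here for demonstration.

```python
from math import comb


def mcnemar_exact(correct_a, correct_b):
    """Two-sided exact McNemar test on paired per-item correctness flags.

    Hypothetical helper for illustration; returns (b, c, p_value), where
    b counts items only model A got right and c counts items only model B
    got right.
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("paired inputs must have the same length")
    # Only discordant pairs (exactly one model correct) carry information.
    b = sum(1 for x, y in zip(correct_a, correct_b) if x and not y)
    c = sum(1 for x, y in zip(correct_a, correct_b) if y and not x)
    n = b + c
    if n == 0:
        return b, c, 1.0
    k = min(b, c)
    # Under the null hypothesis, each discordant pair favors either model
    # with probability 0.5, so the smaller count follows Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Toy, made-up correctness flags for two models on the same eight items.
model_a = [True, True, False, True, False, False, True, False]
model_b = [True, False, False, True, True, False, True, True]
print(mcnemar_exact(model_a, model_b))
```

Because the test looks only at items where the two models disagree, two systems with different headline accuracies can still be statistically indistinguishable when the disagreements split roughly evenly, which is consistent with the pattern the summary describes.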

IMPACT Highlights current limitations in LLM zero-shot fine-grained emotion classification, suggesting areas for future model development.

RANK_REASON The cluster contains an academic paper evaluating LLM capabilities on a specific task.

AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

arXiv cs.CL TIER_1 English(EN) · Lawrence Obiuwevwi, Krzysztof J. Rechowicz, Jessica M. Johnson, Vikas Ashok, Sachin Shetty, Sampath Jayarathna · 2026-07-02 04:00

Quantifying the Affective Gap: A Zero-Shot Evaluation of LLMs on Fine-Grained Emotion Taxonomies

arXiv:2607.00968v1. Abstract: Emotion recognition in natural language is a foundational challenge in affective computing, with critical implications for human-computer interaction, mental health support, and conversational AI. This paper presents a rigorous, uni…

arXiv cs.CL TIER_1 English(EN) · Sampath Jayarathna · 2026-07-01 14:04

Quantifying the Affective Gap: A Zero-Shot Evaluation of LLMs on Fine-Grained Emotion Taxonomies

Emotion recognition in natural language is a foundational challenge in affective computing, with critical implications for human-computer interaction, mental health support, and conversational AI. This paper presents a rigorous, unified zero-shot evaluation of three leading comme…
