PulseAugur

LLMs struggle with fine-grained emotion recognition in zero-shot tests

A new research paper evaluates the zero-shot emotion recognition capabilities of three leading large language models: Claude Sonnet 4.6, ChatGPT (GPT-5.4), and Gemini 2.5-Flash. Gemini achieved the highest accuracy at 39.9%, closely followed by GPT-5.4 and Claude, but McNemar tests indicated no statistically significant differences in performance among the three. All of the models struggled with specific emotions such as love, confusion, and shame. The results highlight how limited these frontier AI systems remain at classifying fine-grained emotions without task-specific training examples.
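
For readers unfamiliar with the McNemar test mentioned in the summary, here is a minimal sketch of the exact (binomial) variant in Python. It compares two classifiers on the same items by counting only the items where exactly one of them is correct. This is not the paper's code: the function name `mcnemar_exact`, the labels, and the six-item toy data are all made up for illustration, and the paper may have used a different variant (for example the chi-square approximation).

```python
from math import comb


def mcnemar_exact(pred_a, pred_b, gold):
    """Two-sided exact McNemar test on paired predictions.

    Returns (b, c, p_value), where b counts items only model A got right
    and c counts items only model B got right.
    """
    b = sum(1 for a, m, g in zip(pred_a, pred_b, gold) if a == g and m != g)
    c = sum(1 for a, m, g in zip(pred_a, pred_b, gold) if a != g and m == g)
    n = b + c
    if n == 0:
        # The models never disagree in correctness: no evidence of a difference.
        return b, c, 1.0
    # Under the null hypothesis each discordant item is a fair coin flip,
    # so min(b, c) follows Binomial(n, 0.5); double the lower tail.
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Hypothetical toy data, NOT taken from the paper.
gold = ["joy", "love", "shame", "anger", "fear", "joy"]
pred_a = ["joy", "love", "shame", "anger", "joy", "joy"]
pred_b = ["joy", "joy", "pride", "fear", "fear", "joy"]

b, c, p = mcnemar_exact(pred_a, pred_b, gold)
print(b, c, p)  # 3 1 0.625
```

With only four discordant items the p-value is far above 0.05, which illustrates how two models can differ in raw accuracy without that difference being statistically significant.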

IMPACT Highlights current limitations in LLM zero-shot fine-grained emotion classification, suggesting areas for future model development.

RANK_REASON The cluster contains an academic paper evaluating LLM capabilities on a specific task.

Read on arXiv cs.CL →

AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

  1. arXiv cs.CL TIER_1 English(EN) · Lawrence Obiuwevwi, Krzysztof J. Rechowicz, Jessica M. Johnson, Vikas Ashok, Sachin Shetty, Sampath Jayarathna ·

    Quantifying the Affective Gap: A Zero-Shot Evaluation of LLMs on Fine-Grained Emotion Taxonomies

    arXiv:2607.00968v1. Abstract: Emotion recognition in natural language is a foundational challenge in affective computing, with critical implications for human-computer interaction, mental health support, and conversational AI. This paper presents a rigorous, uni…

  2. arXiv cs.CL TIER_1 English(EN) · Sampath Jayarathna ·

    Quantifying the Affective Gap: A Zero-Shot Evaluation of LLMs on Fine-Grained Emotion Taxonomies

    Emotion recognition in natural language is a foundational challenge in affective computing, with critical implications for human-computer interaction, mental health support, and conversational AI. This paper presents a rigorous, unified zero-shot evaluation of three leading comme…