A new research paper evaluates the zero-shot emotion recognition capabilities of three leading large language models: Claude Sonnet 4.6, ChatGPT (GPT-5.4), and Gemini 2.5-Flash. The study found that Gemini achieved the highest accuracy at 39.9%, closely followed by GPT-5.4 and Claude. However, all models struggled with specific emotions like love, confusion, and shame, and McNemar tests indicated no statistically significant differences in their performance. The research highlights the current limitations of these frontier AI systems in accurately classifying fine-grained emotions without specific training examples.
AI IMPACT Highlights current limitations in LLM zero-shot fine-grained emotion classification, suggesting areas for future model development.
RANK_REASON The cluster contains an academic paper evaluating LLM capabilities on a specific task.
- April 2026
- boltuix/emotions
- ChatGPT
- Claude
- Claude Sonnet 4.6
- Gemini
- Gemini 2.5-Flash
- GPT-5.4
- McNemar tests
AI-generated summary · Google Gemini · from 2 sources.