PulseAugur

New benchmark SPLIT tests LLM empathy in English and Ukrainian

SPLIT is a new benchmark for evaluating the cross-lingual empathy and cultural grounding of Large Language Models (LLMs) in crisis-related situations, focusing on English and Ukrainian. It comprises 500 prompts across five categories: Stress, Panic, Loneliness, Internal Displacement, and Tension. In evaluations, Gemini 2.5 Flash and Llama 3.3 70B Instruct performed worse in Ukrainian than in English, while DeepSeek-V3 remained stable across both languages. The study also found that human and AI evaluators agree only weakly on empathy and naturalness and diverge on cultural grounding, suggesting that generating Ukrainian text is not the same as providing culturally appropriate emotional support.

IMPACT This benchmark could drive the development of more culturally sensitive and empathetic LLMs for crisis support in low-resource languages.

RANK_REASON The cluster contains an academic paper introducing a new benchmark for LLM evaluation.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. arXiv cs.AI TIER_1 English(EN) · Anna Chorna ·

    SPLIT: Cross-Lingual Empathy and Cultural Grounding in English and Ukrainian LLM Responses

    arXiv:2607.02049v1 (cross-listed). Abstract: Large Language Models are increasingly deployed in emotional-support contexts and crisis-related situations. Nevertheless, their cross-lingual abilities in these circumstances remain underexplored. Existing benchmarks emphasize mu…