New benchmark tests LLMs on Persian poetry's meaning and form

By PulseAugur Editorial · [1 source] · 2026-06-10 04:00

Researchers have developed GhazalBench, a new benchmark that evaluates how well large language models understand Persian ghazals and how accurately they can reproduce the poems' exact surface form. The benchmark tests two abilities: understanding poetic meaning, and accessing the canonical wording of verses under various cues. Current multilingual LLMs show a notable gap: they generally grasp the meaning but fail to complete verses accurately in open-ended tasks, although their performance improves on recognition-based tasks. This limitation appears to stem from insufficient training data rather than architectural constraints, as suggested by the models' stronger performance on English sonnets.

IMPACT Highlights the need for LLM evaluation frameworks that assess culturally specific aspects of text, such as the exact wording of canonical poetry, potentially guiding future model development for culturally specific applications.

RANK_REASON The cluster contains a research paper introducing a new benchmark for evaluating LLMs.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.CL TIER_1 English (EN) · Ghazal Kalhor, Yadollah Yaghoobzadeh · 2026-06-10 04:00

GhazalBench: Evaluating LLM Understanding and Canonical Surface-Form Access in Persian Ghazals

arXiv:2603.09979v2 Announce Type: replace Abstract: Persian poetry plays an active role in Iranian cultural practice, where verses by canonical poets such as Hafez are frequently quoted, paraphrased, or completed from partial cues. Supporting such interactions requires language m…
