
TOOL · arXiv cs.CL English(EN) · 8h

GhazalBench: Evaluating LLM Understanding and Canonical Surface-Form Access in Persian Ghazals

Researchers have developed GhazalBench, a new benchmark that evaluates how well large language models understand Persian ghazals and reproduce their exact surface form. The benchmark tests two abilities: understanding poetic meaning and accessing the canonical surface form under various cues. Current multilingual LLMs show a notable gap: they generally grasp the meaning but fail to complete verses accurately in open-ended tasks, though they perform better on recognition-based tasks. This limitation appears to stem from insufficient training data rather than architectural constraints, as suggested by stronger performance on English sonnets.

IMPACT Highlights the need for LLM evaluation frameworks that assess the nuances of culturally specific texts, which could guide future model development for culturally specific applications.

LLMs
GhazalBench
Persian ghazals
Ghazal Kalhor