New YOMI-Bench benchmark reveals poor LLM performance on Japanese kanji

By PulseAugur Editorial · [1 source] · 2026-07-01 09:13

YOMI-Bench is a new benchmark for evaluating how well large language models (LLMs) handle kanji reading and phonological understanding in Japanese. It was created because a single kanji can have multiple readings, which makes it hard for an LLM to infer the correct one. Evaluations on YOMI-Bench found that both Japanese-specific and commercial LLMs performed poorly on tasks that require understanding kanji readings.
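To make the multiple-readings problem concrete, here is a minimal Python sketch. It is an illustration only: the word list, the `exact_match_accuracy` helper, and the sample predictions are assumptions made for this example, not YOMI-Bench data or its scoring method. It shows the kanji 生 taking a different reading in each of five common words, and scores a hypothetical model output by exact match.

```python
# Illustrative only: not YOMI-Bench data or its scoring method.
# Each entry pairs a common word containing the kanji 生 with its standard kana reading.
READINGS = {
    "学生": "がくせい",        # 生 read as "sei"
    "一生": "いっしょう",      # 生 read as "shou"
    "生きる": "いきる",        # 生 read as "i"
    "生まれる": "うまれる",    # 生 read as "u"
    "生ビール": "なまビール",  # 生 read as "nama"
}


def exact_match_accuracy(predictions, gold):
    """Return the fraction of gold words whose predicted reading matches exactly."""
    if not gold:
        raise ValueError("gold must not be empty")
    correct = sum(
        1 for word, reading in gold.items() if predictions.get(word) == reading
    )
    return correct / len(gold)


# Hypothetical model output that wrongly applies the reading "sei" in two words.
predictions = {
    "学生": "がくせい",
    "一生": "いっせい",        # wrong: should be いっしょう
    "生きる": "いきる",
    "生まれる": "うまれる",
    "生ビール": "せいビール",  # wrong: should be なまビール
}

accuracy = exact_match_accuracy(predictions, READINGS)
print(f"{accuracy:.2f}")  # 3 of the 5 readings match
```

Exact match is a deliberately strict choice here, since a reading that is off by one mora is still a wrong reading. The benchmark itself may score differently.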

IMPACT Highlights limitations of current LLMs on nuanced linguistic tasks, potentially driving development of more culturally aware language models.

RANK_REASON The cluster describes a new academic benchmark for evaluating LLMs, published on arXiv.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.CL TIER_1 English(EN) · Hitomi Yanaka · 2026-07-01 09:13

YOMI-Bench: A Benchmark for Evaluating Kanji Reading and Phonological Understanding of LLMs for Japanese

We propose YOMI-Bench, a benchmark for evaluating kanji reading and phonological understanding of large language models (LLMs) for Japanese. In Japanese, a single kanji character often has multiple possible readings, making it difficult to infer the correct reading from surface-l…
