PulseAugur

New YOMI-Bench benchmark reveals poor LLM performance on Japanese kanji

A new benchmark, YOMI-Bench, evaluates the kanji reading and phonological understanding of large language models (LLMs) for Japanese. It was created because a single kanji character often has multiple possible readings, which makes it difficult for LLMs to infer the correct one. In evaluations on YOMI-Bench, both Japanese-specific and commercial LLMs performed poorly on tasks that require understanding kanji readings.

IMPACT Highlights limitations of current LLMs on nuanced linguistic tasks, potentially driving development of more culturally aware language models.

RANK_REASON The cluster describes a new academic benchmark for evaluating LLMs, published on arXiv.

AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

  1. arXiv cs.CL TIER_1 English(EN) · Ryota Mibayashi, Hiroya Takamura, Hitomi Yanaka

    YOMI-Bench: A Benchmark for Evaluating Kanji Reading and Phonological Understanding of LLMs for Japanese

    arXiv:2607.00664v1. Abstract: We propose YOMI-Bench, a benchmark for evaluating kanji reading and phonological understanding of large language models (LLMs) for Japanese. In Japanese, a single kanji character often has multiple possible readings, making it diffi…

  2. arXiv cs.CL TIER_1 English(EN) · Hitomi Yanaka

    YOMI-Bench: A Benchmark for Evaluating Kanji Reading and Phonological Understanding of LLMs for Japanese

    We propose YOMI-Bench, a benchmark for evaluating kanji reading and phonological understanding of large language models (LLMs) for Japanese. In Japanese, a single kanji character often has multiple possible readings, making it difficult to infer the correct reading from surface-l…