PulseAugur

Korean NLP research proposes eojeol-based constituency representation

This paper proposes an eojeol-based constituency representation for Korean treebanks (an eojeol is a whitespace-delimited unit of written Korean). The authors argue that using morphemes as terminals can obscure syntactic structure and create mismatches with dependency resources. They show that the Sejong, Penn Korean, and KAIST treebanks can be compared through a shared eojeol-based constituency backbone under specific normalization assumptions. They also outline an annotation scheme that supports cross-treebank comparison and constituency-dependency alignment, and that provides a surface-form terminal layer for future Korean constituency parsing.
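The contrast between morpheme and eojeol terminals can be illustrated with a small Python sketch. This is not the authors' annotation scheme. The example sentence, its Sejong-style morpheme analysis, the phrase labels, and the helper names (`Eojeol`, `morpheme_terminals`, `eojeol_terminals`, `leaves`) are all assumptions chosen for illustration. The sketch shows two points from the summary. First, with eojeol terminals the tree's leaves coincide with the whitespace-delimited tokens, which are typically the units that Korean dependency resources are defined over. Second, a surface-form layer matters because concatenating morphemes (가 + 았 + 다) does not reproduce the written form 갔다.

```python
from dataclasses import dataclass


@dataclass
class Eojeol:
    """One whitespace-delimited unit plus its (morpheme, POS tag) analysis."""
    surface: str
    morphemes: list[tuple[str, str]]


# Illustrative sentence "나는 학교에 갔다" ("I went to school").
# The Sejong-style tags below are an assumed analysis, for illustration only.
SENTENCE = [
    Eojeol("나는", [("나", "NP"), ("는", "JX")]),
    Eojeol("학교에", [("학교", "NNG"), ("에", "JKB")]),
    Eojeol("갔다", [("가", "VV"), ("았", "EP"), ("다", "EF")]),
]


def morpheme_terminals(sentence):
    """Leaves you would get if every morpheme were a constituency terminal."""
    return [m for e in sentence for m, _ in e.morphemes]


def eojeol_terminals(sentence):
    """Leaves you would get if every eojeol (surface form) were a terminal."""
    return [e.surface for e in sentence]


def leaves(tree):
    """Collect leaf strings from a (label, child, ...) tuple tree."""
    if isinstance(tree, str):
        return [tree]
    _label, *children = tree
    return [leaf for child in children for leaf in leaves(child)]


# Hypothetical eojeol-based tree; the phrase labels are illustrative only.
TREE = ("S",
        ("NP_SBJ", "나는"),
        ("VP", ("NP_AJT", "학교에"), ("VP", "갔다")))

if __name__ == "__main__":
    print(morpheme_terminals(SENTENCE))  # seven morpheme leaves
    print(eojeol_terminals(SENTENCE))    # three eojeol leaves
    # Concatenated morphemes need not match the written surface form.
    joined = "".join(m for m, _ in SENTENCE[2].morphemes)
    print(joined, "vs", SENTENCE[2].surface)
    # With eojeol terminals, leaves align one-to-one with whitespace tokens.
    assert leaves(TREE) == "나는 학교에 갔다".split()
```

The tuple-based tree is only a convenience for the sketch; a real treebank pipeline would read bracketed files and keep the morpheme analysis as a separate layer attached to each eojeol terminal.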

IMPACT This research could improve the accuracy and comparability of Korean natural language processing models by giving them a more consistent representation of syntactic structure across treebanks.

RANK_REASON The item is an academic paper published on arXiv detailing linguistic research.


AI-generated summary · Google Gemini · from 1 source.


COVERAGE [1]

  1. arXiv cs.CL · English (EN) · Jungyeul Park, Chulwoo Park

    Constituency Structure over Eojeol in Korean Treebanks

    arXiv:2512.22487v2 · Announce Type: replace

    Abstract: The design of Korean constituency treebanks raises a central representational question concerning the choice of terminal units. Although Korean words are morphologically complex, treating morphemes as constituency terminals can …