This paper proposes an eojeol-based constituency representation for Korean treebanks, arguing that using morphemes as terminals can obscure syntactic structure and create mismatches with dependency resources. The authors demonstrate that the Sejong, Penn Korean, and KAIST treebanks can be compared using a shared eojeol-based constituency backbone under specific normalization assumptions. They outline an annotation scheme that supports cross-treebank comparison and constituency-dependency alignment, and that provides a surface-form terminal layer for future Korean constituency parsing.
AI IMPACT This research could improve the accuracy and comparability of Korean natural language processing models by giving them a more consistent representation of syntactic structure.
RANK_REASON The item is an academic paper published on arXiv detailing linguistic research.
AI-generated summary · Google Gemini · from 1 source.