New Czech language treebanks released for NLP research · 4 sources tracked

By PulseAugur Editorial · [4 sources] · 2026-06-23 08:59

Researchers have released two new papers detailing advancements in Czech language processing resources. The first paper introduces the Prague Dependency Treebank -- Consolidated 2.0 (PDT-C 2.0), an extensive, uniformly annotated corpus of the Czech language comprising nearly 4 million tokens. This resource, developed over three decades, aims to systematically integrate various linguistic layers, including inter-sentential phenomena like coreference and discourse relations. The second paper presents UD_Czech-PDTC, a large and genre-rich treebank converted for use with Universal Dependencies, highlighting the conversion process and the differences between the two annotation schemes.

IMPACT These new, large-scale, and genre-diverse Czech language treebanks will enhance the development and evaluation of NLP tools, particularly for Czech, and facilitate cross-linguistic comparisons.

RANK_REASON The cluster consists of two academic papers published on arXiv detailing new linguistic resources for NLP.

AI-generated summary · Google Gemini · from 4 sources.

COVERAGE [4]

arXiv cs.CL TIER_1 English(EN) · Marie Mikulová, Jiří Mírovský, Milan Straka, Pavlína Synková, Jan Štěpánek, Barbora Štěpánková, Jan Hajič · 2026-06-24 04:00

Prague Dependency Treebank -- Consolidated 2.0: Enriching a Complex Annotation Scheme

arXiv:2606.24324v1 · Abstract: The Prague Dependency Treebank framework is unique in its attempt to systematically include and link different layers of language, including a meaning representation with several types of inter-sentential phenomena, especially coref…

arXiv cs.CL TIER_1 English(EN) · Marie Mikulová, Barbora Štěpánková, Daniel Zeman, Jan Štěpánek, Milan Straka, Jan Hajič · 2026-06-24 04:00

Meet UD_Czech-PDTC: A Large and Genre-Rich Treebank in Universal Dependencies

arXiv:2606.24337v1 · Abstract: Czech has been part of Universal Dependencies since its first release in 2015. It has also been one of the best represented languages, with the Prague Dependency Treebank being order of magnitude larger than most other UD treebanks.…

arXiv cs.CL TIER_1 English(EN) · Jan Hajič · 2026-06-23 09:22

Meet UD_Czech-PDTC: A Large and Genre-Rich Treebank in Universal Dependencies

Czech has been part of Universal Dependencies since its first release in 2015. It has also been one of the best represented languages, with the Prague Dependency Treebank being order of magnitude larger than most other UD treebanks. More recently, three other datasets from the Pr…

arXiv cs.CL TIER_1 English(EN) · Jan Hajič · 2026-06-23 08:59

Prague Dependency Treebank -- Consolidated 2.0: Enriching a Complex Annotation Scheme

The Prague Dependency Treebank framework is unique in its attempt to systematically include and link different layers of language, including a meaning representation with several types of inter-sentential phenomena, especially coreference and discourse relations. We present its s…
