PulseAugur

New Czech language treebanks released for NLP research · 4 sources tracked

Researchers have released two papers describing new Czech language resources. The first introduces the Prague Dependency Treebank -- Consolidated 2.0 (PDT-C 2.0), a uniformly annotated corpus of Czech comprising nearly 4 million tokens. Developed over three decades, the resource aims to systematically link several linguistic layers, including inter-sentential phenomena such as coreference and discourse relations. The second presents UD_Czech-PDTC, a large, genre-rich treebank converted to Universal Dependencies, and describes the conversion process and the differences between the two annotation schemes.

IMPACT These large, genre-diverse Czech treebanks should support the development and evaluation of NLP tools for Czech and facilitate cross-linguistic comparison.

RANK_REASON The cluster consists of two academic papers published on arXiv detailing new linguistic resources for NLP.

AI-generated summary · Google Gemini · from 4 sources.

COVERAGE [4]

  1. arXiv cs.CL TIER_1 English(EN) · Marie Mikulová, Jiří Mírovský, Milan Straka, Pavlína Synková, Jan Štěpánek, Barbora Štěpánková, Jan Hajič ·

    Prague Dependency Treebank -- Consolidated 2.0: Enriching a Complex Annotation Scheme

    arXiv:2606.24324v1 · Abstract: The Prague Dependency Treebank framework is unique in its attempt to systematically include and link different layers of language, including a meaning representation with several types of inter-sentential phenomena, especially coref…

  2. arXiv cs.CL TIER_1 English(EN) · Marie Mikulová, Barbora Štěpánková, Daniel Zeman, Jan Štěpánek, Milan Straka, Jan Hajič ·

    Meet UD_Czech-PDTC: A Large and Genre-Rich Treebank in Universal Dependencies

    arXiv:2606.24337v1 · Abstract: Czech has been part of Universal Dependencies since its first release in 2015. It has also been one of the best represented languages, with the Prague Dependency Treebank being order of magnitude larger than most other UD treebanks.…

  3. arXiv cs.CL TIER_1 English(EN) · Jan Hajič ·

    Meet UD_Czech-PDTC: A Large and Genre-Rich Treebank in Universal Dependencies

    Czech has been part of Universal Dependencies since its first release in 2015. It has also been one of the best represented languages, with the Prague Dependency Treebank being order of magnitude larger than most other UD treebanks. More recently, three other datasets from the Pr…

  4. arXiv cs.CL TIER_1 English(EN) · Jan Hajič ·

    Prague Dependency Treebank -- Consolidated 2.0: Enriching a Complex Annotation Scheme

    The Prague Dependency Treebank framework is unique in its attempt to systematically include and link different layers of language, including a meaning representation with several types of inter-sentential phenomena, especially coreference and discourse relations. We present its s…