Researchers release comprehensive Russian legislative corpus for NLP tasks

By PulseAugur Editorial · [1 source] · 2026-04-29 04:00

Researchers have introduced a corpus of Russian primary and secondary legislation adopted between 1991 and 2025. The dataset contains 304,382 texts totaling 194,425,905 tokens. It is offered in two versions: a basic version with simple metadata, and a detailed version that pairs each original text with a Universal Dependencies CoNLL-U equivalent carrying part-of-speech, morphological, and syntactic annotations.
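For readers unfamiliar with CoNLL-U, the following is a minimal sketch of how such annotations could be read in plain Python. It assumes only the standard CoNLL-U layout (ten tab-separated columns per token, `#` comment lines, blank lines between sentences). The function name `parse_conllu` and the sample sentence are hypothetical illustrations; the sample is not taken from the corpus, and the corpus's actual file layout is not described in this summary.

```python
from typing import Dict, Iterator, List

# The ten tab-separated columns defined by the CoNLL-U format.
CONLLU_FIELDS = [
    "id", "form", "lemma", "upos", "xpos",
    "feats", "head", "deprel", "deps", "misc",
]


def parse_conllu(text: str) -> Iterator[List[Dict[str, str]]]:
    """Yield each sentence as a list of token dicts (hypothetical helper)."""
    sentence: List[Dict[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            # Sentence-level comments such as "# text = ..." are skipped here.
            continue
        columns = line.split("\t")
        if len(columns) != len(CONLLU_FIELDS):
            # Skip malformed rows rather than guessing at their structure.
            continue
        sentence.append(dict(zip(CONLLU_FIELDS, columns)))
    if sentence:
        yield sentence


# Illustrative sample only: an invented sentence, NOT drawn from the corpus.
SAMPLE = (
    "# text = Закон вступает в силу.\n"
    "1\tЗакон\tзакон\tNOUN\t_\tAnimacy=Inan|Case=Nom|Gender=Masc|Number=Sing\t2\tnsubj\t_\t_\n"
    "2\tвступает\tвступать\tVERB\t_\tAspect=Imp|Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin|Voice=Act\t0\troot\t_\t_\n"
    "3\tв\tв\tADP\t_\t_\t4\tcase\t_\t_\n"
    "4\tсилу\tсила\tNOUN\t_\tAnimacy=Inan|Case=Acc|Gender=Fem|Number=Sing\t2\tobl\t_\tSpaceAfter=No\n"
    "5\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
)

sentences = list(parse_conllu(SAMPLE))
for token in sentences[0]:
    # Print the word form, its universal POS tag, and its dependency relation.
    print(token["form"], token["upos"], token["deprel"])
```

This sketch keeps every value as a string and does not treat multiword-token ranges or empty nodes specially; a dedicated CoNLL-U library would likely be a better fit for real work with a corpus of this size.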

IMPACT Provides a new, large-scale dataset for NLP research, potentially enabling advancements in legal text analysis and Russian language understanding.

RANK_REASON This is a research paper describing a new dataset.



AI-generated summary · Google Gemini · from 1 source.


COVERAGE [1]

arXiv cs.CL TIER_1 English(EN) · Denis Saveliev, Ruslan Kuchakov · 2026-04-29 04:00

The Russian Legislative Corpus

arXiv:2406.04855v3 (announce type: replace). Abstract: We present a comprehensive corpus of Russian primary and secondary legislation adopted between 1991 and 2025, comprising 304,382 texts (194,425,905 tokens). The corpus is available in two versions: the basic version contains tex…

