Researchers have introduced a corpus of Russian primary and secondary legislation covering 1991 to 2025. It contains over 300,000 texts totaling more than 194 million tokens. The corpus is offered in two versions: a basic one with simple metadata, and a detailed one that pairs each original text with a Universal Dependencies CoNLL-U version annotated for part of speech, morphology, and syntax.
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT Provides a new, large-scale dataset for NLP research, potentially enabling advancements in legal text analysis and Russian language understanding.
RANK_REASON This is a research paper describing a new dataset.
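
For readers unfamiliar with the CoNLL-U format mentioned in the summary, here is a minimal sketch of how such annotations can be read using only the Python standard library. The sample sentence, its annotations, and the names `SAMPLE`, `parse_feats`, and `parse_conllu` are illustrative assumptions, not material from the corpus or its tooling. The sketch relies only on the standard CoNLL-U layout: ten tab-separated columns per token, `#` comment lines, and a blank line between sentences.

```python
from typing import Dict, List

# The ten standard CoNLL-U columns, in order.
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]

# Hypothetical sample sentence (not taken from the corpus), built with
# explicit tab joins so the column separators are unambiguous.
SAMPLE_ROWS = [
    ["1", "Закон", "закон", "NOUN", "_",
     "Animacy=Inan|Case=Nom|Gender=Masc|Number=Sing", "2", "nsubj", "_", "_"],
    ["2", "вступает", "вступать", "VERB", "_",
     "Aspect=Imp|Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin|Voice=Act",
     "0", "root", "_", "_"],
    ["3", "в", "в", "ADP", "_", "_", "4", "case", "_", "_"],
    ["4", "силу", "сила", "NOUN", "_",
     "Animacy=Inan|Case=Acc|Gender=Fem|Number=Sing", "2", "obl", "_",
     "SpaceAfter=No"],
    ["5", ".", ".", "PUNCT", "_", "_", "2", "punct", "_", "_"],
]
SAMPLE = ("# text = Закон вступает в силу.\n"
          + "\n".join("\t".join(row) for row in SAMPLE_ROWS)
          + "\n\n")


def parse_feats(feats: str) -> Dict[str, str]:
    """Turn 'Case=Nom|Number=Sing' into a dict; '_' means no features."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def parse_conllu(text: str) -> List[List[dict]]:
    """Parse CoNLL-U text into a list of sentences, each a list of token dicts."""
    sentences: List[List[dict]] = []
    current: List[dict] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            # Sentence-level comment or metadata line; skipped in this sketch.
            continue
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        token = dict(zip(FIELDS, cols))
        token["feats"] = parse_feats(token["feats"])
        current.append(token)
    if current:
        sentences.append(current)
    return sentences


sentences = parse_conllu(SAMPLE)
for tok in sentences[0]:
    print(tok["id"], tok["form"], tok["upos"], tok["head"], tok["deprel"])
```

Each token dict exposes the part-of-speech tag (`upos`), the morphological features (`feats`), and the syntactic head and relation (`head`, `deprel`), which correspond to the three annotation layers the summary describes. The sketch keeps token IDs as strings, so multiword-token ranges and empty nodes would pass through unmodified rather than being interpreted.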