PulseAugur

Researchers release comprehensive Russian legislative corpus for NLP tasks

Researchers have released a corpus of Russian primary and secondary legislation adopted between 1991 and 2025. It contains 304,382 texts totaling 194,425,905 tokens. The corpus comes in two versions: a basic version with simple metadata, and a detailed version that pairs the original texts with Universal Dependencies CoNLL-U equivalents annotated for part of speech, morphology, and syntax.
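For readers unfamiliar with CoNLL-U, the sketch below shows how such annotations can be read with the Python standard library alone. It is a minimal illustration, not the authors' tooling: the sample sentence and its annotations are hand-written for this example and are not taken from the corpus, and the names `Token`, `parse_feats`, and `parse_conllu` are hypothetical. It assumes the standard ten tab-separated CoNLL-U columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC).

```python
from dataclasses import dataclass


@dataclass
class Token:
    """One CoNLL-U token line, reduced to the fields used here."""
    id: str
    form: str
    lemma: str
    upos: str     # universal part-of-speech tag
    feats: dict   # morphological features, e.g. {"Case": "Nom"}
    head: str     # ID of the syntactic head ("0" marks the root)
    deprel: str   # dependency relation to the head


def parse_feats(raw):
    """Turn 'Case=Nom|Number=Sing' into a dict; '_' means no features."""
    if raw == "_":
        return {}
    return dict(pair.split("=", 1) for pair in raw.split("|"))


def parse_conllu(text):
    """Split CoNLL-U text into sentences, each a list of Token objects."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():          # a blank line ends a sentence
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):      # sentence-level comment or metadata
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}")
        current.append(Token(cols[0], cols[1], cols[2], cols[3],
                             parse_feats(cols[5]), cols[6], cols[7]))
    if current:
        sentences.append(current)
    return sentences


# Hand-written sample (an assumption for illustration, not corpus data).
ROWS = [
    ["1", "Закон", "закон", "NOUN", "_", "Case=Nom|Number=Sing", "2", "nsubj", "_", "_"],
    ["2", "вступает", "вступать", "VERB", "_", "Number=Sing|Person=3|Tense=Pres", "0", "root", "_", "_"],
    ["3", "в", "в", "ADP", "_", "_", "4", "case", "_", "_"],
    ["4", "силу", "сила", "NOUN", "_", "Case=Acc|Number=Sing", "2", "obl", "_", "SpaceAfter=No"],
    ["5", ".", ".", "PUNCT", "_", "_", "2", "punct", "_", "_"],
]
SAMPLE = "# text = Закон вступает в силу.\n" + "\n".join("\t".join(r) for r in ROWS) + "\n"

sentences = parse_conllu(SAMPLE)
root = next(tok for tok in sentences[0] if tok.head == "0")
print(root.form, root.upos)
```

The parser skips comment lines and treats a blank line as a sentence boundary, which is how CoNLL-U separates sentences. It does not handle multiword-token ranges or empty nodes specially; a maintained CoNLL-U library may be a better fit for real work.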

Summary written by gemini-2.5-flash-lite from 1 source.

IMPACT Provides a large, syntactically annotated dataset for NLP research, which could support work on legal text analysis and Russian language understanding.

RANK_REASON This is a research paper describing a new dataset.


COVERAGE [1]

  1. arXiv cs.CL TIER_1 · Denis Saveliev, Ruslan Kuchakov

    The Russian Legislative Corpus

    arXiv:2406.04855v3 Announce Type: replace Abstract: We present a comprehensive corpus of Russian primary and secondary legislation adopted between 1991 and 2025, comprising 304,382 texts (194,425,905 tokens). The corpus is available in two versions: the basic version contains tex…