PulseAugur

DLT-Corpus released: 2.98B tokens for Distributed Ledger Technology NLP

Researchers have introduced DLT-Corpus, a text collection for Distributed Ledger Technology (DLT) research comprising 2.98 billion tokens from over 22 million documents. The corpus spans scientific literature, patents, and social media posts, and aims to address the limited scope of existing NLP resources for DLT. The researchers demonstrated its utility by analyzing technology emergence patterns and market-innovation correlations, finding that technologies often appear in scientific literature before they appear in patents and on social media. They also released LedgerBERT, a DLT-specific NLP model, and a sentiment analysis dataset.

IMPACT Provides a large-scale dataset and specialized model to advance NLP research in the growing Distributed Ledger Technology sector.

RANK_REASON This is a research paper introducing a new dataset and model for a specific domain.


AI-generated summary · Google Gemini · from 1 source.


COVERAGE [1]

  1. arXiv cs.CL TIER_1 English (EN) · Walter Hernandez Cruz, Peter Devine, Nikhil Vadgama, Paolo Tasca, Jiahua Xu

    DLT-Corpus: A Large-Scale Text Collection for the Distributed Ledger Technology Domain

    arXiv:2602.22045v2. Abstract: We introduce DLT-Corpus, the largest domain-specific text collection for Distributed Ledger Technology (DLT) research to date: 2.98 billion tokens from 22.12 million documents spanning scientific literature (37,440 publications)…