DLT-Corpus: A Large-Scale Text Collection for the Distributed Ledger Technology Domain

DLT-Corpus Released: 2.98 Billion Tokens for Distributed Ledger Technology NLP

By the PulseAugur editorial team · [1 source] · 2026-05-29 04:00

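Researchers have introduced DLT-Corpus, a large text collection built for Distributed Ledger Technology (DLT) research, containing 2.98 billion tokens from more than 22 million documents. The corpus covers scientific literature, patents, and social media posts, and is intended to address the limited scope of existing natural language processing (NLP) resources for DLT. The researchers demonstrate its utility by analyzing technology emergence patterns and market-innovation correlations, finding that scientific literature typically precedes patents and social media. They also released LedgerBERT, a DLT-specific NLP model, and a sentiment analysis dataset.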

Impact: Provides a large-scale dataset and a specialized model to support the growing body of NLP research on distributed ledger technology.

Ranking rationale: This is an academic paper introducing a new domain-specific dataset and model.


AI-generated summary · Google Gemini · based on 1 source.

Sources [1]

arXiv cs.CL · Walter Hernandez Cruz, Peter Devine, Nikhil Vadgama, Paolo Tasca, Jiahua Xu · 2026-05-29 04:00

DLT-Corpus: A Large-Scale Text Collection for the Distributed Ledger Technology Domain

arXiv:2602.22045v2 · Abstract: We introduce DLT-Corpus, the largest domain-specific text collection for Distributed Ledger Technology (DLT) research to date: 2.98 billion tokens from 22.12 million documents spanning scientific literature (37,440 publications)…

