New Benchmark Evaluates Multilingual Library of Congress Subject Heading Assignment

By PulseAugur Editorial Team · [2 sources] · 2026-06-03 02:58

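Researchers have introduced LCSHBench, a new benchmark dataset for evaluating automated subject cataloging with Library of Congress Subject Headings (LCSH). It contains 22,346 books in 15 languages drawn from the openly licensed catalogs of Harvard, Columbia, and Princeton universities, and a record is included only when at least two of these institutions agree on its LCSH assignment. LCSHBench scores predictions by both exact match and conceptual match, which addresses a common situation: libraries agree on a book's subject but word the heading differently.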
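The summary does not say exactly how consensus or conceptual matching is computed. The Python sketch below illustrates one possible reading under stated assumptions: a heading counts as consensus when at least two catalogs assign it, and a "concept" is approximated by the main heading with its `--` subdivisions dropped. The function names (`consensus_headings`, `concept_key`, `f1`, `score`), the sample headings, and both rules are hypothetical illustrations, not LCSHBench's published procedure.

```python
from collections import Counter
from typing import Dict, List, Set


def consensus_headings(catalogs: Dict[str, Set[str]], min_agree: int = 2) -> Set[str]:
    """Keep headings assigned by at least `min_agree` catalogs.

    Hypothetical rule: per-heading agreement. An empty result means the
    record would be left out of the benchmark.
    """
    counts = Counter(h for headings in catalogs.values() for h in set(headings))
    return {h for h, n in counts.items() if n >= min_agree}


def concept_key(heading: str) -> str:
    """Hypothetical concept normalization: keep only the main heading.

    Subdivisions after '--' are dropped, then a trailing period and case
    differences are ignored.
    """
    main = heading.split("--")[0]
    return main.strip().rstrip(".").casefold()


def f1(pred: Set[str], gold: Set[str]) -> float:
    """Set-based F1 between predicted and gold headings."""
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def score(pred: List[str], gold: Set[str]) -> Dict[str, float]:
    """Report exact-match F1 and (hypothetical) concept-match F1."""
    exact = f1(set(pred), gold)
    concept = f1({concept_key(h) for h in pred}, {concept_key(h) for h in gold})
    return {"exact_f1": exact, "concept_f1": concept}


if __name__ == "__main__":
    # Illustrative catalog records for one book (made-up headings).
    catalogs = {
        "harvard": {"Cookery, Chinese", "Food habits -- China"},
        "columbia": {"Cookery, Chinese", "Food habits -- China"},
        "princeton": {"Cookery, Chinese"},
    }
    gold = consensus_headings(catalogs)
    prediction = ["Cookery, Chinese", "Food habits -- China -- History"]
    print(sorted(gold))
    print(score(prediction, gold))
```

In this toy case the extra `-- History` subdivision makes one prediction an exact-match miss, while the coarser concept match counts it as correct. That gap is the kind of "same subject, different wording" disagreement the benchmark's dual scoring is described as handling.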

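Impact: Provides a standardized evaluation for multilingual subject classification, which could help improve how AI systems organize and retrieve information across languages.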

Ranking rationale: This cluster describes a new benchmark dataset for a specific NLP task.


AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

arXiv cs.AI TIER_1 English(EN) · Kwok Leong Tang · 2026-06-04 04:00

LCSHBench: A Multilingual, Consensus-Based Benchmark for Library of Congress Subject Heading Assignment

arXiv:2606.04382v1 Announce Type: cross Abstract: Automated subject cataloging assigns controlled-vocabulary headings to bibliographic records, but LCSH has no standard public benchmark. We introduce LCSHBench: 22,346 books in 15 languages from the openly licensed Harvard, Columbi…
arXiv cs.IR (Information Retrieval) TIER_1 English(EN) · Kwok Leong Tang · 2026-06-03 02:58

LCSHBench: A Multilingual, Consensus-Driven Benchmark for Library of Congress Subject Heading Assignment

Automated subject cataloging assigns controlled-vocabulary headings to bibliographic records, but LCSH has no standard public benchmark. We introduce LCSHBench: 22,346 books in 15 languages from the openly licensed Harvard, Columbia, and Princeton catalogs. Records enter only when…
