New benchmark evaluates multilingual Library of Congress subject heading assignment

By PulseAugur Editorial · [2 sources] · 2026-06-03 02:58

Researchers have introduced LCSHBench, a benchmark dataset for evaluating automated subject cataloging with Library of Congress Subject Headings (LCSH). The dataset comprises 22,346 books in 15 languages, drawn from the Harvard, Columbia, and Princeton catalogs, and a record is included only when at least two agencies agreed on its LCSH assignment. LCSHBench scores both exact and conceptual matches, which addresses a common discrepancy: libraries often agree on a book's topics but express them with different headings.
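The consensus rule described above can be illustrated with a minimal sketch. It is not the paper's method: the `normalize` and `consensus_headings` helpers, the normalization steps, and the sample record are all hypothetical, and the sketch assumes that "agreement" means the same heading string after light normalization.

```python
from collections import Counter


def normalize(heading: str) -> str:
    """Hypothetical normalization: collapse whitespace, drop a trailing period, ignore case."""
    return " ".join(heading.split()).rstrip(".").casefold()


def consensus_headings(assignments: dict[str, list[str]], min_agree: int = 2) -> set[str]:
    """Return headings assigned by at least `min_agree` agencies (assumed rule)."""
    counts: Counter[str] = Counter()
    for headings in assignments.values():
        # Deduplicate per agency so each agency counts at most once per heading.
        for h in {normalize(h) for h in headings}:
            counts[h] += 1
    return {h for h, n in counts.items() if n >= min_agree}


# Made-up record: agency name -> headings that agency assigned.
record = {
    "harvard": ["Cataloging--Automation", "Subject headings."],
    "columbia": ["Subject headings", "Libraries"],
    "princeton": ["cataloging--automation"],
}

agreed = consensus_headings(record)
keep_record = bool(agreed)  # the record enters only if some heading reaches consensus
```

Here `agreed` holds the two headings shared by at least two agencies, while "Libraries", assigned by only one, is dropped. A real pipeline would likely need a stricter notion of heading identity, such as authority record identifiers, rather than string comparison.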

IMPACT Provides a standardized evaluation for multilingual subject cataloging, potentially improving AI's ability to organize and retrieve information across languages.

AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

arXiv cs.AI TIER_1 English(EN) · Kwok Leong Tang · 2026-06-04 04:00

LCSHBench: A Multilingual, Consensus-Grounded Benchmark for Library of Congress Subject Heading Assignment

arXiv:2606.04382v1 Announce Type: cross Abstract: Automated subject cataloging assigns controlled-vocabulary headings to bibliographic records, but LCSH has no standard public benchmark. We introduce LCSHBench: 22,346 books in 15 languages from the openly licensed Harvard, Columbi…

arXiv cs.IR (Information Retrieval) TIER_1 English(EN) · Kwok Leong Tang · 2026-06-03 02:58

LCSHBench: A Multilingual, Consensus-Grounded Benchmark for Library of Congress Subject Heading Assignment

Automated subject cataloging assigns controlled-vocabulary headings to bibliographic records, but LCSH has no standard public benchmark. We introduce LCSHBench: 22,346 books in 15 languages from the openly licensed Harvard, Columbia, and Princeton catalogs. Records enter only when…
