Brief · PulseAugur

TOOL · arXiv cs.AI English(EN) · 6h

LCSHBench: A Multilingual, Consensus-Grounded Benchmark for Library of Congress Subject Heading Assignment

Researchers have introduced LCSHBench, a benchmark dataset for evaluating automated subject cataloging systems that assign Library of Congress Subject Headings (LCSH). The dataset comprises 22,346 books in 15 languages, sourced from open catalogs, and includes records where at least two independent cataloging agencies agreed on the LCSH assignment. Scoring accounts for both exact heading matches and conceptual similarity, because libraries often agree on a book's topic while expressing it with different headings. Initial experiments show that a fine-tuned embedding model can improve performance on the benchmark.
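To illustrate the difference between exact heading matches and conceptual similarity, here is a minimal Python sketch. It is not the benchmark's actual scoring code, which this brief does not describe: the function names are hypothetical, and token overlap (Jaccard) is only a crude stand-in for whatever conceptual-similarity measure LCSHBench uses.

```python
def exact_f1(predicted, gold):
    """F1 over headings that match the reference string exactly."""
    pred, ref = set(predicted), set(gold)
    hits = len(pred & ref)
    if hits == 0:
        return 0.0
    precision = hits / len(pred)
    recall = hits / len(ref)
    return 2 * precision * recall / (precision + recall)


def token_jaccard(a, b):
    """Assumed stand-in for conceptual similarity: word overlap of two headings."""
    # LCSH subdivisions are joined with "--"; treat them as word separators.
    ta = set(a.lower().replace("--", " ").split())
    tb = set(b.lower().replace("--", " ").split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def soft_recall(predicted, gold, similarity=token_jaccard):
    """Average, over reference headings, of the best similarity to any prediction."""
    if not predicted or not gold:
        return 0.0
    best = [max(similarity(g, p) for p in predicted) for g in gold]
    return sum(best) / len(best)


# Hypothetical example: same topic, but one heading is phrased differently.
gold = ["Libraries--Automation", "Cataloging"]
predicted = ["Library automation", "Cataloging"]
print(exact_f1(predicted, gold))
print(soft_recall(predicted, gold))
```

In this made-up example the exact-match score counts only "Cataloging", while the soft score gives partial credit for "Library automation" against "Libraries--Automation". An embedding-based similarity could be passed in through the `similarity` parameter instead of token overlap.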

IMPACT Provides a standardized evaluation for AI systems performing subject cataloging, potentially improving library resource discovery.

Harvard