PulseAugur

New benchmark LCSHBench aids AI subject cataloging

Researchers have introduced LCSHBench, a benchmark dataset for evaluating automated subject cataloging systems that assign Library of Congress Subject Headings (LCSH). The dataset comprises 22,346 books in 15 languages, sourced from open catalogs, and includes records where at least two independent cataloging agencies agreed on the LCSH assignment. LCSHBench accounts for both exact heading matches and conceptual similarity, because libraries often agree on a book's topic while expressing it with different headings. Initial experiments show that a fine-tuned embedder model can improve performance on this benchmark.
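To make the consensus idea concrete, here is a minimal sketch of how headings agreed on by at least two agencies could be selected and then used for exact-match scoring. It is an illustration under stated assumptions, not the paper's procedure: the function names, the agency labels, and the sample record are all hypothetical, and the conceptual-similarity side of the benchmark is not modeled here.

```python
from collections import Counter


def consensus_headings(agency_headings, min_agencies=2):
    """Return headings assigned by at least `min_agencies` distinct agencies.

    `agency_headings` maps a (hypothetical) agency label to the list of
    headings that agency assigned to one book.
    """
    counts = Counter()
    for headings in agency_headings.values():
        counts.update(set(headings))  # count each agency at most once per heading
    return {heading for heading, n in counts.items() if n >= min_agencies}


def exact_match_scores(predicted, gold):
    """Precision, recall, and F1 over exact heading-string matches."""
    predicted, gold = set(predicted), set(gold)
    hits = len(predicted & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Made-up record for illustration only; not taken from the dataset.
record = {
    "agency_a": ["Machine learning", "Cataloging--Automation"],
    "agency_b": ["Machine learning", "Libraries--Automation"],
    "agency_c": ["Cataloging--Automation", "Machine learning"],
}

gold = consensus_headings(record)
scores = exact_match_scores(["Machine learning", "Information retrieval"], gold)
```

In this toy record, "Libraries--Automation" is dropped because only one agency assigned it. Exact matching alone would also penalize a prediction that names the same topic with a different heading, which is the gap the benchmark's conceptual-similarity scoring is described as addressing.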

IMPACT Provides a standardized evaluation for AI systems performing subject cataloging, potentially improving library resource discovery.

RANK_REASON The cluster describes a new academic paper introducing a benchmark dataset for AI research.


AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. arXiv cs.AI TIER_1 English (EN) · Kwok Leong Tang

    LCSHBench: A Multilingual, Consensus-Grounded Benchmark for Library of Congress Subject Heading Assignment

    arXiv:2606.04382v1 Announce Type: cross Abstract: Automated subject cataloging assigns controlled-vocabulary headings to bibliographic records, but LCSH has no standard public benchmark. We introduce LCSHBench: 22,346 books in 15 languages from the openly licensed Harvard, Columbi…