Brief · PulseAugur

TOOL · arXiv cs.CL · 12h

Disentangling Similarity and Relatedness in Topic Models

Researchers have developed a new method to distinguish between thematic relatedness and taxonomic similarity in topic models, particularly those augmented with large language models. They created a synthetic benchmark using LLM annotations to train a neural scorer capable of measuring these two semantic axes. This scorer revealed that different topic model families occupy distinct positions in the similarity-relatedness space and that optimizing for one axis can degrade performance on tasks requiring the other.

IMPACT Provides a framework for evaluating the semantic nuances captured by topic models, potentially improving their application in downstream NLP tasks.

Large Language Models
Latent Dirichlet Allocation
Hanlin Xiao