Brief · PulseAugur

RESEARCH · arXiv cs.CL · 21h · [2 sources]

Recovering the Zipfian Distribution in Unsupervised Term Discovery

Researchers have published a paper proposing graph-based clustering as a better method for unsupervised term discovery in speech processing. Traditional center-based methods such as K-means tend to produce near-uniform cluster-size distributions, whereas graph clustering, particularly with the Leiden algorithm, generates more Zipf-like distributions that better represent natural lexicons. The approach showed superior performance across three languages for both word and syllable discovery.
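
The contrast between uniform and Zipf-like cluster sizes can be made concrete with a rank-frequency check. The following is a minimal sketch, not the paper's evaluation: it assumes cluster assignments are available as a plain list of labels, and the function name `rank_frequency_slope` and the toy data are hypothetical. It fits a least-squares line to log(cluster size) against log(rank); a slope near -1 indicates a Zipf-like distribution, while a slope near 0 indicates uniform cluster sizes.

```python
import math
from collections import Counter


def rank_frequency_slope(labels):
    """Least-squares slope of log(cluster size) against log(rank).

    `labels` is a sequence of cluster ids, one per discovered segment.
    Hypothetical helper for illustration only.
    """
    # Cluster sizes sorted from largest to smallest, so index + 1 is the rank.
    sizes = sorted(Counter(labels).values(), reverse=True)
    if len(sizes) < 2:
        raise ValueError("need at least two clusters to fit a slope")

    xs = [math.log(rank) for rank in range(1, len(sizes) + 1)]
    ys = [math.log(size) for size in sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)

    # Ordinary least squares: slope = cov(x, y) / var(x).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Toy data (assumed, not from the paper): 10 clusters in each case.
# Uniform: every cluster holds 60 items.
uniform = [c for c in range(10) for _ in range(60)]
# Zipf-like: the cluster at rank r holds 2520 / r items (2520 is divisible by 1..10).
zipf_like = [r for r in range(1, 11) for _ in range(2520 // r)]

print(rank_frequency_slope(uniform))    # approximately 0
print(rank_frequency_slope(zipf_like))  # approximately -1
```

On real clusterings the fit is noisier than on this toy data, so the slope is best read as a rough indicator rather than a formal test of Zipf's law.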

IMPACT: This research could lead to more accurate and natural lexicon generation in speech processing systems.

unsupervised term discovery
K-means
Leiden algorithm
BIRCH
GMM