Connecting Speech to Words through Images
Researchers have developed a novel method to create a spoken word vocabulary without relying on explicit text supervision. This approach uses images and their spoken descriptions to build a lexicon of written words, then aligns these with relevant audio segments. The system leverages unsupervised word discovery techniques to link spoken word segments to their written counterparts, demonstrating effectiveness in spoken word retrieval and keyword spotting tasks.
AI IMPACT: Enables low-resource language development and improves interpretability in speech-to-text systems.
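
To make the alignment and keyword spotting steps more concrete, here is a minimal sketch in Python. It is not the researchers' implementation: the summary does not say how segments and written words are compared, so the shared embedding space, the cosine-similarity scoring, the threshold value, the function names, and the toy vectors are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def align_words_to_segments(word_vectors, segment_vectors):
    """For each written word, pick the spoken segment with the highest similarity.

    Assumption: written words and audio segments already live in one shared
    embedding space (for example, both grounded in the same images).
    """
    alignment = {}
    for word, w_vec in word_vectors.items():
        best_id, best_score = None, -1.0
        for seg_id, s_vec in segment_vectors.items():
            score = cosine(w_vec, s_vec)
            if score > best_score:
                best_id, best_score = seg_id, score
        alignment[word] = (best_id, best_score)
    return alignment


def spot_keyword(word, word_vectors, segment_vectors, threshold=0.8):
    """Return the ids of all segments whose similarity to the keyword meets the threshold."""
    w_vec = word_vectors[word]
    return [
        seg_id
        for seg_id, s_vec in segment_vectors.items()
        if cosine(w_vec, s_vec) >= threshold
    ]


# Hypothetical toy embeddings, invented purely for this illustration.
word_vectors = {
    "dog": [1.0, 0.0, 0.0],
    "ball": [0.0, 1.0, 0.0],
}
segment_vectors = {
    "utt1_seg0": [0.9, 0.1, 0.0],
    "utt1_seg1": [0.1, 0.95, 0.0],
    "utt2_seg0": [0.8, 0.0, 0.2],
}

alignment = align_words_to_segments(word_vectors, segment_vectors)
dog_hits = spot_keyword("dog", word_vectors, segment_vectors)
```

With these toy vectors, "dog" aligns to `utt1_seg0` and "ball" to `utt1_seg1`, while keyword spotting for "dog" also returns `utt2_seg0` because it clears the assumed threshold. A real system would obtain the vectors from trained speech and image models rather than hand-written lists.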