Brief

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

TOOL · arXiv cs.AI English(EN) · 10h

ASKD-Whisper: Adaptive Self-knowledge Distillation for Efficient and Low-Latency Automatic Speech Recognition

Researchers have developed Adaptive Self-Knowledge Distillation (ASKD), a framework for compressing large AI models. During training, the method dynamically reduces the student's reliance on the teacher model's predictions, so that the student learns to predict independently rather than simply imitate the teacher. Applied to the Whisper speech recognition model, it produced a smaller distilled model, ASKD-Whisper, which achieved a 5x reduction in inference latency and a 1.07% lower word error rate than its teacher.

IMPACT This technique could enable more efficient deployment of large ASR models on resource-constrained devices.
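
As a rough illustration of what "reducing reliance on the teacher during training" can look like, here is a minimal sketch of a distillation loss whose teacher term is weighted less as training progresses. The brief does not describe ASKD's actual adaptation rule, so the linear schedule, the function names (`teacher_weight`, `distillation_loss`), and the start/end weights are all assumptions made for illustration, not the paper's method.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(student_probs, label):
    # Loss against the ground-truth label index.
    return -math.log(student_probs[label])

def kl_divergence(teacher_probs, student_probs):
    # KL(teacher || student): how far the student is from the teacher.
    return sum(t * math.log(t / s)
               for t, s in zip(teacher_probs, student_probs) if t > 0)

def teacher_weight(step, total_steps, start=0.9, end=0.1):
    # ASSUMPTION: a simple linear decay stands in for the adaptive rule,
    # which the brief does not specify.
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * frac

def distillation_loss(student_logits, teacher_logits, label, step, total_steps):
    # Weighted mix of "imitate the teacher" and "fit the label";
    # the teacher term shrinks as training proceeds.
    s = softmax(student_logits)
    t = softmax(teacher_logits)
    alpha = teacher_weight(step, total_steps)
    return alpha * kl_divergence(t, s) + (1 - alpha) * cross_entropy(s, label)

if __name__ == "__main__":
    student = [2.0, 0.5, -1.0]
    teacher = [3.0, 0.0, -2.0]
    for step in (0, 50, 100):
        print(step, round(distillation_loss(student, teacher, 0, step, 100), 4))
```

Early in training the loss is dominated by the teacher-matching term; by the end it is dominated by the ground-truth term, which is one simple way to wean a student off its teacher.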

RESEARCH · Hugging Face Daily Papers English(EN) · 6d · [4 sources]

Comprehensive Benchmarking of Long-Form Speech Generation in Diverse Scenarios

Several recent papers target the efficiency and robustness of speech processing models. FastSLM introduces a hierarchical temporal abstractor that compresses audio while retaining key acoustic detail, and reportedly outperforms state-of-the-art models with fewer resources. SALSA is a lightweight adaptation technique for speech-aware large language models that learns steering vectors to improve generalization to diverse and out-of-domain speech. A third method jointly optimizes performance and computational complexity during training, so that model size can be adjusted dynamically without post-hoc pruning.

IMPACT These advances target the efficiency and adaptability of speech models, which could enable more robust audio-processing and language-understanding applications.
