Speaker Verification with Speech-Aware LLMs: Evaluation and Augmentation
Researchers have developed a new method to evaluate and enhance the speaker verification capabilities of speech-aware Large Language Models (LLMs). Initial benchmarks showed that current speech-aware LLMs discriminate poorly between speakers, with error rates above 20% on the VoxCeleb1 dataset. To address this, the researchers introduced a lightweight augmentation that injects speaker embeddings into the LLM and trains only LoRA (Low-Rank Adaptation) adapters, leaving the base model's weights frozen. Applied to TinyLLaMA-1.1B, the approach yields ECAPA-LLM, which reaches a 1.03% error rate on VoxCeleb1-E, approaching the performance of dedicated speaker verification systems while retaining a natural-language interface.
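As a rough illustration of the augmentation idea, here is a minimal pure-Python sketch (no ML framework) of the two ingredients the summary names. First, a speaker embedding is mapped into the LLM's hidden width and prepended to the prompt as an extra "soft token." Second, a LoRA adapter keeps a pretrained weight `W` frozen and trains only a low-rank update `B @ A`, scaled by `alpha / rank`. The prepend-as-a-token design, the linear projector `proj`, the helper names (`inject_speaker_embedding`, `LoRALinear`), and all dimensions are hypothetical. The summary does not say how ECAPA-LLM actually injects the embedding or which TinyLLaMA layers receive adapters.

```python
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rand_matrix(rows: int, cols: int, std: float, rng: random.Random) -> Matrix:
    """Gaussian-initialized matrix, standing in for pretrained or random weights."""
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update: y = W x + (alpha / r) * B (A x)."""

    def __init__(self, d_in: int, d_out: int, rank: int, alpha: float, rng: random.Random):
        self.W = rand_matrix(d_out, d_in, 0.02, rng)   # frozen: never updated
        self.A = rand_matrix(rank, d_in, 0.02, rng)    # trainable, rank x d_in
        self.B = [[0.0] * rank for _ in range(d_out)]  # trainable, d_out x rank, zero-init
        self.scale = alpha / rank

    def trainable_params(self) -> int:
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])

    def frozen_params(self) -> int:
        return len(self.W) * len(self.W[0])

    def __call__(self, x: Vector) -> Vector:
        base = matvec(self.W, x)                   # frozen path
        delta = matvec(self.B, matvec(self.A, x))  # low-rank adapter path
        return [b + self.scale * d for b, d in zip(base, delta)]


def inject_speaker_embedding(spk_emb: Vector, proj: Matrix,
                             token_embs: List[Vector]) -> List[Vector]:
    """Hypothetical: map a speaker embedding to the LLM width, prepend it as one soft token."""
    return [matvec(proj, spk_emb)] + token_embs


if __name__ == "__main__":
    rng = random.Random(0)
    d_spk, d_model, rank = 8, 16, 2  # toy sizes, not the real model's
    proj = rand_matrix(d_model, d_spk, 0.02, rng)          # hypothetical projector
    spk_emb = [rng.gauss(0.0, 1.0) for _ in range(d_spk)]  # stand-in speaker embedding
    prompt = [[rng.gauss(0.0, 1.0) for _ in range(d_model)] for _ in range(3)]
    seq = inject_speaker_embedding(spk_emb, proj, prompt)
    layer = LoRALinear(d_model, d_model, rank, alpha=4.0, rng=rng)
    hidden = [layer(h) for h in seq]
    print(len(seq), layer.trainable_params(), layer.frozen_params())
```

Because `B` starts at zero, the adapted layer initially reproduces the frozen layer exactly, so training starts from the base model's behavior. Only `A` and `B` (64 values in this toy layer, versus 256 frozen ones) would receive gradient updates, which is what keeps this kind of augmentation lightweight.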
IMPACT: This research could lead to LLMs that better understand and verify speaker identity, with potential applications in voice assistants and security.