Brief · PulseAugur

TOOL · arXiv cs.AI English(EN) · 7h

Speaker Verification with Speech-Aware LLMs: Evaluation and Augmentation

Researchers have proposed a method for evaluating and improving the speaker verification capabilities of speech-aware large language models (LLMs). Initial benchmarks showed that current speech-aware LLMs discriminate poorly between speakers, with error rates exceeding 20% on the VoxCeleb1 dataset. To address this, the authors introduce a lightweight augmentation that injects speaker embeddings into an LLM and trains only LoRA adapters. Applied to TinyLLaMA-1.1B, the resulting ECAPA-LLM achieves a 1.03% error rate on VoxCeleb1-E, approaching the performance of dedicated speaker verification systems while retaining a natural-language interface.
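
For readers unfamiliar with the two ingredients named above, the following is a toy sketch of what "injecting a speaker embedding" and "training only LoRA adapters" can look like. It is not the authors' implementation: the brief does not say how the embedding is injected, so the linear projection, the choice to prepend one extra input position, the tiny dimensions, and every function and variable name here are assumptions made for illustration.

```python
# Toy sketch only: sizes, names, and the "prepend one soft token" choice are
# assumptions for illustration, not details taken from the paper.

def matvec(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def inject_speaker(speaker_embedding, token_embeddings, projection):
    """Project a speaker embedding to the LLM hidden size and prepend it
    to the token embeddings as one extra input position."""
    soft_token = matvec(projection, speaker_embedding)
    return [soft_token] + token_embeddings

def lora_linear(x, frozen_weight, lora_a, lora_b, alpha):
    """LoRA forward pass: y = W x + (alpha / r) * B (A x).
    Only lora_a and lora_b would be trained; frozen_weight stays fixed."""
    rank = len(lora_a)
    base = matvec(frozen_weight, x)
    update = matvec(lora_b, matvec(lora_a, x))
    return [b + (alpha / rank) * u for b, u in zip(base, update)]

# Hypothetical sizes: 3-dim speaker embedding, 2-dim LLM hidden state, rank 1.
speaker_embedding = [1.0, 0.0, -1.0]
projection = [[0.5, 0.0, -0.5], [0.0, 1.0, 0.0]]   # 2 x 3 projection matrix
token_embeddings = [[0.1, 0.2], [0.3, 0.4]]        # two text-token embeddings

# The sequence fed to the LLM now has one speaker position plus the text tokens.
sequence = inject_speaker(speaker_embedding, token_embeddings, projection)

frozen_weight = [[1.0, 0.0], [0.0, 1.0]]           # 2 x 2, never updated
lora_a = [[0.5, 0.5]]                              # r x 2, trainable
lora_b = [[0.0], [0.0]]                            # 2 x r, trainable, zero at init
outputs = [lora_linear(x, frozen_weight, lora_a, lora_b, alpha=2.0)
           for x in sequence]
```

Because `lora_b` starts at zero, the adapted layer initially behaves exactly like the frozen one; training then moves only the small `lora_a` and `lora_b` matrices, which is what keeps this kind of augmentation lightweight.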

IMPACT This research could lead to LLMs with enhanced capabilities for understanding and verifying speaker identity, potentially impacting voice assistants and security applications.

LoRA
ECAPA-TDNN
TinyLLaMA-1.1B
Speech-Aware LLMs
VoxCeleb1
ECAPA-LLM
VoxCeleb1-E
Thomas Thebaud