Unifying Acoustic Features and Text with Multimodal LLMs for Neurodegenerative Screening
Researchers have developed NeurMLLM, a multimodal large language model for staging neurodegenerative diseases such as Alzheimer's and Parkinson's. The framework combines acoustic features from speech, text transcripts, and demographic data into a single input sequence for an LLM. Vision transformers encode the audio spectrograms and Mel-frequency cepstral coefficients (MFCCs) before they enter that sequence. On the Bridge2AI-Voice dataset, NeurMLLM reportedly outperforms both traditional machine learning baselines and existing LLM-based methods, which suggests that multimodal LLMs could make disease staging more accurate and more accessible.
IMPACT: This research applies multimodal LLMs to medical screening, an approach that could improve diagnostic accuracy and accessibility for neurodegenerative diseases.
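
The summary describes acoustic features being encoded by vision transformers and merged with text and demographic data into one sequence. The following is a minimal NumPy sketch of that general idea only, not NeurMLLM's actual implementation: the patch size, embedding width, token ordering, and every function name are assumptions made for illustration, and a random linear projection stands in for a trained encoder.

```python
import numpy as np

rng = np.random.default_rng(0)
D_MODEL = 64  # hypothetical LLM embedding width (assumption)
PATCH = 16    # hypothetical square patch size (assumption)

def patchify(image, patch=PATCH):
    """Split a 2-D array (frequency x time) into flattened, non-overlapping patches."""
    h, w = image.shape
    h, w = h - h % patch, w - w % patch  # crop to a multiple of the patch size
    image = image[:h, :w]
    patches = image.reshape(h // patch, patch, w // patch, patch).swapaxes(1, 2)
    return patches.reshape(-1, patch * patch)

def project(x, d_model=D_MODEL):
    """Random linear projection standing in for a trained encoder (assumption)."""
    w = rng.standard_normal((x.shape[1], d_model)) / np.sqrt(x.shape[1])
    return x @ w

def build_sequence(spectrogram, mfcc, text_emb, demo_emb):
    """Concatenate all modality tokens into one sequence; the order is an assumption."""
    spec_tokens = project(patchify(spectrogram))
    mfcc_tokens = project(patchify(mfcc))
    return np.concatenate([demo_emb, spec_tokens, mfcc_tokens, text_emb], axis=0)

# Toy inputs with arbitrary shapes; real features would come from audio and a tokenizer.
spectrogram = rng.standard_normal((128, 256))   # 8 x 16 = 128 patches
mfcc = rng.standard_normal((32, 256))           # 2 x 16 = 32 patches
text_emb = rng.standard_normal((20, D_MODEL))   # 20 transcript token embeddings
demo_emb = rng.standard_normal((2, D_MODEL))    # e.g. age and sex as 2 tokens

seq = build_sequence(spectrogram, mfcc, text_emb, demo_emb)
print(seq.shape)  # (182, 64)
```

In this sketch every modality ends up as rows of the same width, so a downstream model can attend over all of them as one token sequence. How the actual system orders, projects, or weights the modalities is not stated in the summary.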