
TOOL · arXiv cs.AI

Unifying Acoustic Features and Text with Multimodal LLMs for Neurodegenerative Screening

Researchers have developed NeurMLLM, a multimodal large language model for staging neurodegenerative diseases such as Alzheimer's and Parkinson's. The framework combines acoustic features from speech, text transcripts, and demographic data into a single input sequence for an LLM. Vision transformers encode the audio spectrograms and Mel-frequency cepstral coefficients (MFCCs). On the Bridge2AI-Voice dataset, NeurMLLM outperforms both traditional machine learning and existing LLM-based methods, which suggests that multimodal LLMs could make disease staging more accurate and more accessible.
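As a rough illustration of the "unified sequence" idea, the sketch below turns a waveform into a log-spectrogram, cuts it into ViT-style patches, projects each patch to an embedding, and concatenates the result with text and demographic embeddings. It is not the paper's implementation: the function names, patch size, embedding width, and token order are all assumptions, random matrices stand in for learned weights, the transformer layers themselves are omitted, and MFCCs are left out for brevity.

```python
import numpy as np


def log_spectrogram(wave, n_fft=256, hop=128):
    """Log-magnitude spectrogram of a 1-D signal, shape (freq_bins, frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wave) - n_fft) // hop
    frames = np.stack(
        [wave[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    spec = np.abs(np.fft.rfft(frames, axis=1)).T
    return np.log1p(spec)


def patchify(image, patch=16):
    """Cut a 2-D array into non-overlapping patch x patch tiles, each flattened."""
    h = (image.shape[0] // patch) * patch  # drop rows that do not fill a tile
    w = (image.shape[1] // patch) * patch  # drop columns that do not fill a tile
    tiles = image[:h, :w].reshape(h // patch, patch, w // patch, patch)
    return tiles.swapaxes(1, 2).reshape(-1, patch * patch)


def build_sequence(wave, text_ids, demographics, d_model=64, vocab=1000, seed=0):
    """Hypothetical fusion step: one (tokens, d_model) array for all modalities."""
    rng = np.random.default_rng(seed)
    # Assumption: random projections in place of trained parameters.
    w_patch = rng.normal(scale=0.02, size=(16 * 16, d_model))
    embed = rng.normal(scale=0.02, size=(vocab, d_model))
    w_demo = rng.normal(scale=0.02, size=(len(demographics), d_model))

    audio_tokens = patchify(log_spectrogram(wave)) @ w_patch
    text_tokens = embed[np.asarray(text_ids)]
    demo_token = (np.asarray(demographics, dtype=float) @ w_demo)[None, :]
    # Assumed ordering: demographics, then audio patches, then transcript tokens.
    return np.concatenate([demo_token, audio_tokens, text_tokens], axis=0)


if __name__ == "__main__":
    wave = np.sin(2 * np.pi * 220 * np.arange(8000) / 8000)  # 1 s synthetic tone
    seq = build_sequence(wave, text_ids=[1, 2, 3, 4, 5], demographics=[72.0, 1.0])
    print(seq.shape)
```

In a real system the concatenated array would be passed to the LLM as input embeddings, and the projections would be trained jointly or with a frozen backbone; which of these NeurMLLM does is not stated in this brief.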

IMPACT This work applies multimodal LLMs to medical screening and could improve diagnostic accuracy and accessibility for neurodegenerative diseases.

Parkinson's disease
Alzheimer's disease
Vision Transformers
NeurMLLM
Bridge2AI-Voice dataset