SIREM: Speech-Informed MRI Reconstruction with Learned Sampling
Researchers have developed new methods for real-time MRI (rtMRI) of speech production that integrate acoustic information with visual data. One approach, Speech-Guided Multimodal Learning, uses phonological representations derived from speech to guide articulator localization, and fuses visual and acoustic encoders for precise segmentation. Another method, SIREM, reconstructs rtMRI by combining an audio-driven component with MRI data, allowing faster acquisition and reconstruction while maintaining anatomical accuracy. Both techniques aim to improve the visualization of vocal tract motion for speech science and clinical applications.
AI IMPACT: Advances in multimodal AI for medical imaging could lead to faster, more accurate diagnostic tools for speech and vocal tract disorders.