A pilot study explored the use of multimodal large language models (MLLMs) for analyzing pathological movements in seizure videos. The study found that MLLMs, without task-specific training, outperformed traditional computer vision models on many seizure features, particularly in recognizing postural and contextual elements. While MLLMs struggled with subtle, high-frequency movements, targeted preprocessing techniques improved their performance, and their explanations for predictions showed high faithfulness to expert reasoning.
Summary written by gemini-2.5-flash-lite from 2 sources.
IMPACT Demonstrates the potential of adapting general-purpose MLLMs for specialized clinical video analysis, offering a path toward interpretable diagnostic assistance.
RANK_REASON This is a research paper published on arXiv evaluating the capabilities of existing models.