A pilot study explored the use of multimodal large language models (MLLMs) for analyzing pathological movements in seizure videos. The research found that MLLMs, without specific training, outperformed traditional computer vision models on many seizure features, particularly recognizing postural and contextual elements. While MLLMs struggled with subtle, high-frequency movements, targeted preprocessing techniques improved their performance, and their explanations for predictions showed high faithfulness to expert reasoning.
Impact: Demonstrates potential for adapting general-purpose MLLMs for specialized clinical video analysis, offering a path toward interpretable diagnostic assistance.
Ranking rationale: This is a research paper published on arXiv evaluating the capabilities of existing models.
AI-generated summary · Google Gemini · From 2 sources.