Researchers have developed a new framework to evaluate how well Multimodal Large Language Models (MLLMs) identify misinformation in Chinese short videos. The study used a dataset of 200 videos annotated for deceptive patterns such as experimental errors and logical fallacies. Gemini-2.5-Pro performed best, achieving a belief score of 71.5, while another model, o3, performed poorly with a score of 35.2. The evaluation also revealed that MLLMs are susceptible to biases, such as those introduced by authoritative-looking channel IDs.
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT This research highlights MLLM vulnerabilities to misinformation and biases, suggesting a need for improved robustness in multimodal AI systems.
RANK_REASON This is a research paper introducing a new evaluation framework and dataset for MLLMs.