Brief · PulseAugur

TOOL · arXiv cs.AI · 12h

COGNITION: From Evaluation to Defense against Multimodal LLM CAPTCHA Solvers

A new research paper shows that multimodal large language models (MLLMs) can solve visual CAPTCHAs effectively, posing a significant security risk. The study evaluated seven MLLMs across 18 CAPTCHA types and found that current models solve many recognition-oriented and low-interaction CAPTCHAs at human-like cost and speed. The researchers propose defense strategies, including fine-grained localization and implicit counting, that reduced MLLM success rates from over 95% to 0% on a hardened CAPTCHA type. The paper stresses the urgent need to redesign CAPTCHAs as MLLM capabilities advance.

MLLMs
multimodal large language models
CAPTCHA
Junyu Wang