A new research paper shows that multimodal large language models (MLLMs) can effectively solve visual CAPTCHAs, posing a significant security risk. The study evaluated seven MLLMs across 18 CAPTCHA types and found that current models can solve many recognition-oriented and low-interaction CAPTCHAs at human-like cost and speed. The researchers propose defense strategies, including fine-grained localization and implicit counting, which reduced MLLM success rates from over 95% to 0% on a hardened CAPTCHA type. The paper emphasizes the urgent need to redesign CAPTCHAs as MLLM capabilities advance.
AI-generated summary · Google Gemini · from 1 source.