Qwen 3.5 122B leads local VLMs in detecting AI-generated hand errors

By PulseAugur Editorial · [1 source] · 2026-06-02 17:40

A user tested four local vision-language models (VLMs) to see how well they detect badly generated hands in AI images. Qwen 3.5 122B performed best, reaching 100% precision with decent recall, though it occasionally missed subtle anatomical errors. Gemma 4 26B and Qwen3-VL were ineffective: Gemma rejected too many images, while Qwen3-VL let most of them through.
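To make the precision and recall figures concrete, here is a minimal Python sketch of how a "bad hands" judge could be scored against human labels. It assumes the positive class is "image flagged as having bad hands", which the summary does not state explicitly. The `JudgeResult` type, the `precision_recall` helper, and the sample counts are hypothetical illustrations, not data from the original post.

```python
from dataclasses import dataclass


@dataclass
class JudgeResult:
    flagged_bad: bool   # VLM verdict: the image has bad hands (assumed positive class)
    actually_bad: bool  # human ground-truth label


def precision_recall(results):
    """Return (precision, recall) for the 'bad hands' class."""
    tp = sum(r.flagged_bad and r.actually_bad for r in results)
    fp = sum(r.flagged_bad and not r.actually_bad for r in results)
    fn = sum(not r.flagged_bad and r.actually_bad for r in results)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels, not figures from the post:
# 6 bad images caught, 4 bad images missed, 10 good images passed.
results = (
    [JudgeResult(True, True)] * 6
    + [JudgeResult(False, True)] * 4
    + [JudgeResult(False, False)] * 10
)

precision, recall = precision_recall(results)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Under this reading, a judge with 100% precision never rejects a good image, while recall below 100% means some bad hands still slip through. A judge that rejects too many images loses precision, and one that passes most images loses recall.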

IMPACT Identifies a practical application for VLMs in improving AI image generation quality by detecting common errors.

RANK_REASON User-conducted benchmark of existing models for a specific task.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

r/StableDiffusion TIER_2 English(EN) · /u/dh7net · 2026-06-02 17:40

I tested 4 local VLMs as "bad hands" detectors. Here's which one works best as a judge

We all know that hands can be hard for small local models, so I tried to find the best way to detect bad hands with my local setup (GX10 Spark). I thought any VLM like Gemma would work, but it did not work at all. So I had to test several of the…
