Mind the Gap: Diagnosing Constraint Discovery Failures in Text-in-Image Editing
A new research paper, "Mind the Gap: Diagnosing Constraint Discovery Failures in Text-in-Image Editing", examines why multimodal large language models (MLLMs) struggle to identify the visual dependencies relevant to a specific task. The study found that MLLMs achieve only 46% recall when unguided, but that this rises to 94% when the constraints are explicitly provided. The authors suggest that case-specific causal explanations improve constraint discovery more than region names or type labels do, and they highlight the need for precision-aware elicitation to avoid false positives.
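To make the recall and precision terms above concrete, here is a minimal sketch of how constraint discovery could be scored for a single editing case. It assumes constraints are compared as plain sets of labels. The function name `recall_precision`, the constraint labels, and the handling of empty sets are all hypothetical illustrations, not the paper's evaluation protocol or data.

```python
def recall_precision(predicted, gold):
    """Score one case of set-valued constraint discovery.

    recall    = fraction of gold constraints the model listed
    precision = fraction of listed constraints that are in the gold set
    Assumption: an empty denominator is scored as 1.0 (conventions vary).
    """
    predicted, gold = set(predicted), set(gold)
    hits = len(predicted & gold)
    recall = hits / len(gold) if gold else 1.0
    precision = hits / len(predicted) if predicted else 1.0
    return recall, precision


# Hypothetical labels for one text-in-image edit; not taken from the paper.
gold = {"font_style", "text_color", "background_texture", "perspective"}
predicted = {"font_style", "text_color", "shadow_direction"}

recall, precision = recall_precision(predicted, gold)
print(f"recall={recall:.2f}, precision={precision:.2f}")
# prints: recall=0.50, precision=0.67
```

In this toy case the model misses two of the four relevant constraints, which lowers recall, and lists one constraint that is not in the gold set, which lowers precision. Prompting a model to list more constraints can raise recall while adding false positives of this kind, which is the trade-off that precision-aware elicitation is meant to control.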