Thinking Without Images: Internalizing Visual Manipulation with On-Policy Self-Distillation
Researchers have developed a new self-distillation framework called Imagine-OPD to improve visual reasoning in AI models. This method trains models to "imagine" relevant visual cues rather than relying on external tools for image cropping, reducing inference time and computational cost. Experiments show Imagine-OPD outperforms existing methods on vision-centric benchmarks while being more efficient. AI
IMPACT This approach could lead to more efficient visual reasoning models, reducing computational costs for AI applications that rely on image analysis.