Look-Closer-Then-Diagnose: Confidence-Aware Ultrasound VQA via Active Zooming
Researchers have developed a framework for ultrasound image analysis that mimics how sonographers zoom into specific regions before making a diagnosis. This "Zoom-then-Diagnose" approach aims to improve the accuracy of Vision-Language Models (VLMs) in medical contexts by grounding their reasoning in the lesion region rather than the full frame. The system also incorporates an uncertainty-aware reward mechanism that gauges how consistent the model's predictions are, encouraging caution when a case is ambiguous. Experiments on liver, breast, and thyroid datasets showed significantly improved lesion localization, which suggests stronger diagnostic capability.
IMPACT: Enhances diagnostic accuracy in medical imaging by enabling models to focus on relevant regions and account for ambiguity.
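
The summary does not spell out how the uncertainty-aware reward is computed, so the following is only an illustrative sketch of one way a consistency-based reward could work, not the authors' method. It assumes the model is sampled several times on the same question, that agreement among the sampled answers serves as a confidence proxy, and that a wrong answer is penalized in proportion to that confidence. The names `consistency`, `uncertainty_aware_reward`, and `wrong_penalty` are hypothetical.

```python
from collections import Counter


def consistency(answers):
    """Return the majority answer and the fraction of samples that agree with it."""
    counts = Counter(answers)
    majority, votes = counts.most_common(1)[0]
    return majority, votes / len(answers)


def uncertainty_aware_reward(answers, truth, wrong_penalty=1.0):
    """Hypothetical reward: scale the outcome by prediction consistency.

    A correct majority answer earns more when the samples agree, and a
    wrong majority answer costs more when the samples agree. Hesitant
    (low-agreement) mistakes are therefore punished less than confident ones.
    """
    majority, agreement = consistency(answers)
    if majority == truth:
        return agreement
    return -wrong_penalty * agreement


# Four sampled answers for one ultrasound question (made-up labels).
confident_right = ["malignant", "malignant", "malignant", "benign"]
confident_wrong = ["benign", "benign", "benign", "benign"]
hesitant_wrong = ["benign", "benign", "malignant", "cyst"]

print(uncertainty_aware_reward(confident_right, "malignant"))
print(uncertainty_aware_reward(confident_wrong, "malignant"))
print(uncertainty_aware_reward(hesitant_wrong, "malignant"))
```

Under these assumptions, a confidently wrong prediction receives the largest penalty, which is one way a reward could encourage caution on ambiguous cases.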