Self-Distillation Policy Optimization via Visual Feedback: Bridging Code and Visual Artifacts
Researchers have developed Visual-SDPO, a self-distillation policy optimization framework for improving code-generating large language models. The method uses visual feedback from rendered outputs, such as charts or web pages, as a training signal. By pinpointing the specific code segments responsible for visual defects, it improves the model's ability to produce visually accurate artifacts, and it reportedly outperforms existing methods by more than 10 points on benchmarks.
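The summary does not say how a localized visual defect becomes a training signal. One plausible reading, sketched here purely as an assumption, is fine-grained credit assignment. Every line of the generated program shares the sequence-level advantage, and lines blamed for a rendered defect receive an extra penalty, so the update concentrates on the code that caused the problem. The names `Defect`, `blamed_lines`, `line_advantages`, and `blame_scale` are hypothetical and are not taken from Visual-SDPO.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Defect:
    """A visual defect reported after rendering (hypothetical structure)."""
    description: str
    blamed_lines: range  # assumed: 0-based line span judged responsible


def line_advantages(code: str, defects: list[Defect],
                    base_advantage: float, blame_scale: float = 1.0) -> list[float]:
    """Give every line the sequence-level advantage, then subtract
    blame_scale from each line blamed for a visual defect."""
    n = len(code.splitlines())
    adv = [base_advantage] * n
    for defect in defects:
        for i in defect.blamed_lines:
            if 0 <= i < n:  # ignore blame that points outside the program
                adv[i] -= blame_scale
    return adv


if __name__ == "__main__":
    # The plotting code is only treated as text here; it is never executed.
    code = "import matplotlib.pyplot as plt\nplt.bar(x, y)\nplt.xlabel('')\nplt.savefig('out.png')"
    defects = [Defect("x-axis label is empty", range(2, 3))]
    print(line_advantages(code, defects, base_advantage=0.5))
```

Running the example prints `[0.5, 0.5, -0.5, 0.5]`: only the line that produced the empty axis label is pushed down. In a real policy-gradient setup these per-line values would still need to be mapped onto tokens.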
IMPACT: Improves LLMs' ability to generate visually accurate code, which could benefit data-visualization and web-development tools.