Brief · PulseAugur

TOOL · 雷峰网 (Leiphone) Chinese (ZH) · 1w

Code-Driven Visual Perception: Why "Understanding Code" is the Real Key for Large Models to Conquer STEM Problems | CVPR 2026

Researchers from Shanghai Jiao Tong University and the Qwen team have introduced CodePercept, an approach for strengthening the visual perception of large language models, particularly on STEM tasks. Their research suggests that visual perception, rather than reasoning, is the key bottleneck for models tackling science and math problems. CodePercept treats code as a precise language for visual understanding: models generate executable code that accurately represents the image content, which avoids the inherent ambiguity of natural language descriptions.

IMPACT This approach could significantly improve LLMs' ability to understand and solve complex STEM problems by enhancing their visual perception through precise code-based representations.

GPT-5
large language models
Shanghai Jiao Tong University
Qwen3-VL-Plus
Claude Opus 4.1
Qwen2.5-VL-72B
Seed 1.6-Vision
ICC-1M
CodePercept
STEM2Code-Eval
Qwen team