Code-Driven Visual Perception: Why "Understanding Code" is the Real Key for Large Models to Conquer STEM Problems | CVPR 2026
Researchers from Shanghai Jiao Tong University and the Qwen team have introduced CodePercept, an approach to strengthening large language models' visual perception, particularly for STEM tasks. Their research suggests that visual perception, rather than reasoning alone, is the key bottleneck for models tackling science and math problems. CodePercept uses code as a precise language for visual understanding: the model generates executable code that accurately represents an image's content, which avoids the inherent ambiguity of natural language descriptions.
IMPACT: By representing image content as precise, executable code, this approach could improve LLMs' ability to understand and solve complex STEM problems.
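
To make the contrast between a natural language description and a code description concrete, here is a minimal sketch in Python. It is not taken from CodePercept: the figure (a 3-4-5 right triangle), the names `caption`, `vertices`, `side_length`, and `to_svg`, and the choice of SVG as the output format are all assumptions made for illustration. The point is only that a code description fixes every coordinate, so the figure can be redrawn and measured, while a caption leaves the geometry open.

```python
import math

# Hypothetical figure: a right triangle from a geometry problem.
# A natural-language caption leaves the actual geometry unspecified:
caption = "A right triangle with the right angle at the bottom left."

# The same figure as code: every vertex has an explicit coordinate.
vertices = {"A": (0.0, 0.0), "B": (4.0, 0.0), "C": (0.0, 3.0)}


def side_length(p: str, q: str) -> float:
    """Euclidean distance between two named vertices."""
    return math.dist(vertices[p], vertices[q])


def is_right_angle_at(corner: str, p: str, q: str) -> bool:
    """True if the angle p-corner-q is 90 degrees (dot product is zero)."""
    cx, cy = vertices[corner]
    px, py = vertices[p]
    qx, qy = vertices[q]
    dot = (px - cx) * (qx - cx) + (py - cy) * (qy - cy)
    return math.isclose(dot, 0.0, abs_tol=1e-9)


def to_svg(scale: int = 40, margin: int = 20) -> str:
    """Render the triangle as an SVG string; the y axis is flipped so C sits above A."""
    width = 4 * scale + 2 * margin
    height = 3 * scale + 2 * margin
    points = " ".join(
        f"{margin + x * scale:g},{height - margin - y * scale:g}"
        for x, y in vertices.values()
    )
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
        f'<polygon points="{points}" fill="none" stroke="black"/></svg>'
    )


if __name__ == "__main__":
    print(caption)
    print("hypotenuse BC =", side_length("B", "C"))
    print(to_svg())
```

Because the description is executable, anything the caption leaves out, such as the length of the hypotenuse, can be computed from it directly instead of being guessed from the wording.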