Brief · PulseAugur

TOOL · arXiv cs.AI · English (EN) · 6h

AlgoVeri: An Aligned Benchmark for Verified Code Generation on Classical Algorithms

A new benchmark, AlgoVeri, evaluates how well AI models generate formally verified code for classical algorithms. It tests models in three verification languages (Dafny, Verus, and Lean) and reveals significant capability gaps. Gemini-3 Flash achieves moderate success in Dafny, but its performance drops considerably in Verus and Lean, highlighting challenges with memory constraints and explicit proof construction.

IMPACT · Highlights the limitations of current AI models in generating formally verified code and points to areas for future research and development in formal verification tools.

Gemini-3 Flash
Lean
Dafny
Verus
AlgoVeri
Haoyu Zhao