Dafny
PulseAugur coverage of Dafny: every cluster mentioning Dafny across labs, papers, and developer communities, ranked by signal.
-
New benchmark MINIF2F-DAFNY tests LLMs for mathematical theorem proving
Researchers have developed MINIF2F-DAFNY, a new benchmark for evaluating Large Language Models (LLMs) in mathematical theorem proving. This system translates the miniF2F benchmark to Dafny, an auto-active verifier, enab…
-
New benchmark reveals AI struggles with verified code generation
A new benchmark called AlgoVeri has been developed to evaluate the performance of AI models in generating formally verified code for classical algorithms. The benchmark tests models across three languages: Dafny, Verus,…
-
AI models improve code generation with new verification techniques
Researchers have developed new methods to improve the ability of large language models to generate correct code and proofs. One approach, TTRL-CoCoV, uses confidence-conditioned verification to enhance coverage and accu…
-
Researchers develop graph construction for imperative programs using neural methods
Researchers have developed a pipeline to convert imperative programs and their annotations into typed, attributed graphs. This process combines abstract syntax tree parsing with semantic embeddings from models like Sent…
-
SEVerA framework verifies self-evolving AI agents for safety and correctness
Researchers have introduced SEVerA, a framework designed to synthesize self-evolving AI agents with formal safety and correctness guarantees. This approach treats agentic code generation as a constrained learning proble…
-
AI models achieve high verification success with formal code generation
Researchers have developed a new dataset, NL2VC-60, containing 60 algorithmic problems to aid in generating verified code from natural language. They evaluated seven open-weight LLMs using various prompting strategies, …