Researchers have developed Wonda, a data curation pipeline for improving the training of Small Language Models (SLMs) for program verification. The pipeline normalizes raw verifier output and uses LLMs to rewrite and augment invariants, so that the curated data is of provable quality. Fine-tuning SLMs such as Qwen3, Llama-3.1, and Mistral AI models on Wonda-curated data significantly improves both invariant correctness and speedup rates. Notably, a 4B-parameter Qwen3 model achieved performance comparable to much larger models such as GPT-OSS-120B and even matched the verification time of GPT-5.2 on the InvBench suite.
IMPACT This research could accelerate the development and adoption of smaller, more efficient language models for specialized tasks like program verification.
RANK_REASON The cluster contains an academic paper detailing a new data curation method that improves SLM performance on program verification tasks.
- GitHub
- GPT-5.2
- GPT-OSS-120B
- Guy Katz
- InvBench
- Llama-3.1
- Mistral AI
- Qwen3
- Small Language Models
- Wonda
AI-generated summary · Google Gemini · from 1 source.