AutoBaxBuilder: Bootstrapping Code Security Benchmarking
Researchers have developed AutoBaxBuilder, an automated pipeline that generates code security benchmarks for large language models. The pipeline uses LLMs to create functional tests and security exploits, reducing the human effort required for benchmark creation by a factor of 12 and lowering its cost. The generated benchmark, AutoBaxBench, has been released publicly and evaluated on current LLMs.
IMPACT: Automates the creation of security benchmarks for LLM-generated code, enabling more rigorous testing and faster iteration.