PulseAugur

New benchmark TestEvo-Bench evaluates AI agents on code and test co-evolution

Researchers have introduced TestEvo-Bench, a benchmark that evaluates how well AI agents co-evolve tests with code changes. It covers two task types, generating new tests and updating existing ones, both grounded in real commit histories and executable environments. To reduce data leakage, the benchmark periodically mines new tasks; the current snapshot contains over 1,200 tasks from 152 open-source Java projects.

IMPACT This benchmark could drive improvements in AI agents' ability to understand and generate code and tests, leading to more robust software development tools.

RANK_REASON The cluster contains an academic paper introducing a new benchmark for AI research.

AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

  1. arXiv cs.AI TIER_1 English(EN) · Jiale Amber Wang, Kaiyuan Wang, Pengyu Nie

    TestEvo-Bench: An Executable and Live Benchmark for Test and Code Co-Evolution

    arXiv:2607.02469v1. Abstract: Software tests and code evolve together: a code change should be followed by new or updated tests that record the new software behavior. Yet existing test generation and update benchmarks often isolate the test from the code chang…

  2. arXiv cs.AI TIER_1 English(EN) · Pengyu Nie

    TestEvo-Bench: An Executable and Live Benchmark for Test and Code Co-Evolution

    Software tests and code evolve together: a code change should be followed by new or updated tests that record the new software behavior. Yet existing test generation and update benchmarks often isolate the test from the code change, and rely on static metadata that does not verif…