PulseAugur
research · [2 sources]

New benchmark evaluates LLMs' effectiveness in generating API test cases from requirements

Researchers have introduced RESTestBench, a benchmark for evaluating how effectively Large Language Models (LLMs) generate test cases for REST APIs from natural language requirements. Because these LLM-generated tests aim to validate functional behavior against stated requirements, traditional metrics such as code coverage are weak proxies for their quality. RESTestBench includes three REST services, each with precise and vague requirement variants, along with a novel mutation testing metric that assesses fault detection against specific requirements.

Summary written by gemini-2.5-flash-lite from 2 sources. How we write summaries →
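For concreteness, a test case of the kind being evaluated, derived from a natural-language requirement such as "GET /users/{id} returns 404 for an unknown id", might look like the following sketch. The endpoint, base URL, and pytest/requests stack are illustrative assumptions, not details taken from the benchmark.

```python
# Hypothetical sketch of an LLM-generated REST API test case, derived from a
# natural-language requirement such as: "GET /users/{id} must return 404 with
# an error body when the user does not exist." The endpoint, base URL, and
# pytest/requests stack are assumptions for illustration, not RESTestBench's.
import requests

BASE_URL = "http://localhost:8080"  # service under test (assumed)

def test_get_unknown_user_returns_404():
    resp = requests.get(f"{BASE_URL}/users/does-not-exist")
    assert resp.status_code == 404   # requirement: 404 for unknown ids
    assert "error" in resp.json()    # requirement implies an error body
```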

IMPACT Provides a new evaluation framework for LLM-generated API tests, potentially improving the reliability of AI-driven software testing.
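The mutation testing metric itself is not detailed in the summary, but mutation testing conventionally scores a test suite by the fraction of seeded faults (mutants) it detects. A minimal sketch of such a score, assuming a hypothetical harness in which each mutant is a faulty variant of the service and run_tests() reports whether the whole generated suite passes:

```python
# Minimal sketch of a mutation score: the fraction of seeded faults (mutants)
# "killed" by at least one failing test. All names here (Mutant, deploy,
# run_tests) are hypothetical illustrations, not the benchmark's actual API.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Mutant:
    """A faulty variant of the REST service under test."""
    mutant_id: str               # e.g. "negate-status-check-3"
    deploy: Callable[[], None]   # swaps this mutant in for the real service

def mutation_score(mutants: List[Mutant],
                   run_tests: Callable[[], bool]) -> float:
    """run_tests() returns True iff every generated test passes."""
    killed = 0
    for mutant in mutants:
        mutant.deploy()          # activate the seeded fault
        if not run_tests():      # a failing test detects (kills) the mutant
            killed += 1
    return killed / len(mutants) if mutants else 0.0
```

Against the benchmark's precise versus vague requirement variants, a score like this would presumably be computed per variant to show how requirement specificity affects fault detection.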

RANK_REASON The cluster describes a new benchmark and associated research paper published on arXiv.

Read on arXiv cs.AI →

COVERAGE [2]

  1. arXiv cs.AI TIER_1 · Peter Schrammel

    RESTestBench: A Benchmark for Evaluating the Effectiveness of LLM-Generated REST API Test Cases from NL Requirements

    Existing REST API testing tools are typically evaluated using code coverage and crash-based fault metrics. However, recent LLM-based approaches increasingly generate tests from NL requirements to validate functional behaviour, making traditional metrics weak proxies for whether g…

  2. Hugging Face Daily Papers TIER_1

    RESTestBench: A Benchmark for Evaluating the Effectiveness of LLM-Generated REST API Test Cases from NL Requirements

    Existing REST API testing tools are typically evaluated using code coverage and crash-based fault metrics. However, recent LLM-based approaches increasingly generate tests from NL requirements to validate functional behaviour, making traditional metrics weak proxies for whether g…