New benchmark evaluates how effectively LLMs generate API test cases from requirements

By the PulseAugur editorial team · [2 sources] · 2026-04-28 16:59

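Researchers have introduced RESTestBench, a new benchmark for evaluating how effectively large language models (LLMs) generate test cases for REST APIs from natural language requirements. Traditional metrics such as code coverage and crash-based fault counts are weak proxies for the quality of LLM-generated tests that aim to validate functional behaviour. RESTestBench comprises three REST services, each with precise and vague variants of its requirements, together with a novel mutation-testing metric that measures fault-detection capability with respect to a specific requirement.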

Impact: Provides a new evaluation framework for LLM-generated API tests, which could improve the reliability of AI-driven software testing.

Ranking rationale: This cluster covers a new benchmark and its accompanying research paper, published on arXiv.

AI-generated summary · Google Gemini · based on 2 sources.

Sources [2]

arXiv cs.AI · TIER_1 · English (EN) · Peter Schrammel · 2026-04-28 16:59

RESTestBench: A Benchmark for Evaluating the Effectiveness of LLM-Generated REST API Test Cases from NL Requirements

Existing REST API testing tools are typically evaluated using code coverage and crash-based fault metrics. However, recent LLM-based approaches increasingly generate tests from NL requirements to validate functional behaviour, making traditional metrics weak proxies for whether g…

Hugging Face Daily Papers · TIER_1 · English (EN) · 2026-04-28 16:59

RESTestBench: A Benchmark for Evaluating the Effectiveness of LLM-Generated REST API Test Cases from NL Requirements

Existing REST API testing tools are typically evaluated using code coverage and crash-based fault metrics. However, recent LLM-based approaches increasingly generate tests from NL requirements to validate functional behaviour, making traditional metrics weak proxies for whether g…
