Hugging Face expands voice agent benchmark to 3 domains, 121 tools

By PulseAugur Editorial · [2 sources] · 2026-06-04 12:24

Hugging Face has released EVA-Bench Data 2.0, an expanded benchmark for evaluating voice agents. The new version covers three enterprise domains: Airline Customer Service Management, Enterprise IT Service Management, and Healthcare HR Service Delivery. The updated dataset includes 213 scenarios across 121 tools, a significant increase over the previous iteration, and has been validated against leading models such as GPT-5.4, Gemini 3.1 Pro, and Claude Opus 4.6.

IMPACT Provides a more comprehensive and realistic evaluation framework for voice agents, pushing development towards better handling of complex enterprise tasks.

RANK_REASON Release of a new version of an evaluation benchmark dataset with expanded scope and scenarios.

AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

Hugging Face Blog TIER_1 English(EN) · 2026-06-04 12:24

EVA-Bench Data 2.0: 3 Domains, 121 Tools, 213 Scenarios

Mastodon — fosstodon.org TIER_1 Japanese(JA) · 2026-06-04 15:40

EVA-Bench Data 2.0: 3 Domains, 121 Tools, 213 Scenarios https://huggingface.co/blog/ServiceNow-AI/eva-bench-data Note: AI-generated auto-post (headline + link) #AI #GenerativeAI #LLM #AIGenerated

LINKS huggingface.co/…/eva-bench-data
