New Ko-WideSearch benchmark reveals web agents struggle with breadth-search tasks

By PulseAugur Editorial · [3 sources] · 2026-06-25 00:00

A new Korean-language benchmark, Ko-WideSearch, evaluates the breadth-search capabilities of web agents: exhaustively enumerating a closed set and filling in each item's attributes, rather than answering depth-based questions. Built with an automated pipeline, it comprises 228 tables across 190 entities and sixteen categories. Initial testing with twenty web agents found that they consistently failed to recover row-level attributes accurately, even when they identified the overall set membership correctly, which points to a significant challenge for current AI systems.
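To make the distinction between set membership and row-level attributes concrete, here is a minimal Python sketch. It is an illustration only, not the paper's scoring method: the function name `score_table`, the table layout (row key mapped to an attribute dictionary), and the sample rows are all hypothetical. It shows how an agent can list every member of a set and still lose credit on a row where one attribute is wrong.

```python
from typing import Dict, Tuple

# Hypothetical layout: row key -> {attribute name: value}.
Table = Dict[str, Dict[str, str]]


def score_table(gold: Table, predicted: Table) -> Tuple[float, float, float]:
    """Return (membership_recall, membership_precision, row_exact_match).

    Illustrative metrics only; the benchmark's real scoring may differ.
    """
    gold_keys = set(gold)
    pred_keys = set(predicted)
    shared = gold_keys & pred_keys

    # Membership: did the agent list the right items at all?
    recall = len(shared) / len(gold_keys) if gold_keys else 1.0
    precision = len(shared) / len(pred_keys) if pred_keys else 0.0

    # Row level: a row counts only if every attribute matches the gold row.
    exact_rows = sum(1 for key in shared if predicted[key] == gold[key])
    row_exact = exact_rows / len(gold_keys) if gold_keys else 1.0
    return recall, precision, row_exact


# Made-up sample data: both items are found, but one attribute is wrong.
gold = {
    "item_a": {"year": "2019", "city": "Seoul"},
    "item_b": {"year": "2021", "city": "Busan"},
}
predicted = {
    "item_a": {"year": "2019", "city": "Seoul"},
    "item_b": {"year": "2020", "city": "Busan"},
}

recall, precision, row_exact = score_table(gold, predicted)
print(recall, precision, row_exact)  # 1.0 1.0 0.5
```

In this toy case membership is perfect while only half the rows are fully correct, which is the kind of gap the summary describes.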

IMPACT Highlights a critical gap in web agent capabilities, suggesting current models need improvement in structured data extraction and exhaustive enumeration.

RANK_REASON The cluster describes a new academic benchmark for evaluating AI capabilities, published on arXiv.

AI-generated summary · Google Gemini · from 3 sources.

COVERAGE [3]

arXiv cs.CL TIER_1 English(EN) · Minbyul Jeong · 2026-06-29 04:00

Ko-WideSearch: A Korean Breadth-Search Benchmark for Exhaustive Set Enumeration by Web Agents

arXiv:2606.27595v1. Abstract: Web-agent benchmarks overwhelmingly measure depth -- pinning one obscure answer behind a chain of constraints -- while breadth, exhaustively enumerating a closed set and filling each item's attributes, is barely evaluated, especiall…

arXiv cs.CL TIER_1 English(EN) · Minbyul Jeong · 2026-06-25 22:51

Ko-WideSearch: A Korean Breadth-Search Benchmark for Exhaustive Set Enumeration by Web Agents

Web-agent benchmarks overwhelmingly measure depth -- pinning one obscure answer behind a chain of constraints -- while breadth, exhaustively enumerating a closed set and filling each item's attributes, is barely evaluated, especially outside English. Breadth is also hard to build…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-25 00:00

Ko-WideSearch: A Korean Breadth-Search Benchmark for Exhaustive Set Enumeration by Web Agents

A Korean web-agent benchmark evaluates breadth of search capabilities by requiring complete enumeration of entity memberships with attribute tables, revealing consistent failures in row recovery despite accurate set identification.
