Ko-WideSearch: A Korean Breadth-Search Benchmark for Exhaustive Set Enumeration by Web Agents

New Ko-WideSearch benchmark shows web agents struggle with breadth-search tasks

By PulseAugur Editorial · [3 sources] · 2026-06-25 00:00

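A new benchmark called Ko-WideSearch has been developed to evaluate the breadth-search ability of web agents, focusing on exhaustive set enumeration rather than depth-oriented question answering. The Korean-language benchmark was built through an automated pipeline and contains 228 tables covering 190 entities and 16 categories. Initial tests of 20 web agents show that they consistently fail to recover row-level attributes accurately, even when overall set membership is identified correctly, which points to a significant challenge for current AI systems.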
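To make the distinction between set membership and row-level recovery concrete, here is a minimal Python sketch. It is not the benchmark's actual scoring method, which the sources above do not describe; the functions `membership_f1` and `row_accuracy`, the exact-match rule, and the toy data are all assumptions made up for illustration.

```python
from typing import Dict

# Hypothetical table shape: entity name -> {attribute: value}
Table = Dict[str, Dict[str, str]]


def membership_f1(gold: Table, pred: Table) -> float:
    """F1 over entity names only: did the agent find the right set?"""
    hits = len(gold.keys() & pred.keys())
    if hits == 0:
        return 0.0
    precision = hits / len(pred)
    recall = hits / len(gold)
    return 2 * precision * recall / (precision + recall)


def row_accuracy(gold: Table, pred: Table) -> float:
    """Fraction of gold rows whose attributes are all reproduced exactly."""
    if not gold:
        return 0.0
    correct = sum(1 for name, attrs in gold.items() if pred.get(name) == attrs)
    return correct / len(gold)


# Toy data (invented): every entity is found, but two rows have a wrong cell.
gold = {
    "A": {"founded": "1990", "city": "Seoul"},
    "B": {"founded": "2001", "city": "Busan"},
    "C": {"founded": "2010", "city": "Daegu"},
}
pred = {
    "A": {"founded": "1990", "city": "Seoul"},
    "B": {"founded": "2002", "city": "Busan"},
    "C": {"founded": "2010", "city": "Incheon"},
}

print(membership_f1(gold, pred))  # 1.0: the set itself is complete
print(row_accuracy(gold, pred))   # about 0.33: only one row is fully correct
```

In this toy case the agent would look perfect if only the set were checked, while two of three rows are wrong, which is the kind of gap the summary describes.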

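Impact: Highlights a critical gap in web-agent capabilities, indicating that current models need to improve at structured data extraction and exhaustive enumeration.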

Ranking rationale: This cluster describes a new academic benchmark for evaluating AI capabilities, published on arXiv.

AI-generated summary · Google Gemini · from 3 sources.

Sources [3]

arXiv cs.CL TIER_1 English(EN) · Minbyul Jeong · 2026-06-29 04:00

Ko-WideSearch: A Korean Breadth-Search Benchmark for Exhaustive Set Enumeration by Web Agents

arXiv:2606.27595v1 Announce Type: new Abstract: Web-agent benchmarks overwhelmingly measure depth -- pinning one obscure answer behind a chain of constraints -- while breadth, exhaustively enumerating a closed set and filling each item's attributes, is barely evaluated, especiall…

arXiv cs.CL TIER_1 English(EN) · Minbyul Jeong · 2026-06-25 22:51

Ko-WideSearch: A Korean Breadth-Search Benchmark for Exhaustive Set Enumeration by Web Agents

Web-agent benchmarks overwhelmingly measure depth -- pinning one obscure answer behind a chain of constraints -- while breadth, exhaustively enumerating a closed set and filling each item's attributes, is barely evaluated, especially outside English. Breadth is also hard to build…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-25 00:00

Ko-WideSearch: A Korean Breadth-Search Benchmark for Exhaustive Set Enumeration by Web Agents

A Korean web-agent benchmark evaluates breadth of search capabilities by requiring complete enumeration of entity memberships with attribute tables, revealing consistent failures in row recovery despite accurate set identification.
