Researchers have introduced DiscoBench, a new benchmark for evaluating how well search agents powered by large language models (LLMs) handle ambiguous queries. The benchmark contains 211 samples with 463 ambiguity instances across 11 domains. It tests whether agents can identify vagueness, ask clarifying questions, and recover from incorrect search paths. Experiments indicate that detecting ambiguity and asking for clarification are distinct skills. Agents also often perform worse when they keep searching instead of asking for clarification, which highlights a gap in their interactive problem-solving capabilities.
IMPACT This benchmark could drive improvements in LLM search agents, making them more effective at handling real-world, ambiguous user queries.
RANK_REASON The cluster describes a new benchmark for evaluating LLM search agents, which is a research contribution.
- arXiv
- DiscoBench
- Hugging Face
- large language models
- LLMs
AI-generated summary · Google Gemini · from 2 sources.