PulseAugur

New EntSQL benchmark tests Text-to-SQL grounded in enterprise knowledge

Researchers have introduced EntSQL, a benchmark for evaluating Text-to-SQL in enterprise settings. Unlike earlier benchmarks such as Spider, BIRD, and Spider 2.0, EntSQL focuses on grounding SQL generation in long-context, proprietary business documents. The benchmark includes 1,066 aligned Chinese-English examples across five business domains, many of which require knowledge beyond the immediate question and schema. Current systems struggle with this task: the best-performing model reaches only 15.9% accuracy on English inputs when provided with long-form documents.
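To make "knowledge beyond the immediate question and schema" concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. Everything in it is hypothetical and is not taken from EntSQL: the `orders` table, the `DOC_RULE` excerpt, the question, and both queries are invented for illustration. The summary reports only "accuracy", so scoring by comparing execution results is an assumption here, not the paper's confirmed metric.

```python
import sqlite3

# Hypothetical toy database; the schema and rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL, status TEXT);
INSERT INTO orders VALUES
  (1, 'acme',   120.0, 'paid'),
  (2, 'acme',    80.0, 'refunded'),
  (3, 'globex', 200.0, 'paid'),
  (4, 'globex',  50.0, 'pending');
""")

# Hypothetical business-document excerpt: the definition of "net revenue"
# lives in this text, not in the question or the schema.
DOC_RULE = "Net revenue counts only orders whose status is 'paid'."
QUESTION = "What is the net revenue per customer?"

# Reference query that applies the documented rule.
GOLD_SQL = (
    "SELECT customer, SUM(amount) FROM orders "
    "WHERE status = 'paid' GROUP BY customer"
)
# A plausible schema-only guess that ignores the document.
NAIVE_SQL = "SELECT customer, SUM(amount) FROM orders GROUP BY customer"


def run(sql):
    """Execute a query and return its rows in a canonical (sorted) order."""
    return sorted(conn.execute(sql).fetchall())


def execution_match(pred_sql, gold_sql):
    """Assumed metric: a prediction counts as correct if its rows equal the gold rows."""
    try:
        return run(pred_sql) == run(gold_sql)
    except sqlite3.Error:
        # Invalid SQL is scored as incorrect.
        return False


predictions = [NAIVE_SQL, GOLD_SQL]
accuracy = sum(execution_match(p, GOLD_SQL) for p in predictions) / len(predictions)
print(f"accuracy = {accuracy:.1%}")
```

In this toy case the schema-only query is valid SQL but returns different totals, because the filter on `status` can only be inferred from the document excerpt.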

IMPACT Highlights the challenge of applying LLMs to enterprise-specific data, potentially driving development of more context-aware Text-to-SQL systems.

RANK_REASON The cluster describes a new academic benchmark for evaluating AI capabilities.


AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

  1. arXiv cs.CL TIER_1 English(EN) · Chengxi Liao, Tao Xu, Zulong Chen, Chuanfei Xu, Yiyan Wang, Xinyun Wang, Yanlong Zhang, Xiaojun Chen, Zhibo Yang, Zeyi Wen

    EntSQL: A Benchmark for Grounding Text-to-SQL in Long-Context Enterprise Knowledge

    arXiv:2606.03363v1. Abstract: Text-to-SQL enables natural language access to databases, and recent LLMs have substantially advanced its capabilities. Existing benchmarks such as Spider, BIRD, and Spider 2.0 evaluate schema generalization, large-scale databases, …

  2. arXiv cs.CL TIER_1 English(EN) · Zeyi Wen

    EntSQL: A Benchmark for Grounding Text-to-SQL in Long-Context Enterprise Knowledge

    Text-to-SQL enables natural language access to databases, and recent LLMs have substantially advanced its capabilities. Existing benchmarks such as Spider, BIRD, and Spider 2.0 evaluate schema generalization, large-scale databases, and realistic workflows, but largely overlook en…