New retrieval system enhances text-to-SQL accuracy with catalog metadata

By PulseAugur Editorial · [1 source] · 2026-06-30 04:00

Researchers have developed a retrieval system, Schema-First Retrieval, designed to improve the accuracy of text-to-SQL systems. It embeds catalog metadata rather than raw warehouse data and indexes five types of catalog objects: tables, columns, metrics, relationships, and query history. By combining parallel vector search, lineage expansion, cross-encoder reranking, workload memory, and access-control gates, the system aims to supply more relevant schema context before SQL generation. Evaluations on datasets such as CRUSH4SQL and BIRD showed significant improvements in table recall and a substantial reduction in SQL execution errors.
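
To make the retrieval idea concrete, here is a minimal Python sketch of three of the stages named above: an access-control gate, per-kind vector search over catalog metadata, and lineage expansion. It is an illustration under stated assumptions, not the paper's implementation. The CatalogObject class, the embed, cosine, and retrieve functions, the toy catalog, and the role names are all hypothetical; a bag-of-words vector stands in for a real embedding model, the per-kind searches run sequentially rather than in parallel, and cross-encoder reranking and workload memory are left out.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class CatalogObject:
    """Hypothetical catalog entry: metadata only, never warehouse rows."""
    name: str
    kind: str                # "table", "column", "metric", "relationship", "query"
    text: str                # human-readable description that gets embedded
    tables: tuple = ()       # tables this object refers to (used for lineage)
    roles: frozenset = frozenset({"analyst"})  # roles allowed to see it


def embed(text):
    # Toy bag-of-words vector; an assumption standing in for a real embedder.
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question, catalog, role, k=2):
    """Return the sorted table names to hand to the SQL generator."""
    q = embed(question)

    # Access-control gate: drop objects the caller's role may not see.
    visible = [o for o in catalog if role in o.roles]

    # One vector search per object kind (sequential here for simplicity).
    hits = []
    for kind in {o.kind for o in visible}:
        scored = [(cosine(q, embed(o.text)), o) for o in visible if o.kind == kind]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        hits.extend(o for score, o in scored[:k] if score > 0)

    # Lineage expansion: add tables referenced by matching metrics,
    # relationships, or past queries.
    names = {o.name for o in hits if o.kind == "table"}
    for o in hits:
        names.update(o.tables)

    # Re-apply the gate so expansion cannot leak hidden tables.
    visible_tables = {o.name for o in visible if o.kind == "table"}
    return sorted(names & visible_tables)


catalog = [
    CatalogObject("orders", "table", "customer orders with order date and total amount"),
    CatalogObject("customers", "table", "customer accounts and signup region"),
    CatalogObject("payroll", "table", "employee salary payments", roles=frozenset({"hr"})),
    CatalogObject("revenue", "metric", "revenue is the sum of order total amount",
                  tables=("orders",)),
    CatalogObject("orders_customers", "relationship",
                  "orders join customers on customer id",
                  tables=("orders", "customers")),
]

print(retrieve("total revenue by region", catalog, role="analyst", k=1))
```

In this toy run the question never mentions the orders table by name; it is pulled in through the revenue metric's lineage, which is the kind of context a table-only search would miss. The payroll table stays hidden from the analyst role because the gate is applied both before search and after expansion.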

IMPACT This approach could significantly improve the reliability and usability of natural language interfaces for data analytics.

RANK_REASON The cluster contains a research paper detailing a novel technical approach.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.AI TIER_1 English(EN) · Adarsh Agrawal, Shashank Indukuri · 2026-06-30 04:00

Schema-First Retrieval: Embedding Catalogs for Natural Language Analytics

arXiv:2606.28387v1 (announce type: cross). Abstract: Enterprise text-to-SQL systems often fail before SQL is generated: the model receives the wrong schema context. Modern warehouses contain thousands of tables, abbreviated columns, informal metrics, hidden join conventions, and per…
