How Far Do On-Prem Open LLMs Get on Text-to-SQL? A Cross-Family Size x Technique Frontier on BIRD

Evaluating the Text-to-SQL Ability of On-Premises Large Language Models on the BIRD Benchmark

By the PulseAugur editorial team · [2 sources] · 2026-06-29 03:15

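A new paper uses the BIRD benchmark to evaluate how well on-premises, open-weight large language models (LLMs) perform on Text-to-SQL. The study finds that newer model generations, such as Qwen2.5-Coder and Llama-3.x, significantly outperform older models such as CodeLlama-Instruct at the same size. Among the accuracy techniques tested, self-correction shows consistent gains across model families, schema linking brings no measurable improvement, and self-consistency offers little value given its high compute cost.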
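To make two of these techniques concrete, here is a minimal sketch of execution-guided self-correction and of self-consistency by voting on result sets. It is an illustration under stated assumptions, not the paper's procedure, which this summary does not describe: the `generate` callable, the prompt wording, and the retry and sample counts are all hypothetical, and SQLite is used only because it is easy to run locally.

```python
import sqlite3
from collections import Counter
from typing import Callable, Optional

# Hypothetical model interface (an assumption): prompt in, SQL string out.
Generate = Callable[[str], str]


def try_execute(conn: sqlite3.Connection, sql: str):
    """Run sql; return (rows, None) on success or (None, error text) on failure."""
    try:
        return conn.execute(sql).fetchall(), None
    except sqlite3.Error as exc:
        return None, str(exc)


def self_correct(conn: sqlite3.Connection, question: str,
                 generate: Generate, max_rounds: int = 2) -> str:
    """Generate SQL; on an execution error, re-prompt with the error text."""
    sql = generate(question)
    for _ in range(max_rounds):
        _, err = try_execute(conn, sql)
        if err is None:
            break
        # Illustrative repair prompt; the wording is an assumption.
        prompt = f"{question}\nPrevious SQL: {sql}\nError: {err}\nFix the query."
        sql = generate(prompt)
    return sql


def self_consistent(conn: sqlite3.Connection, question: str,
                    generate: Generate, n_samples: int = 5) -> Optional[str]:
    """Sample n_samples queries; return one whose result set is most common."""
    votes: Counter = Counter()
    example = {}
    for _ in range(n_samples):  # one model call per sample
        sql = generate(question)
        rows, err = try_execute(conn, sql)
        if err is not None:
            continue  # queries that fail to execute get no vote
        key = tuple(sorted(rows, key=repr))  # order-insensitive result key
        votes[key] += 1
        example.setdefault(key, sql)
    if not votes:
        return None
    return example[votes.most_common(1)[0][0]]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ann"), (2, "bob")])

    def stub_generate(prompt: str) -> str:
        # Stand-in for a model: wrong table name until it sees an error message.
        if "Error:" in prompt:
            return "SELECT COUNT(*) FROM users"
        return "SELECT COUNT(*) FROM user"

    print(self_correct(conn, "How many users are there?", stub_generate))
```

In this sketch, self-correction only pays for an extra model call when a query fails to execute, whereas self-consistency always pays for `n_samples` calls per question, which is one way to read the compute-cost concern in the summary.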

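Impact: Offers insight into how on-premises LLMs actually perform at SQL generation, which can guide model selection for organizations bound by data-privacy constraints.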

Ranking rationale: This cluster contains a research paper that evaluates LLM performance on a specific task.


AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

arXiv cs.CL TIER_1 English(EN) · Vladimir Beskorovainyi · 2026-06-30 04:00

How Far Do On-Prem Open LLMs Get on Text-to-SQL? A Cross-Family Size x Technique Frontier on BIRD

arXiv:2606.29733v1 Announce Type: new Abstract: Organizations that cannot send data to a cloud API increasingly ask: how good is Text-to-SQL if the model must run on-premises on open weights, and which popular accuracy "recipes" are worth their compute? We answer with an honest, …
arXiv cs.CL TIER_1 English(EN) · Vladimir Beskorovainyi · 2026-06-29 03:15

How Far Do On-Prem Open LLMs Get on Text-to-SQL? A Cross-Family Size x Technique Frontier on BIRD

Organizations that cannot send data to a cloud API increasingly ask: how good is Text-to-SQL if the model must run on-premises on open weights, and which popular accuracy "recipes" are worth their compute? We answer with an honest, fully reproducible benchmark on the BIRD develop…

