PulseAugur
How Far Do On-Prem Open LLMs Get on Text-to-SQL? A Cross-Family Size x Technique Frontier on BIRD

Evaluating On-Premises LLMs on Text-to-SQL with the BIRD Benchmark

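A new paper uses the BIRD benchmark to evaluate the Text-to-SQL performance of open-weight large language models (LLMs) deployed on-premises. The study finds that newer model generations such as Qwen2.5-Coder and Llama-3.x substantially outperform older models such as CodeLlama-Instruct at comparable sizes. Among the techniques tested, self-correction delivers consistent gains across model families, schema linking yields no measurable improvement, and self-consistency offers poor value because of its high compute cost.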
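For readers unfamiliar with two of the techniques named above, the following is a minimal sketch of what self-correction and self-consistency usually mean in a Text-to-SQL pipeline. It is not the paper's implementation, which this summary does not describe. The function names, the `generate` callback (a stand-in for any LLM call), and the use of SQLite are all assumptions made for illustration.

```python
import sqlite3
from collections import Counter


def run_sql(conn, sql):
    """Execute sql; return (rows, None) on success or (None, message) on failure."""
    try:
        return conn.execute(sql).fetchall(), None
    except sqlite3.Error as exc:
        return None, str(exc)


def self_correct(conn, question, generate, max_rounds=2):
    """Self-correction: if the SQL fails to execute, re-ask with the error attached.

    `generate(question, feedback)` is a hypothetical callback wrapping the model;
    `feedback` is None on the first attempt.
    """
    sql = generate(question, None)
    for _ in range(max_rounds):
        _, error = run_sql(conn, sql)
        if error is None:
            break  # the query executes, so stop retrying
        sql = generate(question, f"Previous SQL: {sql}\nError: {error}")
    return sql


def self_consistency(conn, candidates):
    """Self-consistency: execute every sampled query and keep the majority result.

    Each candidate costs one extra model call, so n samples cost roughly n times
    the generation compute of a single query.
    """
    executed = []
    for sql in candidates:
        rows, error = run_sql(conn, sql)
        if error is None:
            # Sort rows so queries that differ only in row order vote together.
            executed.append((sql, repr(sorted(rows, key=repr))))
    if not executed:
        return None
    winner, _ = Counter(key for _, key in executed).most_common(1)[0]
    return next(sql for sql, key in executed if key == winner)
```

The sketch makes the cost asymmetry visible: self-correction adds a model call only when a query fails to execute, whereas self-consistency always pays for every sample, which is in line with the summary's remark about its compute cost.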

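Impact: Offers insight into the real-world SQL-generation performance of on-premises LLMs and can guide model selection for organizations with data-privacy constraints.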

Ranking rationale: This cluster contains a research paper evaluating LLM performance on a specific task.

AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

  1. arXiv cs.CL · Vladimir Beskorovainyi

    How Far Do On-Prem Open LLMs Get on Text-to-SQL? A Cross-Family Size x Technique Frontier on BIRD

    arXiv:2606.29733v1. Abstract: Organizations that cannot send data to a cloud API increasingly ask: how good is Text-to-SQL if the model must run on-premises on open weights, and which popular accuracy "recipes" are worth their compute? We answer with an honest, …

  2. arXiv cs.CL · Vladimir Beskorovainyi

    How Far Do On-Prem Open LLMs Get on Text-to-SQL? A Cross-Family Size x Technique Frontier on BIRD

    Organizations that cannot send data to a cloud API increasingly ask: how good is Text-to-SQL if the model must run on-premises on open weights, and which popular accuracy "recipes" are worth their compute? We answer with an honest, fully reproducible benchmark on the BIRD develop…