PulseAugur

On-prem LLMs evaluated for Text-to-SQL on BIRD benchmark

A new paper evaluates on-premises, open-weight large language models (LLMs) on Text-to-SQL tasks using the BIRD benchmark. The study found that newer model generations, such as Qwen2.5-Coder and Llama-3.x, significantly outperform older models like CodeLlama-Instruct at comparable sizes. Among the accuracy techniques tested, self-correction showed consistent benefits across model families, schema linking provided no measurable improvement, and self-consistency offered poor value for its computational cost.
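
For readers unfamiliar with self-correction in Text-to-SQL, the following is a minimal sketch of one common form of it: run the generated query, and if the database rejects it, hand the error message back to the generator for another attempt. This summary does not describe how the paper implements the technique, so everything here is an assumption for illustration: `run_with_self_correction`, `toy_generate`, the `singer` table, and the retry limit are all hypothetical, and the toy generator stands in for a real model call.

```python
import sqlite3
from typing import Callable, List, Optional, Tuple

# Hypothetical generator signature: (question, previous_sql, error) -> new SQL.
Generator = Callable[[str, Optional[str], Optional[str]], str]


def run_with_self_correction(
    conn: sqlite3.Connection,
    question: str,
    generate: Generator,
    max_retries: int = 2,  # assumed budget, not taken from the paper
) -> Tuple[str, List[tuple]]:
    """Execute generated SQL, feeding execution errors back to the generator."""
    sql: Optional[str] = None
    error: Optional[str] = None
    for _ in range(max_retries + 1):
        sql = generate(question, sql, error)
        try:
            return sql, conn.execute(sql).fetchall()
        except sqlite3.Error as exc:
            # Keep the error text so the next attempt can react to it.
            error = str(exc)
    raise RuntimeError(f"no executable SQL after {max_retries + 1} attempts: {error}")


def toy_generate(question: str, previous_sql: Optional[str], error: Optional[str]) -> str:
    """Stand-in for an LLM: the first draft has a typo, the retry fixes it."""
    if error is None:
        return "SELECT nme FROM singer ORDER BY nme"
    return "SELECT name FROM singer ORDER BY name"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT)")
conn.executemany("INSERT INTO singer VALUES (?)", [("Ana",), ("Bo",)])

final_sql, rows = run_with_self_correction(conn, "List all singer names.", toy_generate)
print(final_sql)
print(rows)
```

Each retry costs one extra generation, and only for queries that fail to execute. Self-consistency, by contrast, samples several candidates for every question, which is one plausible reading of why the summary reports it as poor value for its compute.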

IMPACT Provides insights into the practical performance of on-premises LLMs for SQL generation, guiding choices for organizations with data privacy constraints.

RANK_REASON The cluster contains a research paper evaluating LLM performance on a specific task.


AI-generated summary · Google Gemini · from 2 sources.


COVERAGE [2]

  1. arXiv cs.CL TIER_1 English(EN) · Vladimir Beskorovainyi ·

    How Far Do On-Prem Open LLMs Get on Text-to-SQL? A Cross-Family Size x Technique Frontier on BIRD

    arXiv:2606.29733v1. Abstract: Organizations that cannot send data to a cloud API increasingly ask: how good is Text-to-SQL if the model must run on-premises on open weights, and which popular accuracy "recipes" are worth their compute? We answer with an honest, …

  2. arXiv cs.CL TIER_1 English(EN) · Vladimir Beskorovainyi ·

    How Far Do On-Prem Open LLMs Get on Text-to-SQL? A Cross-Family Size x Technique Frontier on BIRD

    Organizations that cannot send data to a cloud API increasingly ask: how good is Text-to-SQL if the model must run on-premises on open weights, and which popular accuracy "recipes" are worth their compute? We answer with an honest, fully reproducible benchmark on the BIRD develop…