A new paper evaluates the performance of on-premises, open-weight Large Language Models (LLMs) on Text-to-SQL tasks using the BIRD benchmark. The study found that newer model generations, such as Qwen2.5-Coder and Llama-3.x, significantly outperform older models like CodeLlama-Instruct at comparable sizes. Of the techniques tested, self-correction showed consistent benefits across model families, schema linking provided no measurable improvement, and self-consistency offered poor value for its computational cost.
AI IMPACT Provides insights into the practical performance of on-premises LLMs for SQL generation, guiding choices for organizations with data privacy constraints.
RANK_REASON The cluster contains a research paper evaluating LLM performance on a specific task.
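For readers unfamiliar with two of the techniques named in the summary, here is a minimal sketch of self-correction and self-consistency, assuming a SQLite database. It is an illustration only, not the paper's implementation: the `generate` callable standing in for the model, the prompt wording, and the function names are all hypothetical.

```python
import sqlite3
from collections import Counter
from typing import Callable, Optional

# Hypothetical model interface: takes a prompt string, returns one SQL string.
Generate = Callable[[str], str]


def run_sql(conn: sqlite3.Connection, sql: str):
    """Execute sql; return (rows, None) on success or (None, error message)."""
    try:
        return conn.execute(sql).fetchall(), None
    except sqlite3.Error as exc:
        return None, str(exc)


def self_correct(conn: sqlite3.Connection, question: str,
                 generate: Generate, max_rounds: int = 2) -> str:
    """Generate SQL; on an execution error, feed the error back and retry."""
    sql = generate(question)
    for _ in range(max_rounds):
        _, err = run_sql(conn, sql)
        if err is None:
            break  # the query executed, so stop correcting
        prompt = f"{question}\nPrevious SQL: {sql}\nError: {err}\nFix the query."
        sql = generate(prompt)
    return sql


def self_consistent(conn: sqlite3.Connection, question: str,
                    generate: Generate, n_samples: int = 5) -> Optional[str]:
    """Sample several queries and keep one whose result set is most common."""
    votes: Counter = Counter()
    first_sql = {}
    for _ in range(n_samples):
        sql = generate(question)
        rows, err = run_sql(conn, sql)
        if err is not None:
            continue  # failed queries get no vote
        key = repr(sorted(rows, key=repr))  # order-insensitive result signature
        votes[key] += 1
        first_sql.setdefault(key, sql)
    if not votes:
        return None
    return first_sql[votes.most_common(1)[0][0]]
```

In this sketch, self-correction makes at most `1 + max_rounds` model calls and only retries after a failure, while self-consistency always makes `n_samples` calls, which is one way to see why the latter costs more per question.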
- BIRD
- CodeLlama-Instruct
- CodeLlama-Instruct (13B)
- CodeLlama-Instruct (34B)
- CodeLlama-Instruct (7B)
- Llama 3.3-70B
- Llama-3.3 (8B)
- Llama-3.x
- Qwen2.5-Coder
- Qwen2.5 Coder 14B
- qwen2.5-coder:32b
- Qwen2.5-Coder 7B
- Volodymyr Bezkorovainyi
AI-generated summary · Google Gemini · from 2 sources.