How Far Do On-Prem Open LLMs Get on Text-to-SQL? A Cross-Family Size x Technique Frontier on BIRD

arXiv:2606.29733v1 (announce type: cross)
Abstract: Organizations that cannot send data to a cloud API increasingly ask: how good is Text-to-SQL if the model must run on-premises on open weights, and which popular accuracy "recipes" are worth their compute? We answer with an honest, fully reproducible benchmark on the BIRD development split (n=1534, Execution Accuracy), evaluating three open model families across two generations -- Qwen2.5-Coder (7B/14B/32B), CodeLlama-Instruct (7B/13B/34B), and Llama-3.x (8B, 70B) -- under one matched protocol, ablating a model-agnostic recipe (schema linking,
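The Execution Accuracy metric named in the abstract scores a predicted query by what it returns, not by how it is written. The following is a minimal sketch of that idea, assuming SQLite databases and an unordered set comparison of result rows; the excerpt does not give the paper's exact protocol. The helper names (`run_query`, `execution_match`, `execution_accuracy`) and the toy `schools` table are hypothetical, introduced only for illustration.

```python
import sqlite3


def run_query(conn, sql):
    """Execute sql and return its rows, or None if the query fails."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None


def execution_match(conn, predicted_sql, gold_sql):
    """True when both queries run and return the same set of rows.

    Assumption: rows are compared as an unordered set, so row order
    and duplicate rows are ignored.
    """
    gold_rows = run_query(conn, gold_sql)
    pred_rows = run_query(conn, predicted_sql)
    if gold_rows is None or pred_rows is None:
        return False
    return set(pred_rows) == set(gold_rows)


def execution_accuracy(conn, pairs):
    """Fraction of (predicted_sql, gold_sql) pairs whose results match."""
    if not pairs:
        return 0.0
    hits = sum(execution_match(conn, pred, gold) for pred, gold in pairs)
    return hits / len(pairs)


# Toy in-memory database standing in for one benchmark database (made-up data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE schools (name TEXT, city TEXT, enrollment INTEGER);
    INSERT INTO schools VALUES
        ('North', 'Oslo', 300), ('South', 'Oslo', 120), ('East', 'Bergen', 80);
""")

pairs = [
    # Different SQL text, same rows: counted as correct.
    ("SELECT name FROM schools WHERE city = 'Oslo' ORDER BY name",
     "SELECT name FROM schools WHERE city IN ('Oslo')"),
    # Extra filter removes the expected row: counted as incorrect.
    ("SELECT name FROM schools WHERE enrollment > 100 AND city = 'Bergen'",
     "SELECT name FROM schools WHERE city = 'Bergen'"),
    # Misspelled column makes the query fail: counted as incorrect.
    ("SELECT nme FROM schools", "SELECT name FROM schools"),
    # Equivalent aggregates: counted as correct.
    ("SELECT COUNT(*) FROM schools", "SELECT COUNT(name) FROM schools"),
]

score = execution_accuracy(conn, pairs)
print(f"Execution Accuracy: {score:.2f}")
```

Comparing result rows instead of SQL strings lets semantically equivalent queries count as correct, while a query that fails to execute simply scores zero for that example.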
The spread of open-weight LLMs, together with growing data privacy concerns, creates a need to measure how well these models perform when they must run on-premises.
This research provides benchmarks for organizations that must deploy models on-premises, helping them use LLMs without relying on cloud services or exposing sensitive data.
It clarifies which open-weight models and techniques are most effective for on-premises Text-to-SQL, which can inform deployment strategies.
- Organizations with strict data privacy requirements
- Open-source LLM developers
- AI data security solution providers
- Cloud AI API providers (for specific use cases)
- Companies relying solely on proprietary models for sensitive data
Increased adoption of on-premise open LLMs for sensitive enterprise data tasks like Text-to-SQL.
A shift in compute and data architectures towards hybrid or fully on-premise AI deployments for certain industries.
Enhanced data sovereignty and reduced reliance on external cloud providers for critical AI functions across various sectors.
Read at arXiv cs.LG