
arXiv:2605.26394v1 Announce Type: new
Abstract: Multi-turn Text-to-SQL is central to enterprise analytics yet remains predominantly evaluated in single-turn settings. We introduce EnterpriseMem-Bench, a multi-turn Text-to-SQL benchmark of 300 sessions and 1,400 turns built programmatically from three enterprise domains (BIRD financial, SEC EDGAR, Northwind), with deterministic ground truth and per-turn memory-critical annotation. We evaluate five frontier models (GPT-5 mini, GPT-5.2, Claude Sonnet 4.5, Sonnet 4.6, and Opus 4.6) across five memory conditions enabling a three-way ablation is
As advanced large language models proliferate, evaluation needs benchmarks that reflect real-world enterprise use. Multi-turn interaction is a current gap: Text-to-SQL is still evaluated predominantly in single-turn settings.
Stronger multi-turn Text-to-SQL is critical for enterprise analytics because natural language interfaces make data more accessible and more valuable to the people who need it.
EnterpriseMem-Bench offers a standardized way to evaluate memory architectures in multi-turn Text-to-SQL: 300 sessions and 1,400 turns with deterministic ground truth and per-turn memory-critical annotation, which should push models toward more practical and robust performance in business analytics.
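To make "deterministic ground truth" and "per-turn memory-critical annotation" concrete, here is a minimal sketch of how a multi-turn session could be scored by executing SQL. Everything in it is an assumption for illustration: the `orders` table, the `Turn` record, the two-turn session, and the order-insensitive result comparison are invented here and are not the benchmark's actual data format or metric, which the abstract does not specify.

```python
import sqlite3
from collections import Counter
from dataclasses import dataclass


@dataclass
class Turn:
    """Hypothetical per-turn record; not the benchmark's real schema."""
    question: str
    gold_sql: str
    memory_critical: bool  # True if the turn only makes sense given earlier turns


def make_db():
    """Build a tiny in-memory database with invented data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL);
        INSERT INTO orders (customer, amount) VALUES
            ('acme', 120.0), ('acme', 30.0), ('globex', 80.0), ('initech', 200.0);
    """)
    return conn


def same_result(conn, predicted_sql, gold_sql):
    """Execute both queries and compare result rows, ignoring row order."""
    try:
        predicted = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return False  # a query that fails to run counts as a miss
    gold = conn.execute(gold_sql).fetchall()
    return Counter(predicted) == Counter(gold)


def score_session(conn, turns, predictions):
    """Return accuracy over all turns and over memory-critical turns only."""
    hits = [same_result(conn, p, t.gold_sql) for t, p in zip(turns, predictions)]
    critical = [h for h, t in zip(hits, turns) if t.memory_critical]
    return {
        "overall": sum(hits) / len(hits),
        "memory_critical": sum(critical) / len(critical) if critical else None,
    }


SESSION = [
    Turn("Total order amount per customer?",
         "SELECT customer, SUM(amount) FROM orders GROUP BY customer",
         memory_critical=False),
    # "those" refers back to the per-customer totals from the previous turn.
    Turn("Only those above 100.",
         "SELECT customer, SUM(amount) FROM orders "
         "GROUP BY customer HAVING SUM(amount) > 100",
         memory_critical=True),
]

# Invented model outputs: the second one ignores the earlier turn's context.
PREDICTIONS = [
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer",
    "SELECT customer, amount FROM orders WHERE amount > 100",
]

if __name__ == "__main__":
    print(score_session(make_db(), SESSION, PREDICTIONS))
```

In this toy session the second prediction filters individual orders rather than per-customer totals, so it misses the memory-critical turn. Reporting that subset separately is one way a benchmark could distinguish memory failures from ordinary SQL errors.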
- Enterprise Analytics Software Providers
- Businesses Adopting Advanced AI Tools
- AI Model Developers (e.g., OpenAI, Anthropic)
- Traditional SQL Query Developers
- Businesses with Inefficient Data Access
Enterprises gain more effective natural language interfaces for database interaction, streamlining analytic workflows.
Demand grows for AI models proficient in multi-turn context understanding, driving further innovation in memory architectures.
Longer term, this could enable autonomous 'data agents' that independently explore and analyze enterprise databases in response to complex, evolving user queries.
Read at arXiv cs.CL