Querying an astronomical database using large language models: the ALeRCE text-to-SQL system

arXiv:2606.18108v1 (announce type: cross)

Abstract: We develop a text-to-SQL (structured query language) system based on large language models (LLMs) using in-context learning and apply it to the Automatic Learning for the Rapid Classification of Events (ALeRCE) astronomical database. ALeRCE is a community broker for the Zwicky Transient Facility and the Vera C. Rubin Observatory. The system enables users to query the database in natural language (NL) and generates executable SQL queries. To develop and evaluate the system, we constructed a dataset of 110 NL/SQL pairs. We propose a step-by-step
Advanced large language models are increasingly applied to specialized tasks such as scientific database querying, where their natural language understanding does the work of translating a question into a query.
This system is a practical example of using LLMs to access a complex data repository, and it could extend access to scientific data beyond expert users.
Specialized scientific databases can now be queried using natural language, lowering the barrier to entry for researchers and non-domain experts.
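As a rough illustration of what in-context learning means for text-to-SQL, the sketch below assembles a few-shot prompt from a schema description and a handful of NL/SQL pairs. It is not the ALeRCE system's method: the schema string, column names, example pairs, and the `build_prompt` helper are all hypothetical stand-ins, and the call to an actual LLM is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    """One natural-language question paired with a reference SQL query."""
    question: str
    sql: str


# Hypothetical schema and examples, for illustration only.
# These are NOT taken from the ALeRCE database or the paper's dataset.
SCHEMA = "object(oid, meanra, meandec, ndet, firstmjd)"

EXAMPLES = [
    Example(
        "How many objects have more than 50 detections?",
        "SELECT COUNT(*) FROM object WHERE ndet > 50;",
    ),
    Example(
        "Give the identifiers of objects first detected after MJD 60000.",
        "SELECT oid FROM object WHERE firstmjd > 60000;",
    ),
]


def build_prompt(question, schema=SCHEMA, examples=EXAMPLES):
    """Assemble a few-shot prompt: instruction, schema, examples, new question."""
    parts = [
        "Translate the request into a single SQL query for this schema.",
        f"Schema: {schema}",
        "",
    ]
    for ex in examples:
        parts += [f"Question: {ex.question}", f"SQL: {ex.sql}", ""]
    # The prompt ends with an open "SQL:" slot for the model to complete.
    parts += [f"Question: {question.strip()}", "SQL:"]
    return "\n".join(parts)


if __name__ == "__main__":
    print(build_prompt("List the ten objects with the most detections."))
```

The resulting string would be sent to an LLM, and the completion after the final `SQL:` treated as the candidate query. In practice the generated SQL should be validated before it is run against a live database.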
- Astronomers
- Scientific researchers
- Large language model developers
- Data scientists
- Traditional database query specialists
Increased accessibility and efficiency in querying astronomical and other scientific databases.
Acceleration of scientific discovery through faster data retrieval and analysis, fostering interdisciplinary research.
The development of similar AI-powered interfaces for other complex data systems across various scientific and industrial sectors.
Read at arXiv cs.AI