Abstract
Large language models (LLMs) have revolutionized natural language interfaces for databases, particularly in text-to-SQL conversion. However, current approaches often generate unreliable outputs when faced with ambiguity or insufficient context.
We present Reliable Text-to-SQL (RTS), a novel framework that enhances query generation reliability by incorporating abstention and human-in-the-loop mechanisms. RTS focuses on the critical schema linking phase, which aims to identify the key database elements needed for generating SQL queries. It autonomously detects potential errors during the answer generation process and responds by either abstaining or engaging in user interaction. A vital component of RTS is the Branching Point Prediction (BPP), which applies statistical conformal techniques to the hidden layers of the LLM during schema linking and provides probabilistic guarantees on schema linking accuracy.
We validate our approach through comprehensive experiments on the BIRD benchmark, demonstrating significant improvements in robustness and reliability. Our findings highlight the potential of combining transparent-box LLMs with human-in-the-loop processes to create more robust natural language interfaces for databases. For the BIRD benchmark, our approach achieves near-perfect schema linking accuracy, autonomously involving a human when needed. Combined with query generation, we demonstrate that near-perfect schema linking and a small query generation model can almost match SOTA accuracy achieved with a model orders of magnitude larger than the one we use.
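The abstention idea described in the abstract can be illustrated with a generic split conformal prediction sketch. This is not the paper's BPP: it assumes that some scorer (for example, a probe on the LLM's hidden states) already produces a probability for each candidate schema element, and the function names, column names, and calibration scores below are all hypothetical. Under the usual exchangeability assumption, the prediction set contains the correct element with probability at least 1 - alpha; when the set is not a single element, the system abstains or asks the user.

```python
import math


def conformal_threshold(cal_scores, alpha):
    """Return the split-conformal quantile of calibration nonconformity scores.

    cal_scores holds 1 - p(true label) for each held-out calibration example
    (hypothetical values here; a real system would compute them from a model).
    """
    n = len(cal_scores)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: keep every label.
        return float("inf")
    return sorted(cal_scores)[rank - 1]


def prediction_set(probs, qhat):
    """Keep every label whose nonconformity score 1 - p is within the threshold."""
    return {label for label, p in probs.items() if 1.0 - p <= qhat}


def decide(probs, qhat):
    """Return the single surviving label, or None to abstain / ask the user."""
    labels = prediction_set(probs, qhat)
    return next(iter(labels)) if len(labels) == 1 else None


# Hypothetical calibration scores and candidate columns, for illustration only.
cal_scores = [0.05, 0.70, 0.10, 0.30, 0.20, 0.60, 0.15]
qhat = conformal_threshold(cal_scores, alpha=0.25)

confident = decide({"orders.customer_id": 0.9, "customers.id": 0.1}, qhat)
ambiguous = decide(
    {"orders.customer_id": 0.5, "customers.id": 0.45, "orders.id": 0.05}, qhat
)
```

In the first call only one candidate survives the threshold, so it is returned; in the second, two candidates remain plausible, so `decide` returns `None`, which a caller could treat as the signal to abstain or request human input.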
Details
Published: Feb 10, 2025
Vol/Issue: 3(1)
Pages: 1-30
Cite This Article
Kaiwen Chen, Yueting Chen, Nick Koudas, et al. (2025). Reliable Text-to-SQL with Adaptive Abstention. Proceedings of the ACM on Management of Data, 3(1), 1-30. https://doi.org/10.1145/3709719