Reliable Text-to-SQL with Adaptive Abstention

Kaiwen Chen; Yueting Chen; Nick Koudas; Xiaohui Yu

doi:10.1145/3709719

journal article Feb 10, 2025

Proceedings of the ACM on Management of Data Vol. 3 No. 1 pp. 1-30 · Association for Computing Machinery (ACM)

Abstract

Large language models (LLMs) have revolutionized natural language interfaces for databases, particularly in text-to-SQL conversion. However, current approaches often generate unreliable outputs when faced with ambiguity or insufficient context.
We present Reliable Text-to-SQL (RTS), a novel framework that enhances query generation reliability by incorporating abstention and human-in-the-loop mechanisms. RTS focuses on the critical schema linking phase, which aims to identify the key database elements needed for generating SQL queries. It autonomously detects potential errors during the answer generation process and responds by either abstaining or engaging in user interaction. A vital component of RTS is Branching Point Prediction (BPP), which applies statistical conformal techniques to the hidden layers of the LLM used for schema linking, providing probabilistic guarantees on schema linking accuracy.
We validate our approach through comprehensive experiments on the BIRD benchmark, demonstrating significant improvements in robustness and reliability. Our findings highlight the potential of combining transparent-box LLMs with human-in-the-loop processes to create more robust natural language interfaces for databases. On the BIRD benchmark, our approach achieves near-perfect schema linking accuracy, autonomously involving a human when needed. Combined with query generation, we demonstrate that near-perfect schema linking and a small query generation model can almost match the SOTA accuracy achieved with a model orders of magnitude larger than the one we use.
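The abstract names conformal techniques and abstention but gives no procedure. As an illustration only, the following minimal sketch shows generic split conformal prediction with an abstention rule. It is not the authors' BPP method, which operates on hidden layers of the LLM. The function names, the nonconformity score (one minus a label's probability), and the example schema labels are all assumptions made for this sketch.

```python
import math


def conformal_threshold(cal_scores, alpha):
    """Return the split-conformal quantile of calibration nonconformity scores.

    cal_scores: scores of the TRUE label on held-out calibration examples
    (hypothetical here: 1 - probability assigned to the correct schema element).
    alpha: target miscoverage rate, e.g. 0.1 for roughly 90% coverage.
    """
    n = len(cal_scores)
    if n == 0:
        raise ValueError("need at least one calibration score")
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: keep every label.
        return float("inf")
    return sorted(cal_scores)[rank - 1]


def prediction_set(label_probs, threshold):
    """Keep each label whose nonconformity score 1 - p is within the threshold."""
    return {label for label, p in label_probs.items() if 1.0 - p <= threshold}


def link_or_abstain(label_probs, threshold):
    """Return the single surviving label, or None to abstain or ask the user."""
    kept = prediction_set(label_probs, threshold)
    return next(iter(kept)) if len(kept) == 1 else None


if __name__ == "__main__":
    # Hypothetical calibration scores and candidate schema elements.
    threshold = conformal_threshold([0.1, 0.2, 0.3, 0.4], alpha=0.25)
    print(link_or_abstain({"orders.id": 0.9, "users.id": 0.05}, threshold))
    print(link_or_abstain({"orders.id": 0.7, "users.id": 0.65}, threshold))
```

In this sketch, a prediction set with exactly one element is treated as a confident answer; an empty or multi-element set triggers abstention, which is where a human-in-the-loop step could take over. Under the usual exchangeability assumption, the prediction set contains the true label with probability at least 1 - alpha.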

Details

Published: Feb 10, 2025
Vol/Issue: 3(1)
Pages: 1-30

Authors

Kaiwen Chen

University of Toronto, Toronto, ON, Canada

Yueting Chen

Seattle University, Seattle, WA, USA

Nick Koudas

University of Toronto, Toronto, ON, Canada

Xiaohui Yu

York University, Toronto, ON, Canada

Cite This Article

Kaiwen Chen, Yueting Chen, Nick Koudas, et al. (2025). Reliable Text-to-SQL with Adaptive Abstention. Proceedings of the ACM on Management of Data, 3(1), 1-30. https://doi.org/10.1145/3709719
