Auto-Test: Learning Semantic-Domain Constraints for Unsupervised Error Detection in Tables
In this work, we observe that there is an important class of data-quality constraints, which we call Semantic-Domain Constraints, that can be reliably inferred and automatically applied to any table, without requiring domain experts to specify them manually on a per-table basis. We develop a principled framework to systematically learn such constraints from table corpora using large-scale statistical tests; the learned constraints can be further distilled into a core set using our optimization framework, with provable quality guarantees. Extensive evaluations show that this new class of constraints can be used both to (1) directly detect errors on real tables in the wild, and (2) augment existing expert-driven data-cleaning techniques as a new class of complementary constraints.
Our code and data are available at https://github.com/qixuchen/AutoTest for future research.
- Published: Jun 17, 2025
- Vol/Issue: 3(3)
- Pages: 1-27