Abstract
Estimating the output size of a query is a fundamental yet longstanding problem in database query processing. Traditional cardinality estimators used by database systems can routinely underestimate the true output size by orders of magnitude, which leads to significant system performance penalty. Recently, upper bounds have been proposed that are based on information inequalities and incorporate sizes and max-degrees from input relations, yet their main benefit is limited to cyclic queries, because they degenerate to rather trivial formulas on acyclic queries.

We introduce a significant extension of the upper bounds, by incorporating l
p
-norms of the degree sequences of join attributes. Our bounds are significantly lower than previously known bounds, even when applied to acyclic queries. These bounds are also based on information theory, they come with a matching query evaluation algorithm, are computable in exponential time in the query size, and are provably tight when all degrees are ''simple''.
Topics

No keywords indexed for this article. Browse by subject →

References
26
[5]
Size Bounds and Query Plans for Relational Joins

Albert Atserias, Martin Grohe, Daniel Marx

SIAM Journal on Computing 10.1137/110859440
[6]
Convex Optimization

Stephen Boyd, Lieven Vandenberghe

10.1017/cbo9780511804441
[13]
Size and Treewidth Bounds for Conjunctive Queries

Georg Gottlob, Stephanie Tien Lee, Gregory Valiant et al.

Journal of the ACM 10.1145/2220357.2220363
[15]
Axel Hertzschuch, Claudio Hartmann, Dirk Habich, and Wolfgang Lehner. 2021. Simplicity Done Right for Join Ordering. In CIDR 2021. www.cidrdb.org. http://cidrdb.org/cidr2021/papers/cidr2021_paper01.pdf
[16]
Sai Vikneshwar Mani Jayaraman, Corey Ropell, and Atri Rudra. 2021. Worst-case Optimal Binary Join Algorithms under General $ell_p$ Constraints. CoRR , Vol. abs/2112.01003 (2021). https://arxiv.org/abs/2112.01003
[22]
Jure Leskovec and Andrej Krevl. 2014. SNAP Datasets: Stanford Large Network Dataset Collection. http://snap.stanford.edu/data.
[25]
Raghu Ramakrishnan and Johannes Gehrke. 2003. Database management systems (3. ed.). McGraw-Hill.
[26]
Doron Zeilberger. 1984. A combinatorial proof of Newton's identities. Discrete mathematics, Vol. 49, 3 (1984).
Cited By
6
Proceedings of the ACM on Managemen...
Metrics
6
Citations
26
References
Details
Published
May 10, 2024
Vol/Issue
2(2)
Pages
1-24
License
View
Funding
NSF-BSF Award: 2109922
NSF-IIS Award: 2314527
NSF-SHF Award: 2312195
Simons Program on Logic and Algorithms in Databases and AI
Cite This Article
Mahmoud Abo Khamis, Vasileios Nakos, Dan Olteanu, et al. (2024). Join Size Bounds using l p -Norms on Degree Sequences. Proceedings of the ACM on Management of Data, 2(2), 1-24. https://doi.org/10.1145/3651597