journal article Dec 18, 2024

Connectivity-Oriented Property Graph Partitioning for Distributed Graph Pattern Query Processing

Abstract
Graph pattern query is a powerful tool for extracting crucial information from property graphs. With the exponential growth of sizes, property graphs are typically divided into multiple subgraphs (referred to as
partitions
) and stored across various sites in distributed environments. Existing graph partitioning methods have not been efficiently optimized for pattern queries, resulting in numerous query matches across multiple partitions, called
crossing matches.
Identifying these matches requires much inter-partition communication, which is the primary performance bottleneck in distributed query processing. To address this issue, this paper introduces a novel connectivity-oriented relationship-disjoint partitioning method, namely RCP (Relationship Connectivity Partitioning), aimed at enhancing the efficiency of graph pattern query processing by reducing crossing matches. By employing each weakly connected component of the subgraph, which is induced by different relationship labels, as a basic unit of partition, RCP ensures that matches for both
variable-length path
and
labeled graph pattern
queries are not crossing matches. Here,
variable-length path
and
labeled graph pattern
are two common components in graph pattern queries to identify paths meeting specific label constraints and retrieve subgraphs with consistent relationship types, respectively. Moreover, in the query processing phase, we further demonstrate that all graph pattern queries, belonging to these two basic queries or their extensions, can be executed independently under RCP, thereby avoiding crossing matches. In experiments, we implemented two prototype distributed property graph systems based on Neo4j and JanusGraph, which use declarative and functional query language, respectively. Experimental results on billion-scale datasets demonstrate that our approach brings a performance improvement of nearly two orders of magnitude over state-of-the-art partitioning methods.
Topics

No keywords indexed for this article. Browse by subject →

References
42
[1]
Daniel J. Abadi Adam Marcus Samuel R. Madden and Kate Hollenbach. 2007. Scalable Semantic Web Data Management Using Vertical Partitioning. In VLDB. VLDB Endowment 411--422.
[6]
On random graphs. I.

P. Erdős, A. Rényi

Publicationes Mathematicae Debrecen 10.5486/pmd.1959.6.3-4.12
[9]
Wenfei Fan, Ruiqi Xu, Qiang Yin, Wenyuan Yu, and Jingren Zhou. 2022. Application-Driven Graph Partitioning. The VLDB Journal, Vol. 32, 1 (apr 2022), 149--172.
[12]
Joseph E. Gonzalez, Reynold S. Xin, Ankur Dave, Daniel Crankshaw, Michael J. Franklin, and Ion Stoica. 2014. GraphX: Graph Processing in a Distributed Dataflow Framework. In OSDI. USENIX Association, USA, 599--613.
[14]
Kongzhang Hao, Zhengyi Yang, Longbin Lai, Zhengmin Lai, Xin Jin, and Xuemin Lin. 2019. PatMat: A Distributed Pattern Matching Engine with Cypher. In CIKM. Association for Computing Machinery, New York, NY, USA, 2921--2924.
[15]
JanusGraph. 2023. Distributed open source massively scalable graph database. http://janusgraph.org/
[22]
Claudio Martella, Dionysios Logothetis, Andreas Loukas, and Georgos Siganos. 2017. Spinner: Scalable Graph Partitioning in the Cloud. In 2017 IEEE 33rd International Conference on Data Engineering (ICDE). 1083--1094.
[24]
Neo4j. 2024. The world's leading graph database as a fully-managed cloud service - zero-admin globally available and always-on. https://neo4j.com/
[25]
Haohan Pang Peng Gan Pingpeng Yuan Hai Jin and Qiangsheng Hua. 2019. Partitioning Large-Scale Property Graph for Efficient Distributed Query Processing. In HPCC/SmartCity/DSS. 1643--1650. 10.1109/hpcc/smartcity/dss.2019.00225
[26]
Peng Peng, M. Tamer Özsu, Lei Zou, Cen Yan, and Chengjun Liu. 2022. MPC: Minimum Property-Cut RDF Graph Partitioning. In 2022 IEEE 38th International Conference on Data Engineering (ICDE). 192--204.
[29]
Amitabha Roy, Laurent Bindschaedler, Jasmina Malicevic, and Willy Zwaenepoel. 2015. Chaos: Scale-out Graph Processing from Secondary Storage. In SOSP. Association for Computing Machinery, New York, NY, USA, 410--424.
[31]
Alexander Schätzle, Martin Przyjaciel-Zablocki, Antony Neu, and Georg Lausen. 2014. Sempala: Interactive SPARQL Query Processing on Hadoop. In ISWC. Springer-Verlag, Berlin, Heidelberg, 164--179.
[32]
Alexander Schätzle, Martin Przyjaciel-Zablocki, Simon Skilevic, and Georg Lausen. 2016. S2RDF: RDF Querying with SPARQL on Spark. Proc. VLDB Endow., Vol. 9, 10 (jun 2016), 804--815.
[33]
Bin Shao, Haixun Wang, and Yatao Li. 2013. Trinity: A Distributed Graph Engine on a Memory Cloud. In SIGMOD. Association for Computing Machinery, New York, NY, USA, 505--516.
[34]
Claus Stadler, Gezim Sejdiu, Damien Graux, and Jens Lehmann. 2019. Sparklify: A Scalable Software Component for Efficient Evaluation of SPARQL Queries over Distributed RDF Datasets. In ISWC. Springer-Verlag, Berlin, Heidelberg, 293--308.
[35]
Wen Sun, Achille Fokoue, Kavitha Srinivas, Anastasios Kementsietsidis, Gang Hu, and Guotong Xie. 2015. SQLGraph: An Efficient Relational-Based Property Graph Store. In SIGMOD. Association for Computing Machinery, New York, NY, USA, 1887--1901.
[36]
Min Wu, Xinglu Yi, Hui Yu, Yu Liu, and Yujue Wang. 2022. Nebula Graph: An open source distributed graph database. arxiv: 2206.07278 [cs.DB]
[37]
Cong Xie, Ling Yan, Wu-Jun Li, and Zhihua Zhang. 2014. Distributed Power-Law Graph Computing: Theoretical and Empirical Analysis. In NIPS. MIT Press, Cambridge, MA, USA, 1673--1681.
[40]
Chenzi Zhang, Fan Wei, Qin Liu, Zhihao Gavin Tang, and Zhenguo Li. 2017. Graph Edge Partitioning via Neighborhood Heuristic. In SIGKDD. Association for Computing Machinery, New York, NY, USA, 605--614.
Metrics
2
Citations
42
References
Details
Published
Dec 18, 2024
Vol/Issue
2(6)
Pages
1-26
License
View
Funding
the National Natural Science Foundation of China Award: 62472156, U23A20317, 62172146
the Natural Science Foundation of Hunan Province Award: 2023JJ10016
the Creative Research Groups Program of the National Natural Science Foundation of China Award: 62321003
the Key R\&D Program of Hunan Province Award: 2023GK2002
Cite This Article
Min Shi, Peng Peng, Xu Zhou, et al. (2024). Connectivity-Oriented Property Graph Partitioning for Distributed Graph Pattern Query Processing. Proceedings of the ACM on Management of Data, 2(6), 1-26. https://doi.org/10.1145/3698804