Abstract
Graph Neural Networks (GNNs) have received significant attention for their ability to handle graph data. However, they are difficult to deploy on resource-limited devices because of their model sizes and the scalability constraints imposed by multi-hop data dependency. In addition, real-world graphs usually possess complex structural information and features. Therefore, to improve the applicability of GNNs and fully encode complicated topological information, Knowledge Distillation on Graphs (KDG) has been introduced to build a smaller yet effective model, yielding both model compression and performance improvement. KDG has recently made considerable progress, and many studies have been proposed. In this survey, we systematically review these works. Specifically, we first introduce the challenges and bases of KDG, then categorize and summarize existing KDG work by answering three questions: (1) what to distill, (2) who to whom, and (3) how to distill. We offer in-depth comparisons and elucidate the strengths and weaknesses of each design. Finally, we share our thoughts on future research directions.

Details
Published: Mar 05, 2025
Vol/Issue: 57(8)
Pages: 1-16
Cite This Article
Yijun Tian, Shichao Pei, Xiangliang Zhang, et al. (2025). Knowledge Distillation on Graphs: A Survey. ACM Computing Surveys, 57(8), 1-16. https://doi.org/10.1145/3711121