Matching the Ideal Pruning Method with Knowledge Distillation for Optimal Compression

Leila Malihi; Gunther Heidemann

doi:10.3390/asi7040056

journal article Open Access Jun 29, 2024

Applied System Innovation Vol. 7 No. 4 pp. 56 · MDPI AG

Abstract

In recent years, model compression techniques have gained significant attention as a means to reduce the computational and memory requirements of deep neural networks. Knowledge distillation and pruning are two prominent approaches in this domain, each offering distinct advantages for model efficiency. This paper investigates the combined effects of knowledge distillation and two pruning strategies, weight pruning and channel pruning, on compression efficiency and model performance. The study introduces a metric called “Performance Efficiency” to evaluate the impact of these pruning strategies on model compression and performance. The experiments are conducted on the CIFAR-10 and CIFAR-100 datasets and compare diverse model architectures, including ResNet, DenseNet, EfficientNet, and MobileNet. The results show that both weight and channel pruning achieve effective model compression. However, a significant distinction emerges: weight pruning shows superior performance across all four architecture types and adapts better to knowledge distillation than channel pruning. Pruned models show a significant reduction in parameters without a significant reduction in accuracy.
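The difference between the two pruning strategies named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names (`weight_prune`, `channel_prune`, `count_nonzero`), the toy 4×4 layer, and the magnitude and L1-norm selection criteria are all assumptions chosen for illustration. Weight pruning zeroes individual small weights and keeps the layer shape, whereas channel pruning removes whole output channels and shrinks the layer.

```python
# Toy illustration (assumed criteria, not the paper's code):
# rows are output channels, columns are individual weights.

def weight_prune(weights, sparsity):
    """Unstructured pruning: zero the smallest-magnitude individual weights."""
    positions = sorted(
        (abs(w), r, c)
        for r, row in enumerate(weights)
        for c, w in enumerate(row)
    )
    k = int(len(positions) * sparsity)      # number of weights to zero
    pruned = [row[:] for row in weights]    # copy so the input is untouched
    for _, r, c in positions[:k]:
        pruned[r][c] = 0.0
    return pruned


def channel_prune(weights, sparsity):
    """Structured pruning: drop the channels (rows) with the smallest L1 norm."""
    norms = sorted((sum(abs(w) for w in row), r) for r, row in enumerate(weights))
    k = int(len(weights) * sparsity)        # number of channels to drop
    drop = {r for _, r in norms[:k]}
    return [row[:] for r, row in enumerate(weights) if r not in drop]


def count_nonzero(weights):
    """Count the weights that survive pruning."""
    return sum(1 for row in weights for w in row if w != 0.0)


# Hypothetical layer weights, made up for this example.
layer = [
    [0.9, -0.1, 0.4, 0.05],
    [0.02, 0.03, -0.01, 0.04],
    [-0.7, 0.6, 0.2, -0.3],
    [0.08, -0.5, 0.15, 0.25],
]

sparse = weight_prune(layer, 0.5)   # same 4x4 shape, half the entries zeroed
slim = channel_prune(layer, 0.5)    # only 2 of the 4 channels remain

print(len(sparse), count_nonzero(sparse))
print(len(slim), count_nonzero(slim))
```

At 50% sparsity both variants keep eight nonzero weights in this toy layer, but weight pruning is free to keep the large weights in every channel, while channel pruning must discard a whole channel even if it contains a useful weight. In a distillation setup, either pruned model would then serve as the student network.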

Metrics

Citations: 9
References: 27

Details

Published: Jun 29, 2024
Vol/Issue: 7(4)
Pages: 56

Authors

Leila Malihi

Department of Computer Vision, Institute of Cognitive Science, Osnabrück University, 49074 Osnabrück, Germany

Gunther Heidemann

Department of Computer Vision, Institute of Cognitive Science, Osnabrück University, 49074 Osnabrück, Germany

Funding

Osnabrück University “Open Access Publizieren” of the “Deutsche Forschungsgemeinschaft” (DFG) Award: DFG-4321

Cite This Article

Leila Malihi, Gunther Heidemann (2024). Matching the Ideal Pruning Method with Knowledge Distillation for Optimal Compression. Applied System Innovation, 7(4), 56. https://doi.org/10.3390/asi7040056
