Edge Intelligence: A Review of Deep Neural Network Inference in Resource-Limited Environments

Dat Ngo; Hyun-Cheol Park; Bongsoon Kang

doi:10.3390/electronics14122495

Journal article · Open Access · Jun 19, 2025

Electronics, Vol. 14, No. 12, Article 2495 · MDPI AG

Abstract

Deploying deep neural networks (DNNs) in resource-limited environments—such as smartwatches, IoT nodes, and intelligent sensors—poses significant challenges due to constraints in memory, computing power, and energy budgets. This paper presents a comprehensive review of recent advances in accelerating DNN inference on edge platforms, with a focus on model compression, compiler optimizations, and hardware–software co-design. We analyze the trade-offs between latency, energy, and accuracy across various techniques, highlighting practical deployment strategies on real-world devices. In particular, we categorize existing frameworks based on their architectural targets and adaptation mechanisms and discuss open challenges such as runtime adaptability and hardware-aware scheduling. This review aims to guide the development of efficient and scalable edge intelligence solutions.

Metrics

Citations: 27
References: 236

Details

Published: Jun 19, 2025
Vol/Issue: 14(12)
Pages: 2495

Authors

Dat Ngo

Department of Computer Engineering, Korea National University of Transportation, Chungju 27469, Republic of Korea

Hyun-Cheol Park

Department of Computer Engineering, Korea National University of Transportation, Chungju 27469, Republic of Korea

Bongsoon Kang

Department of Electronics Engineering, Dong-A University, Busan 49315, Republic of Korea

Funding

National Research Foundation of Korea (NRF) Award: NRF-2023R1A2C1004592

Cite This Article

Dat Ngo, Hyun-Cheol Park, Bongsoon Kang (2025). Edge Intelligence: A Review of Deep Neural Network Inference in Resource-Limited Environments. Electronics, 14(12), 2495. https://doi.org/10.3390/electronics14122495
