journal article Open Access Oct 13, 2024

Occlusion Removal in Light-Field Images Using CSPDarknet53 and Bidirectional Feature Pyramid Network: A Multi-Scale Fusion-Based Approach

Applied Sciences Vol. 14 No. 20 pp. 9332 · MDPI AG
DOI: 10.3390/app14209332
Abstract
Occlusion removal in light-field images remains a significant challenge, particularly for large occlusions. To address it, we propose an end-to-end learning architecture that combines CSPDarknet53 with a bidirectional feature pyramid network (BiFPN) for efficient light-field occlusion removal. CSPDarknet53 serves as the backbone, extracting rich features at multiple scales, while the BiFPN integrates those features through a multi-scale fusion mechanism. To preserve efficiency without sacrificing the quality of the extracted features, the model uses separable convolutional blocks. A simple refinement module based on half-instance initialization blocks is integrated to explore local details and global structures. The network's multi-perspective approach removes occlusions almost completely and handles occlusions of varying size and complexity. Extensive experiments on sparse and dense datasets with varying degrees of occlusion severity were conducted to assess performance. The results show significant improvements over current state-of-the-art methods on the sparse dataset and competitive results on the dense dataset.
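The abstract names two efficiency-related building blocks, separable convolutions and bidirectional feature pyramid fusion, without giving their formulas. The sketch below illustrates both in plain Python under stated assumptions: the parameter counts ignore biases, and the fusion rule is the "fast normalized fusion" from the EfficientDet paper (Tan et al.), which may differ from what this article's network actually uses. All function names are hypothetical and chosen for illustration only.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out


def separable_conv_params(k, c_in, c_out):
    """Weights in a depthwise k x k convolution followed by a
    pointwise 1 x 1 convolution (biases ignored)."""
    return k * k * c_in + c_in * c_out


def fast_normalized_fusion(features, weights, eps=1e-4):
    """Fuse equally sized feature vectors with non-negative, normalized weights.

    Assumption: this follows the EfficientDet formulation
    O = sum_i (w_i / (eps + sum_j w_j)) * I_i, with each w_i clipped at zero.
    """
    clipped = [max(w, 0.0) for w in weights]  # ReLU keeps each weight >= 0
    total = sum(clipped) + eps                # eps avoids division by zero
    fused = [0.0] * len(features[0])
    for feat, w in zip(features, clipped):
        for i, value in enumerate(feat):
            fused[i] += (w / total) * value
    return fused


if __name__ == "__main__":
    # Hypothetical layer sizes, chosen only to show the ratio.
    print(standard_conv_params(3, 64, 128))
    print(separable_conv_params(3, 64, 128))
    # Two toy "feature maps" flattened to vectors, equal learned weights.
    print(fast_normalized_fusion([[1.0, 2.0], [3.0, 4.0]], [1.0, 1.0]))
```

For a 3 x 3 layer mapping 64 to 128 channels, the separable form needs 8,768 weights against 73,728 for the standard form, which is the kind of saving the abstract's efficiency claim relies on. In a real network the fusion weights would be learned and applied to full feature maps rather than short lists.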
Metrics
Citations: 18
References: 67
Details
Published: Oct 13, 2024
Vol/Issue: 14(20)
Pages: 9332
Funding
National Research Foundation of Korea (NRF) Award: 2020R1I1A3A04037680
Innovative Human Resource Development for Local Intellectualization Program Award: 2020R1I1A3A04037680
Cite This Article
Mostafa Farouk Senussi, Hyun-Soo Kang (2024). Occlusion Removal in Light-Field Images Using CSPDarknet53 and Bidirectional Feature Pyramid Network: A Multi-Scale Fusion-Based Approach. Applied Sciences, 14(20), 9332. https://doi.org/10.3390/app14209332