journal article Jun 10, 2020

Monocular depth estimation based on deep learning: An overview

DOI: 10.1007/s11431-020-1582-8

References
119
[1]
Hu G, Huang S, Zhao L, et al. A robust RGB-D SLAM algorithm. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems. Vilamoura: IEEE, 2012. 1714–1719 10.1109/iros.2012.6386103
[2]
Zhu Z S, Su A, Liu H B, et al. Vision navigation for aircrafts based on 3D reconstruction from real-time image sequences. Sci China Tech Sci, 2015, 58: 1196–1208 10.1007/s11431-015-5828-x
[3]
Chai X, Gao F, Qi C K, et al. Obstacle avoidance for a hexapod robot in unknown environment. Sci China Tech Sci, 2017, 60: 818–831 10.1007/s11431-016-9017-6
[4]
Park S J, Hong K S, Lee S. RDFNet: RGB-D multi-level residual feature fusion for indoor semantic segmentation. In: Proceedings of the IEEE International Conference on Computer Vision. Venice, 2017. 4980–4989
[5]
Ullman S. The interpretation of structure from motion. Proceedings of the Royal Society of London. Series..., 1979 10.1098/rspb.1979.0006
[6]
Mancini F, Dubbini M, Gattelli M, et al. Using unmanned aerial vehicles (UAV) for high-resolution reconstruction of topography: The structure from motion approach on coastal environments. Remote Sens, 2013, 5: 6880–6898 10.3390/rs5126880
[7]
Mur-Artal R, Montiel J M M, Tardos J D. ORB-SLAM: A versatile and accurate monocular SLAM system. IEEE Transactions on Robotics, 2015 10.1109/tro.2015.2463671
[8]
Szeliski R, Kang S B. Shape ambiguities in structure from motion. IEEE Trans Pattern Anal Machine Intell, 1997, 19: 506–512 10.1109/34.589211
[9]
Zou L, Li Y. A method of stereo vision matching based on OpenCV. In: 2010 International Conference on Audio, Language and Image Processing. Shanghai: IEEE, 2010. 185–190 10.1109/icalip.2010.5684978
[10]
Cao Z L, Yan Z H, Wang H. Summary of binocular stereo vision matching technology (in Chinese). J Chongqing Univ Tech (Nat Sci), 2015, 29: 70–75
[11]
Benosman R, Manière T, Devars J. Multidirectional stereovision sensor, calibration and scenes reconstruction. In: Proceedings of 13th International Conference on Pattern Recognition. Vienna: IEEE, 1996. 161–165 10.1109/icpr.1996.546011
[12]
Ramírez-Hernández L R, Rodríguez-Quiñonez J C, Castro-Toscano M J, et al. Improve three-dimensional point localization accuracy in stereo vision systems using a novel camera calibration method. Int J Adv Robot Syst, 2020, 17: 172988141989671 10.1177/1729881419896717
[13]
Tateno K, Tombari F, Laina I, et al. CNN-SLAM: Real-time dense monocular SLAM with learned depth prediction. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, 2017. 6243–6252 10.1109/cvpr.2017.695
[14]
Yoneda K, Tehrani H, Ogawa T, et al. Lidar scan feature for localization with highly precise 3D map. In: 2014 IEEE Intelligent Vehicles Symposium Proceedings. Dearborn: IEEE, 2014. 1345–1350 10.1109/ivs.2014.6856596
[15]
Zhang F, Zhu X, Ye M. Fast human pose estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Long Beach, 2019. 3517–3526 10.1109/cvpr.2019.00363
[16]
Pang J, Chen K, Shi J, et al. Libra R-CNN: Towards balanced learning for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Long Beach, 2019. 821–830 10.1109/cvpr.2019.00091
[17]
Lyu H, Fu H, Hu X, et al. ESNet: Edge-based segmentation network for real-time semantic segmentation in traffic scenes. In: 2019 IEEE International Conference on Image Processing (ICIP). Taipei: IEEE, 2019. 1855–1859 10.1109/icip.2019.8803132
[18]
Zhao Z Q, Zheng P, Xu S T, et al. Object detection with deep learning: A review. IEEE Trans Neural Netw Learning Syst, 2019, 30: 3212–3232 10.1109/tnnls.2018.2876865
[19]
Ghosh S, Das N, Das I, et al. Understanding deep learning techniques for image segmentation. ACM Comput Surv, 2019, 52: 1–35 10.1145/3329784
[20]
Rawat W, Wang Z. Deep convolutional neural networks for image classification: A comprehensive review. Neural Computation, 2017 10.1162/neco_a_00990
[21]
Tang Y, Zhao C, Wang J, et al. An overview of perception and decision-making in autonomous systems in the era of learning. 2020, arXiv:2001.02319
[22]
Facil J M, Ummenhofer B, Zhou H, et al. CAM-Convs: Camera-aware multi-scale convolutions for single-view depth. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Long Beach, 2019. 11826–11835 10.1109/cvpr.2019.01210
[23]
Garg R, Vijay Kumar B G, Carneiro G, et al. Unsupervised CNN for single view depth estimation: Geometry to the rescue. In: Leibe B, Matas J, Sebe N, et al., eds. Computer Vision-ECCV 2016. ECCV 2016. Lecture Notes in Computer Science, vol 9912. Cham: Springer, 2016. 740–756 10.1007/978-3-319-46484-8_45
[24]
Wang R, Pizer S M, Frahm J M. Recurrent neural network for (un-)supervised learning of monocular video visual odometry and depth. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Long Beach, 2019. 5555–5564 10.1109/cvpr.2019.00570
[25]
Chakravarty P, Narayanan P, Roussel T. GEN-SLAM: Generative modeling for monocular simultaneous localization and mapping. In: 2019 International Conference on Robotics and Automation (ICRA). Montreal: IEEE, 2019. 147–153 10.1109/icra.2019.8793530
[26]
Aleotti F, Tosi F, Poggi M, et al. Generative adversarial networks for unsupervised monocular depth prediction. In: Leal-Taixe L, Roth S, eds. Computer Vision-ECCV 2018 Workshops. ECCV 2018. Lecture Notes in Computer Science, vol 11129. Cham: Springer, 2018. 337–354 10.1007/978-3-030-11009-3_20
[27]
Godard C, Mac Aodha O, Brostow G J. Unsupervised monocular depth estimation with left-right consistency. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, 2017. 270–279 10.1109/cvpr.2017.699
[28]
Zhan H, Garg R, Saroj Weerasekera C, et al. Unsupervised learning of monocular depth estimation and visual odometry with deep feature reconstruction. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, 2018. 340–349 10.1109/cvpr.2018.00043
[29]
Yin Z, Shi J. GeoNet: Unsupervised learning of dense depth, optical flow and camera pose. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, 2018. 1983–1992 10.1109/cvpr.2018.00212
[30]
Wang C, Miguel Buenaposada J, Zhu R, et al. Learning depth from monocular videos using direct methods. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, 2018. 2022–2030 10.1109/cvpr.2018.00216
[31]
Fei X, Wong A, Soatto S. Geo-supervised visual depth prediction. IEEE Robot Autom Lett, 2019, 4: 1661–1668 10.1109/lra.2019.2896963
[32]
Geiger A, Lenz P, Urtasun R. Are we ready for autonomous driving? The KITTI vision benchmark suite. In: 2012 IEEE Conference on Computer Vision and Patter..., 2012 10.1109/cvpr.2012.6248074
[33]
Mayer N, Ilg E, Hausser P, et al. A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, 2016. 4040–4048 10.1109/cvpr.2016.438
[34]
Zhao C, Tang Y, Sun Q. Deep direct visual odometry. 2019, arXiv:1912.05101
[35]
Eigen D, Puhrsch C, Fergus R. Depth map prediction from a single image using a multi-scale deep network. In: Advances in Neural Information Processing Systems. 2014. 2366–2374
[36]
Chen X, Ma H, Wan J, et al. Multi-view 3D object detection network for autonomous driving. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, 2017. 1907–1915 10.1109/cvpr.2017.691
[37]
Wang P, Chen P, Yuan Y, et al. Understanding convolution for semantic segmentation. In: 2018 IEEE Winter Conference on Applications of Com..., 2018 10.1109/wacv.2018.00163
[38]
Chang M F, Lambert J, Sangkloy P, et al. Argoverse: 3D tracking and forecasting with rich maps. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Long Beach, 2019. 8748–8757 10.1109/cvpr.2019.00895
[39]
Xue F, Wang X, Li S, et al. Beyond tracking: Selecting memory and refining poses for deep visual odometry. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Long Beach, 2019. 8575–8583 10.1109/cvpr.2019.00877
[40]
Clark R, Wang S, Wen H, et al. VINet: Visual-inertial odometry as a sequence-to-sequence learning problem. In: Thirty-First AAAI Conference on Artificial Intelligence, 2017 10.1609/aaai.v31i1.11215
[41]
Silberman N, Hoiem D, Kohli P, et al. Indoor segmentation and support inference from RGBD images. Lecture Notes in Computer Science, 2012 10.1007/978-3-642-33715-4_54
[42]
Cordts M, Omran M, Ramos S, et al. The Cityscapes dataset for semantic urban scene understanding. In: 2016 IEEE Conference on Computer Vision and Patter... 10.1109/cvpr.2016.350
[43]
Zhou T, Brown M, Snavely N, et al. Unsupervised learning of depth and ego-motion from video. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, 2017. 1851–1858 10.1109/cvpr.2017.700
[44]
Bian J, Li Z, Wang N, et al. Unsupervised scale-consistent depth and ego-motion learning from monocular video. In: Advances in Neural Information Processing Systems, 2019. 35–45
[45]
Saxena A, Sun M, Ng A Y. Make3D: Learning 3D scene structure from a single still image. IEEE Trans Pattern Anal Mach Intell, 2009, 31: 824–840 10.1109/tpami.2008.132
[46]
Hoiem D, Efros A A, Hebert M. Automatic photo pop-up. ACM Trans Graph, 2005, 24: 577–584 10.1145/1073204.1073232
[47]
van Dijk T, de Croon G. How do neural networks see depth in single images? In: Proceedings of the IEEE International Conference on Computer Vision. Seoul, 2019. 2183–2191 10.1109/iccv.2019.00227
[48]
Kuznietsov Y, Stuckler J, Leibe B. Semi-supervised deep learning for monocular depth map prediction. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, 2017. 6647–6655 10.1109/cvpr.2017.238
[49]
Kendall A, Martirosyan H, Dasgupta S, et al. End-to-end learning of geometry and context for deep stereo regression. In: Proceedings of the IEEE International Conference on Computer Vision. Venice, 2017. 66–75 10.1109/iccv.2017.17
[50]
Mahjourian R, Wicke M, Angelova A. Unsupervised learning of depth and ego-motion from monocular video using 3D geometric constraints. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, 2018. 5667–5675 10.1109/cvpr.2018.00594

Showing 50 of 119 references

Metrics
Citations: 260
References: 119
Details
Published: Jun 10, 2020
Vol/Issue: 63(9)
Pages: 1612-1627
Cite This Article
ChaoQiang Zhao, QiYu Sun, ChongZhen Zhang, et al. (2020). Monocular depth estimation based on deep learning: An overview. Science China Technological Sciences, 63(9), 1612-1627. https://doi.org/10.1007/s11431-020-1582-8