Published in: International Journal of Machine Learning and Cybernetics 6/2021

03.01.2021 | Original Article

Attention-based context aggregation network for monocular depth estimation

Authors: Yuru Chen, Haitao Zhao, Zhengwei Hu, Jingchao Peng

Abstract

Depth estimation is a traditional computer vision task that plays a crucial role in understanding 3D scene geometry. Recently, algorithms that combine multi-scale features extracted by dilated-convolution-based blocks (atrous spatial pyramid pooling, ASPP) have achieved significant improvements in depth estimation. However, the discretized, predefined dilation kernels cannot capture the continuous context information that varies across diverse scenes, and they easily introduce grid artifacts. This paper proposes a novel algorithm, the attention-based context aggregation network (ACAN), for depth estimation. A supervised self-attention model is designed to adaptively learn task-specific similarities between pixels and thereby model continuous context information. Moreover, a soft ordinal inference is proposed to transform the predicted probabilities into continuous depth values, which reduces the discretization error (about a 1% decrease in RMSE). ACAN achieves state-of-the-art performance on public monocular depth estimation benchmark datasets. The source code of ACAN is available at https://github.com/miraiaroha/ACAN.
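The two ideas named in the abstract can be made concrete with short sketches. First, a minimal pixel-wise self-attention layer that aggregates context by weighting every pixel's features with learned pixel-to-pixel similarities; the layer names, channel sizes, and scaling below are illustrative assumptions, not the paper's exact architecture.

import torch
import torch.nn as nn

class SelfAttentionContext(nn.Module):
    # Minimal sketch of pixel-wise self-attention for context aggregation.
    # Layer names and channel sizes are illustrative, not the paper's exact design.
    def __init__(self, in_channels, key_channels):
        super().__init__()
        self.query = nn.Conv2d(in_channels, key_channels, kernel_size=1)
        self.key   = nn.Conv2d(in_channels, key_channels, kernel_size=1)
        self.value = nn.Conv2d(in_channels, in_channels, kernel_size=1)

    def forward(self, x):                                   # x: (B, C, H, W) feature map
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)        # (B, HW, Ck)
        k = self.key(x).flatten(2)                          # (B, Ck, HW)
        v = self.value(x).flatten(2).transpose(1, 2)        # (B, HW, C)
        attn = torch.softmax(q @ k / (q.shape[-1] ** 0.5), dim=-1)   # pixel-to-pixel similarities
        ctx = (attn @ v).transpose(1, 2).reshape(b, c, h, w)         # context-aggregated features
        return ctx

Applied to an encoder feature map, the layer returns a tensor of the same shape that can be fused with the input features before the depth head. Second, a sketch of soft ordinal inference: instead of taking a hard argmax over discretized depth bins, the continuous depth is recovered as the probability-weighted sum of bin centers. The depth range and log-space bin spacing are assumptions for illustration.

import math
import torch

def soft_ordinal_depth(logits, d_min=0.7, d_max=10.0):
    # logits: (B, K, H, W) per-pixel scores over K depth bins.
    # d_min, d_max, and log-space bin spacing are assumptions for illustration.
    k = logits.shape[1]
    centers = torch.exp(torch.linspace(math.log(d_min), math.log(d_max), k))  # (K,) bin centers
    probs = torch.softmax(logits, dim=1)                                      # per-bin probabilities
    # soft inference: expectation over bin centers instead of a hard argmax
    return (probs * centers.view(1, k, 1, 1)).sum(dim=1)                      # (B, H, W) continuous depth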

Metadata
Title
Attention-based context aggregation network for monocular depth estimation
Authors
Yuru Chen
Haitao Zhao
Zhengwei Hu
Jingchao Peng
Publication date
03.01.2021
Publisher
Springer Berlin Heidelberg
Published in
International Journal of Machine Learning and Cybernetics / Issue 6/2021
Print ISSN: 1868-8071
Electronic ISSN: 1868-808X
DOI
https://doi.org/10.1007/s13042-020-01251-y
