Published in: Cognitive Computation 2/2018

24.11.2017

Semantic Scene Mapping with Spatio-temporal Deep Neural Network for Robotic Applications

Authors: Ruihao Li, Dongbing Gu, Qiang Liu, Zhiqiang Long, Huosheng Hu


Abstract

Semantic scene mapping is a challenging and significant task for robotic applications such as autonomous navigation and robot-environment interaction. In this paper, we propose a semantic pixel-wise mapping system for potential robotic applications. The system combines a novel spatio-temporal deep neural network for semantic segmentation with a Simultaneous Localisation and Mapping (SLAM) algorithm that builds a 3D point cloud map; together they yield a 3D semantic pixel-wise map. The proposed network consists of Convolutional Neural Networks (CNNs) with two streams: a spatial stream that takes images as input and a temporal stream that takes image differences as input. Because it uses both spatial and temporal information, we call it a spatio-temporal deep neural network, and it improves both the accuracy and the robustness of semantic segmentation. Furthermore, only keyframes are selected for semantic segmentation, which reduces the computational burden for video streams and improves real-time performance. Based on the segmentation results, a 3D semantic map is built using the 3D point cloud map from the SLAM algorithm. The proposed spatio-temporal neural network is evaluated on both the Cityscapes benchmark (a public dataset) and the Essex Indoor benchmark (a dataset we labelled manually). Compared with state-of-the-art spatial-only neural networks, the proposed network achieves better pixel-wise accuracy and Intersection over Union (IoU) for scene segmentation. The 3D semantic map constructed with our method is accurate and meaningful for robotic applications.
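
The two evaluation measures named in the abstract, pixel-wise accuracy and Intersection over Union (IoU), can be illustrated with a minimal sketch. It uses the standard textbook definitions computed from a confusion matrix. The function names, the toy label masks, and the choice to skip classes that never occur are assumptions made for illustration; the abstract does not describe the authors' exact evaluation protocol (for example, which classes are ignored).

```python
from typing import List, Optional, Sequence

# A label mask is a 2D grid of integer class ids, one per pixel.
Mask = Sequence[Sequence[int]]


def confusion_matrix(truth: Mask, pred: Mask, num_classes: int) -> List[List[int]]:
    """Count pixels; rows index the true class, columns the predicted class."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for truth_row, pred_row in zip(truth, pred):
        for t, p in zip(truth_row, pred_row):
            matrix[t][p] += 1
    return matrix


def pixel_accuracy(matrix: List[List[int]]) -> float:
    """Fraction of all pixels whose predicted class equals the true class."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[c][c] for c in range(len(matrix)))
    return correct / total if total else 0.0


def per_class_iou(matrix: List[List[int]]) -> List[Optional[float]]:
    """IoU per class: TP / (TP + FP + FN); None if the class never appears."""
    n = len(matrix)
    ious: List[Optional[float]] = []
    for c in range(n):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp                        # true c, predicted other
        fp = sum(matrix[r][c] for r in range(n)) - tp   # predicted c, true other
        union = tp + fp + fn
        ious.append(tp / union if union else None)
    return ious


def mean_iou(matrix: List[List[int]]) -> float:
    """Average IoU over classes that occur in truth or prediction (assumption)."""
    valid = [v for v in per_class_iou(matrix) if v is not None]
    return sum(valid) / len(valid) if valid else 0.0


# Toy 2x3 masks with three hypothetical classes (0, 1, 2).
truth = [[0, 0, 1],
         [1, 1, 2]]
pred = [[0, 1, 1],
        [1, 1, 2]]

matrix = confusion_matrix(truth, pred, num_classes=3)
print(f"pixel accuracy: {pixel_accuracy(matrix):.3f}")  # pixel accuracy: 0.833
print(f"mean IoU: {mean_iou(matrix):.3f}")              # mean IoU: 0.750
```

In the toy example one of six pixels is mislabelled, so accuracy is 5/6, while the per-class IoU values are 0.5, 0.75 and 1.0. IoU penalises the error in both the class that lost the pixel and the class that gained it, which is why it is usually reported alongside plain accuracy.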

Metadata
Title
Semantic Scene Mapping with Spatio-temporal Deep Neural Network for Robotic Applications
Authors
Ruihao Li
Dongbing Gu
Qiang Liu
Zhiqiang Long
Huosheng Hu
Publication date
24.11.2017
Publisher
Springer US
Published in
Cognitive Computation / Issue 2/2018
Print ISSN: 1866-9956
Electronic ISSN: 1866-9964
DOI
https://doi.org/10.1007/s12559-017-9526-9
