
2018 | Original Paper | Book Chapter

Visual-Inertial Object Detection and Mapping

Authors: Xiaohan Fei, Stefano Soatto

Published in: Computer Vision – ECCV 2018

Publisher: Springer International Publishing


Abstract

We present a method to populate an unknown environment with models of previously seen objects, placed in a Euclidean reference frame that is inferred causally and on-line using monocular video along with inertial sensors. The system returns a sparse point cloud for the regions of the scene that are visible but not recognized as a previously seen object, and a detailed object model and its pose in the Euclidean frame otherwise. The system combines bottom-up and top-down components: deep networks trained for detection provide likelihood scores for object hypotheses generated by a nonlinear filter, whose state serves as memory. Additional networks provide likelihood scores for edges, complementing detection networks trained to be invariant to small deformations. We test our algorithm on existing datasets, and also introduce the VISMA dataset, which provides ground-truth pose, a point-cloud map, and object models, along with time-stamped inertial measurements.
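The abstract sketches the interplay between top-down hypothesis generation and bottom-up scoring: a filter maintains object pose hypotheses over time, and trained networks supply likelihood scores for each hypothesis. The snippet below is a minimal, self-contained sketch of that idea as a particle filter, not the authors' implementation. The `detector_score` function, the 3-DoF pose parametrization, and all numeric values are illustrative assumptions, standing in for the paper's rendered-model likelihoods from trained detection and edge networks.

```python
# Sketch (under the assumptions stated above): a particle filter over
# object pose hypotheses, re-weighted by a detector's likelihood score.
import numpy as np

rng = np.random.default_rng(0)

TRUE_POSE = np.array([1.0, 2.0, 0.5])  # placeholder ground truth

def detector_score(pose):
    """Hypothetical bottom-up likelihood: a trained network would score
    how well the object model, rendered at `pose`, matches the image.
    Here: a synthetic unimodal score peaked at TRUE_POSE."""
    return np.exp(-np.sum((pose - TRUE_POSE) ** 2))

def predict(particles, motion_noise=0.05):
    """Top-down prediction: diffuse the pose hypotheses that the
    filter state carries forward as memory."""
    return particles + motion_noise * rng.standard_normal(particles.shape)

def update(particles, weights):
    """Measurement update: weight each hypothesis by the detector score,
    then resample to concentrate on high-likelihood poses."""
    weights = weights * np.array([detector_score(p) for p in particles])
    weights /= weights.sum()
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))

# Initialize hypotheses broadly, then iterate predict/update.
particles = rng.uniform(-3.0, 3.0, size=(500, 3))  # (x, y, yaw) hypotheses
weights = np.full(500, 1.0 / 500)
for _ in range(20):
    particles = predict(particles)
    particles, weights = update(particles, weights)
print("posterior mean pose:", particles.mean(axis=0))
```

In this toy setting the posterior mean converges toward the synthetic true pose; in the paper's setting the likelihood would instead come from detection and edge networks evaluated against the current frame, with visual-inertial odometry supplying the Euclidean reference frame.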


Metadata
Title
Visual-Inertial Object Detection and Mapping
Authors
Xiaohan Fei
Stefano Soatto
Copyright year
2018
DOI
https://doi.org/10.1007/978-3-030-01252-6_19
