
2018 | Original Paper | Book Chapter

Multimodal Deep Learning for Advanced Driving Systems

Authors: Nerea Aranjuelo, Luis Unzueta, Ignacio Arganda-Carreras, Oihana Otaegui

Published in: Articulated Motion and Deformable Objects

Publisher: Springer International Publishing


Abstract

Multimodal deep learning is about learning feature representations over multiple modalities. Impressive progress has been made in deep learning solutions that rely on a single sensor modality for advanced driving. However, these approaches are limited to certain functionalities. The potential of multimodal sensor fusion has barely been exploited, even though research vehicles are commonly equipped with various sensor types. How to combine their data to achieve complex scene analysis, and thereby improve robustness in driving, is still an open question. While surveys exist for intelligent vehicles and for deep learning separately, to date no survey on multimodal deep learning for advanced driving exists. This paper attempts to narrow this gap by providing the first review that analyzes the existing literature together with two indispensable elements: sensors and datasets. We also provide our insights on future challenges and work to be done.
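
The chapter surveys fusion architectures rather than prescribing one. As a purely illustrative sketch (not taken from the paper), the following PyTorch snippet shows the intermediate-fusion pattern the abstract alludes to: separate encoders learn features for each sensor modality, here a camera image and a hypothetical LiDAR bird's-eye-view grid, and the resulting feature vectors are concatenated before a shared prediction head. All module names, input shapes, and the class count are assumptions made for the example.

# Illustrative sketch only (not from the paper): minimal mid-level fusion of a
# camera branch and a LiDAR bird's-eye-view branch. Shapes and class count are
# hypothetical.
import torch
import torch.nn as nn


def make_encoder(in_channels: int, out_features: int) -> nn.Sequential:
    """Small convolutional encoder producing a fixed-size feature vector."""
    return nn.Sequential(
        nn.Conv2d(in_channels, 32, kernel_size=3, stride=2, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),
        nn.ReLU(inplace=True),
        nn.AdaptiveAvgPool2d(1),   # global average pooling -> (B, 64, 1, 1)
        nn.Flatten(),              # -> (B, 64)
        nn.Linear(64, out_features),
    )


class FusionNet(nn.Module):
    def __init__(self, num_classes: int = 3):
        super().__init__()
        self.camera_branch = make_encoder(in_channels=3, out_features=128)  # RGB image
        self.lidar_branch = make_encoder(in_channels=1, out_features=128)   # BEV occupancy grid
        self.head = nn.Sequential(
            nn.Linear(256, 128),
            nn.ReLU(inplace=True),
            nn.Linear(128, num_classes),
        )

    def forward(self, image: torch.Tensor, bev: torch.Tensor) -> torch.Tensor:
        # Concatenate per-modality feature vectors, then classify.
        fused = torch.cat([self.camera_branch(image), self.lidar_branch(bev)], dim=1)
        return self.head(fused)


if __name__ == "__main__":
    model = FusionNet()
    image = torch.randn(2, 3, 128, 128)  # batch of RGB camera crops
    bev = torch.randn(2, 1, 128, 128)    # matching LiDAR bird's-eye-view grids
    print(model(image, bev).shape)       # torch.Size([2, 3])

Whether fusion happens early (raw data), at this intermediate feature level, or late (per-modality decisions) is one of the design axes the survey discusses; the sketch above only fixes one such choice for concreteness.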


Metadata
Title
Multimodal Deep Learning for Advanced Driving Systems
Authors
Nerea Aranjuelo
Luis Unzueta
Ignacio Arganda-Carreras
Oihana Otaegui
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-319-94544-6_10