
2019 | Original Paper | Book Chapter

Deep Learning for Assistive Computer Vision

Authors: Marco Leo, Antonino Furnari, Gerard G. Medioni, Mohan Trivedi, Giovanni M. Farinella

Published in: Computer Vision – ECCV 2018 Workshops

Publisher: Springer International Publishing


Abstract

This paper reviews the main advances in assistive computer vision recently fostered by deep learning. To this aim, we first discuss how the application of deep learning in computer vision has contributed to the development of assistive technologies, then analyze the recent advances in assistive technologies achieved in five main areas, namely object classification and localization, scene understanding, human pose estimation and tracking, action/event recognition, and anticipation. The paper concludes with a discussion and insights for future directions.


Metadata
Title
Deep Learning for Assistive Computer Vision
Authors
Marco Leo
Antonino Furnari
Gerard G. Medioni
Mohan Trivedi
Giovanni M. Farinella
Copyright Year
2019
DOI
https://doi.org/10.1007/978-3-030-11024-6_1
