
2020 | Original Paper | Book Chapter

Deep Learning Architectures for Computer Vision Applications: A Study

Authors: Randheer Bagi, Tanima Dutta, Hari Prabhat Gupta

Published in: Advances in Data and Information Sciences

Publisher: Springer Singapore


Abstract

Deep learning has become one of the most preferred solutions for many complex problems. It shows outstanding performance on computer vision tasks such as image classification, object detection, and image generation. Recently, many research efforts have focused on adapting deep learning architectures to a wide range of application domains. In this paper, we present a comprehensive survey of the issues and challenges faced by deep learning techniques. Furthermore, we analyze different deep learning architectures that address computer vision tasks, along with their importance.
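The convolutional layer is the building block shared by the vision architectures this chapter surveys (classification, detection, and generation networks alike). As an illustration only (a minimal NumPy sketch, not code from the chapter), the core operation of such a layer is a sliding-window cross-correlation:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2D cross-correlation, the core operation of a CNN layer.

    image:  2D array (H, W)
    kernel: 2D array (kH, kW) of learned weights
    Returns a (H - kH + 1, W - kW + 1) feature map.
    """
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Each output value is the weighted sum of one local patch.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Example: a 2x2 averaging-style kernel over a 4x4 input.
image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.ones((2, 2))
feature_map = conv2d(image, kernel)
```

In real architectures this loop is replaced by heavily optimized (often GPU-accelerated) implementations, and many such kernels are stacked and learned jointly, but the sliding-window principle is the same.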


Metadata
Title
Deep Learning Architectures for Computer Vision Applications: A Study
Authors
Randheer Bagi
Tanima Dutta
Hari Prabhat Gupta
Copyright year
2020
Publisher
Springer Singapore
DOI
https://doi.org/10.1007/978-981-15-0694-9_56
