Published in: Artificial Intelligence Review 8/2020

08.04.2020

Visual question answering: a state-of-the-art review

Authors: Sruthy Manmadhan, Binsu C. Kovoor


Abstract

Visual question answering (VQA) is a task that has received immense attention from two major research communities: computer vision and natural language processing. It is now widely regarded as an AI-complete task and has been proposed as an alternative to the Visual Turing Test. In its most common form, it is a challenging multi-modal task in which a computer must produce the correct answer to a natural language question asked about an input image. The task has attracted many deep learning researchers following their remarkable achievements in text, speech and vision technologies. This review extensively and critically examines the current state of VQA research in terms of step-by-step solution methodologies, datasets and evaluation metrics. Finally, the paper discusses future research directions for each of these aspects of VQA separately.
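The task definition above — image in, natural language question in, answer out — can be sketched as a three-stage pipeline: encode each modality, fuse the two representations, and classify over a fixed answer vocabulary. The sketch below is purely illustrative; every component (the toy image and question "encoders", the answer vocabulary, the argmax "classifier") is a hypothetical stand-in for the CNN/RNN encoders and trained classifiers that real VQA systems use.

```python
# Illustrative sketch of the VQA pipeline: encode image, encode question,
# fuse the modalities, classify over a fixed answer vocabulary.
# All components are toy stand-ins, not a trained model.

from collections import Counter

ANSWER_VOCAB = ["yes", "no", "red", "two"]  # tiny illustrative answer set

def encode_image(image_pixels):
    # Stand-in for a CNN feature extractor: crude global pixel statistics.
    n = len(image_pixels)
    return [sum(image_pixels) / n, max(image_pixels), min(image_pixels)]

def encode_question(question):
    # Stand-in for an RNN / word-embedding encoder: bag-of-words counts
    # projected onto a fixed 3-dimensional "embedding".
    counts = Counter(question.lower().split())
    return [counts["what"], counts["is"], len(counts)]

def fuse(img_vec, q_vec):
    # Element-wise (Hadamard) product, a common simple multimodal fusion.
    return [i * q for i, q in zip(img_vec, q_vec)]

def answer(image_pixels, question):
    # "Classifier": map the argmax of the fused vector to an answer index.
    fused = fuse(encode_image(image_pixels), encode_question(question))
    best = max(range(len(fused)), key=lambda i: fused[i])
    return ANSWER_VOCAB[best % len(ANSWER_VOCAB)]
```

Real systems differ mainly in the sophistication of each stage — pretrained CNN image features, learned word embeddings, attention-based fusion, and a softmax over thousands of candidate answers — but the overall structure matches this sketch.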


Zurück zum Zitat Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:​1409.​1556
Zurück zum Zitat Sutskever I, Vinyals O, Le QV (2014) Sequence to sequence learning with neural networks. In: Advances in neural information processing systems. pp 3104–3112 Sutskever I, Vinyals O, Le QV (2014) Sequence to sequence learning with neural networks. In: Advances in neural information processing systems. pp 3104–3112
Zurück zum Zitat Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Rabinovich A (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 1–9 Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Rabinovich A (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 1–9
Zurück zum Zitat Teney D, Hengel AV (2018) Visual question answering as a meta learning task. In: Computer vision—ECCV 2018 lecture notes in computer science. 229–245 Teney D, Hengel AV (2018) Visual question answering as a meta learning task. In: Computer vision—ECCV 2018 lecture notes in computer science. 229–245
Zurück zum Zitat Tommasi T, Mallya A, Plummer B, Lazebnik S, Berg AC, Berg TL (2019) Combining multiple cues for visual madlibs question answering. Int J Comput Vis 127(1):38–60CrossRef Tommasi T, Mallya A, Plummer B, Lazebnik S, Berg AC, Berg TL (2019) Combining multiple cues for visual madlibs question answering. Int J Comput Vis 127(1):38–60CrossRef
Zurück zum Zitat Toor AS, Wechsler H, Nappi M (2019) Question action relevance and editing for visual question answering. Multimedia Tools Appl 78(3):2921–2935CrossRef Toor AS, Wechsler H, Nappi M (2019) Question action relevance and editing for visual question answering. Multimedia Tools Appl 78(3):2921–2935CrossRef
Zurück zum Zitat Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Polosukhin I (2017) Attention is all you need. In: Advances in neural in-formation processing systems. pp 5998–6008 Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Polosukhin I (2017) Attention is all you need. In: Advances in neural in-formation processing systems. pp 5998–6008
Zurück zum Zitat Wang P, Wu Q, Shen C, Hengel AVD, Dick A (2015) Explicit knowledge-based reasoning for visual question answering. arXiv preprint arXiv:1511.02570 Wang P, Wu Q, Shen C, Hengel AVD, Dick A (2015) Explicit knowledge-based reasoning for visual question answering. arXiv preprint arXiv:​1511.​02570
Zurück zum Zitat Wang P, Wu Q, Shen C, Dick A, van den Hengel A (2018) Fvqa: fact-based visual question answering. IEEE Trans Pattern Anal Mach Intell 40(10):2413–2427CrossRef Wang P, Wu Q, Shen C, Dick A, van den Hengel A (2018) Fvqa: fact-based visual question answering. IEEE Trans Pattern Anal Mach Intell 40(10):2413–2427CrossRef
Zurück zum Zitat Wu Z, Palmer M (1994) Verbs semantics and lexical selection. In: Proceedings of the 32nd annual meeting on association for computational linguistics. Association for Computational Linguistics, pp 133–138 Wu Z, Palmer M (1994) Verbs semantics and lexical selection. In: Proceedings of the 32nd annual meeting on association for computational linguistics. Association for Computational Linguistics, pp 133–138
Zurück zum Zitat Wu Q, Wang P, Shen C, Dick A, van den Hengel A (2016). Ask me any-thing: Free-form visual question answering based on knowledge from exter-nal sources. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition pp 4622-4630) Wu Q, Wang P, Shen C, Dick A, van den Hengel A (2016). Ask me any-thing: Free-form visual question answering based on knowledge from exter-nal sources. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition pp 4622-4630)
Zurück zum Zitat Wu Q, Shen C, Wang P, Dick A, van den Hengel A (2018) Image captioning and visual question answering based on attributes and external knowledge. IEEE Trans Pattern Anal Mach Intell 40(6):1367–1381CrossRef Wu Q, Shen C, Wang P, Dick A, van den Hengel A (2018) Image captioning and visual question answering based on attributes and external knowledge. IEEE Trans Pattern Anal Mach Intell 40(6):1367–1381CrossRef
Zurück zum Zitat Xu W, Rudnicky A (2000) Can artificial neural networks learn language models?. In: sixth international conference on spoken language processing Xu W, Rudnicky A (2000) Can artificial neural networks learn language models?. In: sixth international conference on spoken language processing
Zurück zum Zitat Xu H, Saenko K (2016) Ask, attend and answer: Exploring question-guided spatial attention for visual question answering. In: European conference on computer vision. Springer, Cham, pp 451–466 Xu H, Saenko K (2016) Ask, attend and answer: Exploring question-guided spatial attention for visual question answering. In: European conference on computer vision. Springer, Cham, pp 451–466
Zurück zum Zitat Yang Z, He X, Gao J, Deng L, Smola A (2016) Stacked attention networks for image question answering. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 21–29 Yang Z, He X, Gao J, Deng L, Smola A (2016) Stacked attention networks for image question answering. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 21–29
Zurück zum Zitat Young T, Hazarika D, Poria S, Cambria E (2018) Recent trends in deep learn-ing based natural language processing. IEEE Comput Intell Mag 13(3):55–75CrossRef Young T, Hazarika D, Poria S, Cambria E (2018) Recent trends in deep learn-ing based natural language processing. IEEE Comput Intell Mag 13(3):55–75CrossRef
Zurück zum Zitat Yu L, Park E, Berg AC, Berg TL (2015) Visual madlibs: fill in the blank description generation and question answering. In: Proceedings of the ieee international conference on computer vision. pp 2461–2469 Yu L, Park E, Berg AC, Berg TL (2015) Visual madlibs: fill in the blank description generation and question answering. In: Proceedings of the ieee international conference on computer vision. pp 2461–2469
Zurück zum Zitat Yu D, Fu J, Mei T, Rui Y (2017) Multi-level attention networks for visual question answering. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR). IEEE, pp 4187–4195 Yu D, Fu J, Mei T, Rui Y (2017) Multi-level attention networks for visual question answering. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR). IEEE, pp 4187–4195
Zurück zum Zitat Yu D, Gao X, Xiong H (2018a) Structured semantic representation for visual question answering. In: 2018 25th IEEE international conference on image processing (ICIP). IEEE, pp 2286–2290 Yu D, Gao X, Xiong H (2018a) Structured semantic representation for visual question answering. In: 2018 25th IEEE international conference on image processing (ICIP). IEEE, pp 2286–2290
Zurück zum Zitat Yu Z, Yu J, Xiang C, Fan J, Tao D (2018b) Beyond bilinear: generalized multimodal factorized high-order pooling for visual question answering. IEEE Trans Neural Netw Learn Syst 29(12):5947–5959CrossRef Yu Z, Yu J, Xiang C, Fan J, Tao D (2018b) Beyond bilinear: generalized multimodal factorized high-order pooling for visual question answering. IEEE Trans Neural Netw Learn Syst 29(12):5947–5959CrossRef
Zurück zum Zitat Yu Z, Yu J, Cui Y, Tao D, Tian Q (2019) Deep modular co-attention networks for visual question answering. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 6281–6290 Yu Z, Yu J, Cui Y, Tao D, Tian Q (2019) Deep modular co-attention networks for visual question answering. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 6281–6290
Zurück zum Zitat Zeiler MD, Fergus R (2014) Visualizing and understanding convolutional networks. In: European conference on computer vision. Springer, Cham, pp 818–833 Zeiler MD, Fergus R (2014) Visualizing and understanding convolutional networks. In: European conference on computer vision. Springer, Cham, pp 818–833
Zurück zum Zitat Zhang P, Goyal Y, Summers-Stay D, Batra D, Parikh D (2016) Yin and yang: balancing and answering binary visual questions. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 5014–5022 Zhang P, Goyal Y, Summers-Stay D, Batra D, Parikh D (2016) Yin and yang: balancing and answering binary visual questions. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 5014–5022
Zurück zum Zitat Zhao W, Peng H, Eger S, Cambria E, Yang M (2019) Towards scalable and reliable capsule networks for challenging NLP applications. arXiv preprint arXiv:1906.02829 Zhao W, Peng H, Eger S, Cambria E, Yang M (2019) Towards scalable and reliable capsule networks for challenging NLP applications. arXiv preprint arXiv:​1906.​02829
Zurück zum Zitat Zhou B, Tian Y, Sukhbaatar S, Szlam A, Fergus R (2015) Simple baseline for visual question answering. arXiv preprint arXiv:1512.02167 Zhou B, Tian Y, Sukhbaatar S, Szlam A, Fergus R (2015) Simple baseline for visual question answering. arXiv preprint arXiv:​1512.​02167
Zurück zum Zitat Zhu Y, Zhang C, Ré C, Fei-Fei L (2015) Building a large-scale multimodal knowledge base system for answering visual queries. arXiv preprint arXiv:1507.05670 Zhu Y, Zhang C, Ré C, Fei-Fei L (2015) Building a large-scale multimodal knowledge base system for answering visual queries. arXiv preprint arXiv:​1507.​05670
Zurück zum Zitat Zhu Y, Groth O, Bernstein M, Fei-Fei L (2016) Visual7w: Grounded question answering in images. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 4995–5004 Zhu Y, Groth O, Bernstein M, Fei-Fei L (2016) Visual7w: Grounded question answering in images. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 4995–5004
Metadata
Title: Visual question answering: a state-of-the-art review
Authors: Sruthy Manmadhan, Binsu C. Kovoor
Publication date: 08.04.2020
Publisher: Springer Netherlands
Published in: Artificial Intelligence Review, Issue 8/2020
Print ISSN: 0269-2821
Electronic ISSN: 1573-7462
DOI: https://doi.org/10.1007/s10462-020-09832-7
