
2018 | OriginalPaper | Chapter

7. Deep Learning—A New Era in Bridging the Semantic Gap

Authors : Urszula Markowska-Kaczmar, Halina Kwaśnicka

Published in: Bridging the Semantic Gap in Image and Video Analysis

Publisher: Springer International Publishing


Abstract

This chapter deals with the semantic gap, a well-known phenomenon in the area of vision systems. Despite the significant efforts of researchers, the problem of how to overcome the semantic gap remains a challenge. One of the most popular research areas where this problem is present and hampers good results is the task of image retrieval, and this chapter focuses on that task. As deep learning models gain more and more popularity among researchers and produce increasingly spectacular results, the application of deep learning to bridging the semantic gap in Content-Based Image Retrieval (CBIR) is the central issue of this chapter. The chapter briefly presents the traditional approaches to CBIR, then gives a short introduction to the methods and models of deep learning, and shows the application of deep learning at the particular levels of CBIR: the features level, the common-sense knowledge level, and the level of inference about a scene.
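The "features level" mentioned in the abstract can be illustrated with a minimal retrieval sketch: a deep network maps each image to a fixed-length embedding, and retrieval ranks database images by their similarity to the query embedding. In the sketch below, a random linear projection stands in for a trained CNN (a simplifying assumption; in practice one would use, e.g., the penultimate layer of a pretrained network), and all names (`embed`, `retrieve`) are illustrative, not from the chapter.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed(images: np.ndarray, projection: np.ndarray) -> np.ndarray:
    """Map flattened images to L2-normalised feature vectors."""
    feats = images @ projection  # placeholder for a CNN forward pass
    norms = np.linalg.norm(feats, axis=1, keepdims=True)
    return feats / np.clip(norms, 1e-12, None)

def retrieve(query: np.ndarray, database: np.ndarray, k: int = 3) -> np.ndarray:
    """Return indices of the k database images most similar to the query."""
    sims = database @ query  # cosine similarity, since vectors are unit-length
    return np.argsort(-sims)[:k]

# Toy data: 100 "images" of 32x32 grey pixels, embedded into 64 dimensions.
projection = rng.standard_normal((32 * 32, 64))
images = rng.standard_normal((100, 32 * 32))
db = embed(images, projection)

query_idx = 7
ranked = retrieve(db[query_idx], db)
# The query image is trivially its own nearest neighbour.
```

The deep-learning contribution at this level is precisely the quality of `embed`: a learned embedding places semantically similar images close together, which hand-crafted features often fail to do.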


139.
go back to reference Mezaris, V. Strintzis, M. G.: Object segmentation and ontologies for MPEG-2 video indexing and retrieval. In: International Conference on Image and Video Retrieval, CIVR 2004. Image and Video Retrieval, pp. 573–581 (2004) Mezaris, V. Strintzis, M. G.: Object segmentation and ontologies for MPEG-2 video indexing and retrieval. In: International Conference on Image and Video Retrieval, CIVR 2004. Image and Video Retrieval, pp. 573–581 (2004)
140.
go back to reference Mikolov, T., Corrado, G., Chen, K., Dean, J.: Efficient estimation of word representations in vector space. In: Proceedings of the International Conference on Learning Representations (ICLR 2013), pp. 1–12 (2013) Mikolov, T., Corrado, G., Chen, K., Dean, J.: Efficient estimation of word representations in vector space. In: Proceedings of the International Conference on Learning Representations (ICLR 2013), pp. 1–12 (2013)
141.
go back to reference Mesnil, G., He, X., Deng, L., Bengio, Y.: Investigation of recurrent-neural-network architectures and learning methods for spoken language understanding. In: Interspeech (2013) Mesnil, G., He, X., Deng, L., Bengio, Y.: Investigation of recurrent-neural-network architectures and learning methods for spoken language understanding. In: Interspeech (2013)
142.
go back to reference Mikolajczyk, K., Schmid, C.: A performance evaluation of local descriptors. IEEE Trans. Pattern Analy. Mach. Intell. 27(10), 1615–1630 (2005)CrossRef Mikolajczyk, K., Schmid, C.: A performance evaluation of local descriptors. IEEE Trans. Pattern Analy. Mach. Intell. 27(10), 1615–1630 (2005)CrossRef
143.
go back to reference Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: NIPS Proceedings (2013) Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: NIPS Proceedings (2013)
146.
go back to reference Mohamed, A., Dahl, G.E., Hinton, G.: Acoustic modeling using deep belief networks. IEEE Trans. Audio Speech Lang. Process. 20(1), 14–22 (2012)CrossRef Mohamed, A., Dahl, G.E., Hinton, G.: Acoustic modeling using deep belief networks. IEEE Trans. Audio Speech Lang. Process. 20(1), 14–22 (2012)CrossRef
147.
go back to reference Mosbah, M. Boucheham, B.: Matching measures in the context of CBIR: a comparative study in terms of effectiveness and efficiency. In: World Conference on Information Systems and Technologies. World CIST 2017, pp. 245–258 (2017) Mosbah, M. Boucheham, B.: Matching measures in the context of CBIR: a comparative study in terms of effectiveness and efficiency. In: World Conference on Information Systems and Technologies. World CIST 2017, pp. 245–258 (2017)
148.
go back to reference Mozer, M.C.: A focused backpropagation algorithm for temporal pattern recognition. In: Chauvin, Y., Rumelhart, D. (eds.) Backpropagation: Theory, Architectures, and Applications. Research Gate. Lawrence Erlbaum Associates, Hillsdale, NJ, pp. 137–169 (1995) Mozer, M.C.: A focused backpropagation algorithm for temporal pattern recognition. In: Chauvin, Y., Rumelhart, D. (eds.) Backpropagation: Theory, Architectures, and Applications. Research Gate. Lawrence Erlbaum Associates, Hillsdale, NJ, pp. 137–169 (1995)
151.
go back to reference Ngiam, J., Khosla, A., Kim, M., Nam, J., Lee, H., Ng, A.Y.: Multimodal deep learning. In: Proceedings of the 28th International Conference on Machine Learning (ICML) (2011) Ngiam, J., Khosla, A., Kim, M., Nam, J., Lee, H., Ng, A.Y.: Multimodal deep learning. In: Proceedings of the 28th International Conference on Machine Learning (ICML) (2011)
155.
go back to reference Noh, H., Seo, P.H., Han, B.: Image question answering using convolutional neural network with dynamic parameter prediction. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, pp. 30–38 (2016). https://doi.org/10.1109/CVPR.2016.11 Noh, H., Seo, P.H., Han, B.: Image question answering using convolutional neural network with dynamic parameter prediction. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, pp. 30–38 (2016). https://​doi.​org/​10.​1109/​CVPR.​2016.​11
156.
go back to reference Novotny, D., Larlus, D., Vedaldi, A.: Learning the structure of objects from web supervision. In: Computer Vision ECCV 2016 Workshops. Amsterdam, The Netherlands, Part 3. LNCS 9915, pp. 218–233 (2016) Novotny, D., Larlus, D., Vedaldi, A.: Learning the structure of objects from web supervision. In: Computer Vision ECCV 2016 Workshops. Amsterdam, The Netherlands, Part 3. LNCS 9915, pp. 218–233 (2016)
158.
go back to reference Parikh, A.P., Taeckstroem, O., Das, D., Uszkoreit, J.: Composable attention model for natural language inference. In: EMNLP 2016 (2016) Parikh, A.P., Taeckstroem, O., Das, D., Uszkoreit, J.: Composable attention model for natural language inference. In: EMNLP 2016 (2016)
159.
go back to reference Paulin, M., Douze, M., Harchaoui, Z., Mairal, J., Perronin, F., Schmid, C.: Local convolutional features with unsupervised training for image retrieval. In IEEE International Conference on Computer Vision (ICCV), pp. 91–99 (2015) Paulin, M., Douze, M., Harchaoui, Z., Mairal, J., Perronin, F., Schmid, C.: Local convolutional features with unsupervised training for image retrieval. In IEEE International Conference on Computer Vision (ICCV), pp. 91–99 (2015)
162.
go back to reference Perez-Rey, D., Anguita, A., Crespo, J.: Ontodataclean: ontology-based integration and preprocessing of distributed data. In: Biological and Medical Data Analysis, pp. 262–272. Springer, Heidelberg (2006) Perez-Rey, D., Anguita, A., Crespo, J.: Ontodataclean: ontology-based integration and preprocessing of distributed data. In: Biological and Medical Data Analysis, pp. 262–272. Springer, Heidelberg (2006)
163.
go back to reference Peters, R.J., Iyer, A., Itti, L., Koch, C.: Components of bottom-up gaze allocation in natural images. Vis. Res. 45, 2397–2416 (2005)CrossRef Peters, R.J., Iyer, A., Itti, L., Koch, C.: Components of bottom-up gaze allocation in natural images. Vis. Res. 45, 2397–2416 (2005)CrossRef
164.
go back to reference Petrucci, G., Ghidini, C., Rospocher, M.: Ontology learning in the deep. In: European Knowledge Acquisition Workshop EKAW 2016: Knowledge Engineering and Knowledge Management, pp. 480–495 (2016) Petrucci, G., Ghidini, C., Rospocher, M.: Ontology learning in the deep. In: European Knowledge Acquisition Workshop EKAW 2016: Knowledge Engineering and Knowledge Management, pp. 480–495 (2016)
165.
go back to reference Piras, L., Giacinto, G.: Information fusion in content based image retrieval: a comprehensive overview. J. Inf. Fusion. 37(C), 50–60 (2017) Piras, L., Giacinto, G.: Information fusion in content based image retrieval: a comprehensive overview. J. Inf. Fusion. 37(C), 50–60 (2017)
166.
go back to reference Porello, D., Cristani, M., Ferrario, R.: Integrating ontologies and computer vision for classification of objects in images. In: Proceedings of the Workshop on Neural-Cognitive Integration in German Conference on Artificial Intelligence, pp. 1–15 (2013) Porello, D., Cristani, M., Ferrario, R.: Integrating ontologies and computer vision for classification of objects in images. In: Proceedings of the Workshop on Neural-Cognitive Integration in German Conference on Artificial Intelligence, pp. 1–15 (2013)
167.
go back to reference Pyykko, J., Glowacka, D.: Interactive content-based image retrieval with deep neural networks. In: International Workshop on Symbiotic Interaction, pp. 77–88 (2016) Pyykko, J., Glowacka, D.: Interactive content-based image retrieval with deep neural networks. In: International Workshop on Symbiotic Interaction, pp. 77–88 (2016)
169.
go back to reference Reed, S., Akata, Z., Yan, X., Logeswaran, L., Schiele, B., Lee, H.: Generative adversarial text to image synthesis (2016). arXiv:1605.05396v2 [cs.NE]. Accessed on 5 Jun 2016 Reed, S., Akata, Z., Yan, X., Logeswaran, L., Schiele, B., Lee, H.: Generative adversarial text to image synthesis (2016). arXiv:​1605.​05396v2 [cs.NE]. Accessed on 5 Jun 2016
173.
go back to reference Ribeiro, R., Uhl, A., Wimmer, G., Haefner, M.: Exploring deep learning and transfer learning for colonic polyp classification. Comput. Math. Methods Med. (2016) Ribeiro, R., Uhl, A., Wimmer, G., Haefner, M.: Exploring deep learning and transfer learning for colonic polyp classification. Comput. Math. Methods Med. (2016)
174.
go back to reference Riloff, E.: Automatically generating extraction patterns from untagged text. Proc. Nat. Conf. Arti. Intell. 2, 1044–1049 (1996) Riloff, E.: Automatically generating extraction patterns from untagged text. Proc. Nat. Conf. Arti. Intell. 2, 1044–1049 (1996)
175.
go back to reference Rosten, E., Drummond, T.: Machine learning for high-speed corner detection. In: Proceeding of European Conference on Computer Vision (ECCV 2006), pp. 430–443 (2006) Rosten, E., Drummond, T.: Machine learning for high-speed corner detection. In: Proceeding of European Conference on Computer Vision (ECCV 2006), pp. 430–443 (2006)
176.
go back to reference Saenko, K., Darrell, T.: Unsupervised learning of visual sense models for polysemous word. In: Proceedings of the 22nd Annual Conference on Neural Information Processing Systems. Vancouver, Canada, pp. 1393–1400 (2008) Saenko, K., Darrell, T.: Unsupervised learning of visual sense models for polysemous word. In: Proceedings of the 22nd Annual Conference on Neural Information Processing Systems. Vancouver, Canada, pp. 1393–1400 (2008)
178.
go back to reference Salakhutinov, R.: Learning deep generative models. Ann. Rev. Stat. Appl. 2015(2), 361–385 (2015)CrossRef Salakhutinov, R.: Learning deep generative models. Ann. Rev. Stat. Appl. 2015(2), 361–385 (2015)CrossRef
181.
go back to reference Sarikaya, R., Hinton, G.E., Deoras, A.: Application of deep belief networks for natural language understanding. IEEE/ACM Trans. Audio Speech Lang. Process. 22(4), 778–784 (2014)CrossRef Sarikaya, R., Hinton, G.E., Deoras, A.: Application of deep belief networks for natural language understanding. IEEE/ACM Trans. Audio Speech Lang. Process. 22(4), 778–784 (2014)CrossRef
182.
go back to reference Schawinski, K., Zhang, C., Zhang, H., Fowler, L., Santhanam, G.K.: Generative adversarial networks recover features in astrophysical images of galaxies beyond the deconvolution limit. Monthly Notices of the Royal Astronomical Society: Letters: slx008. https://arXiv.org/pdf/1702.00403.pdf Schawinski, K., Zhang, C., Zhang, H., Fowler, L., Santhanam, G.K.: Generative adversarial networks recover features in astrophysical images of galaxies beyond the deconvolution limit. Monthly Notices of the Royal Astronomical Society: Letters: slx008. https://​arXiv.​org/​pdf/​1702.​00403.​pdf
183.
go back to reference Schuster, S., Krishna, R., Chang, A., Fei-Fei, L., Manning, C.D.: Generating semantically precise scene graphs from textual descriptions for improved image retrieval. In: Proceedings of the Fourth Workshop on Vision and Language, pp. 70–80 (2015) Schuster, S., Krishna, R., Chang, A., Fei-Fei, L., Manning, C.D.: Generating semantically precise scene graphs from textual descriptions for improved image retrieval. In: Proceedings of the Fourth Workshop on Vision and Language, pp. 70–80 (2015)
184.
go back to reference Shench, C., Song, M., Zhao, Q.: Learning high-level concepts by training a deep network on eye fixations. In: NIPS Deep Learning and Unsup Feat Learn Workshop (2012) Shench, C., Song, M., Zhao, Q.: Learning high-level concepts by training a deep network on eye fixations. In: NIPS Deep Learning and Unsup Feat Learn Workshop (2012)
185.
go back to reference Shen, C., Zhao, Q.: Learning to predict eye fixations for semantic contents using multi-layer sparse network. Neurocomputing 138, 61–68 (2014)CrossRef Shen, C., Zhao, Q.: Learning to predict eye fixations for semantic contents using multi-layer sparse network. Neurocomputing 138, 61–68 (2014)CrossRef
186.
go back to reference Shi, J., Tomasi, C.: Good features to track. In: Proceedings of 1994 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, (CVPR, 1994), pp. 593–600 (1994) Shi, J., Tomasi, C.: Good features to track. In: Proceedings of 1994 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, (CVPR, 1994), pp. 593–600 (1994)
187.
go back to reference Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: ICLR (2015) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: ICLR (2015)
188.
go back to reference Singh, M.D., Lee, M.: Temporal hierarchies in multilayer gated recurrent neural networks for language models. In: International Joint Conference on Neural Networks (IJCNN) (2017) Singh, M.D., Lee, M.: Temporal hierarchies in multilayer gated recurrent neural networks for language models. In: International Joint Conference on Neural Networks (IJCNN) (2017)
189.
go back to reference Smeulders, A.W.M., Worring, M., Santini, S., Gupta, A., Jain, R.: Content-based image retrieval at the end of the early years. IEEE Trans. Pattern Anal. Mach. Intell. 22(12), 1349–80 (2000)CrossRef Smeulders, A.W.M., Worring, M., Santini, S., Gupta, A., Jain, R.: Content-based image retrieval at the end of the early years. IEEE Trans. Pattern Anal. Mach. Intell. 22(12), 1349–80 (2000)CrossRef
190.
go back to reference Snoek, C.G.M., Smeulders, A.W.M.: Visual-concept search solved? IEEE Comput. 43(6), 76–78 (2010)CrossRef Snoek, C.G.M., Smeulders, A.W.M.: Visual-concept search solved? IEEE Comput. 43(6), 76–78 (2010)CrossRef
191.
go back to reference Srivastava, N., Salakhutdinov, R.: Multimodal learning with deep Boltzmann machines. In: NIPS 2012 (2012) Srivastava, N., Salakhutdinov, R.: Multimodal learning with deep Boltzmann machines. In: NIPS 2012 (2012)
192.
go back to reference Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014)MathSciNetMATH Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014)MathSciNetMATH
193.
go back to reference Sun, X., Huang, Z., Yin, H., Shen, H.T.: An integrated model for effective saliency prediction. In: Proceedings of Thirty-First AAAI Conference on Artificial Intelligence (2017) Sun, X., Huang, Z., Yin, H., Shen, H.T.: An integrated model for effective saliency prediction. In: Proceedings of Thirty-First AAAI Conference on Artificial Intelligence (2017)
194.
go back to reference Sundermeyer, M., Schluter, R., Ney, H.: LSTM neural networks for language modeling. In: Proceedings of Interspeech (2012) Sundermeyer, M., Schluter, R., Ney, H.: LSTM neural networks for language modeling. In: Proceedings of Interspeech (2012)
196.
go back to reference Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: CVPR (2015) Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: CVPR (2015)
199.
go back to reference Tousch, A.-M., Herbin, S., Audibert, J.-Y.: Semantic hierarchies for image annotation: a survey. Pattern Recogn. 45(1), 333–345 (2012)CrossRef Tousch, A.-M., Herbin, S., Audibert, J.-Y.: Semantic hierarchies for image annotation: a survey. Pattern Recogn. 45(1), 333–345 (2012)CrossRef
200.
go back to reference Town, Ch.: Ontological inference for image and video analysis. Mach. Vis. Appl. 17(2), 94–115 (2006)CrossRef Town, Ch.: Ontological inference for image and video analysis. Mach. Vis. Appl. 17(2), 94–115 (2006)CrossRef
201.
go back to reference Traina, A., Marques, J., Traina, C.: Fighting the semantic gap on CBIR systems through new relevance feedback techniques. In: Proceedings of the 19th IEEE Symposium on Computer-Based Medical Systems, pp. 881–886 (2006) Traina, A., Marques, J., Traina, C.: Fighting the semantic gap on CBIR systems through new relevance feedback techniques. In: Proceedings of the 19th IEEE Symposium on Computer-Based Medical Systems, pp. 881–886 (2006)
202.
go back to reference Valle, E., Cord, M.: Advanced techniques in CBIR local descriptors, visual dictionaries and bags of features. In: Tutorials of the XXII Brazilian Symposium on Computer Graphics and Image Processing (SIBGRAPI TUTORIALS), pp. 72–78 (2009) Valle, E., Cord, M.: Advanced techniques in CBIR local descriptors, visual dictionaries and bags of features. In: Tutorials of the XXII Brazilian Symposium on Computer Graphics and Image Processing (SIBGRAPI TUTORIALS), pp. 72–78 (2009)
203.
go back to reference Venugopalan, S., Xu, H., Donahue, J., Rohrbach, M., Mooney, R., Saenko. K.: Translating videos to natural language using deep recurrent neural networks. arXiv preprint arXiv:1412.4729 (2014) Venugopalan, S., Xu, H., Donahue, J., Rohrbach, M., Mooney, R., Saenko. K.: Translating videos to natural language using deep recurrent neural networks. arXiv preprint arXiv:​1412.​4729 (2014)
204.
go back to reference Vig, E., Dorr, M., Cox, D.: Large-scale optimization of hierarchical features for saliency prediction in natural images. In: CVPR (2014) Vig, E., Dorr, M., Cox, D.: Large-scale optimization of hierarchical features for saliency prediction in natural images. In: CVPR (2014)
205.
go back to reference Vincent, P., Larochelle, H., Bengio, Y., Manzagol, P.A.: Extracting and composing robust features with denoising autoencoders. In: Proceedings of the Twenty-fifth International Conference on Machine Learning (ICML08), pp 1096–1103. ACM (2008) Vincent, P., Larochelle, H., Bengio, Y., Manzagol, P.A.: Extracting and composing robust features with denoising autoencoders. In: Proceedings of the Twenty-fifth International Conference on Machine Learning (ICML08), pp 1096–1103. ACM (2008)
206.
go back to reference Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y., Manzagol, P.-A.: Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. J. Mach. Learn. Res. 11, 3371–3408 (2010)MathSciNetMATH Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y., Manzagol, P.-A.: Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. J. Mach. Learn. Res. 11, 3371–3408 (2010)MathSciNetMATH
208.
go back to reference Vinyals, O., Toshev, A., Bengio, S., Erhan, D.: Show and tell: a neural image caption generator. In: CVPR 2015 (2015) Vinyals, O., Toshev, A., Bengio, S., Erhan, D.: Show and tell: a neural image caption generator. In: CVPR 2015 (2015)
209.
go back to reference Wan, J., Wang, D., Hoi, S.C.H., Wu, P., Zhu, J., Zhang, Y., Li, J.: Deep learning for content-based image retrieval: a comprehensive study. In: ACM International Conference on Multimedia (MM), pp. 157–166. ACM (2014) Wan, J., Wang, D., Hoi, S.C.H., Wu, P., Zhu, J., Zhang, Y., Li, J.: Deep learning for content-based image retrieval: a comprehensive study. In: ACM International Conference on Multimedia (MM), pp. 157–166. ACM (2014)
210.
go back to reference Wang, C., Zhang, L., Zhang, H.: Learning to reduce the semantic gap in web image retrieval and annotation. In: SIGIR08, Singapore (2008) Wang, C., Zhang, L., Zhang, H.: Learning to reduce the semantic gap in web image retrieval and annotation. In: SIGIR08, Singapore (2008)
211.
go back to reference Wang, H., Cai, Y., Zhang, Y., Pan, H., Lv, W., Han H.: Deep learning for image retrieval: what works and what doesnt. In: IEEE 15th International Conference on Data Mining Workshops, pp. 1576–1583 (2015) Wang, H., Cai, Y., Zhang, Y., Pan, H., Lv, W., Han H.: Deep learning for image retrieval: what works and what doesnt. In: IEEE 15th International Conference on Data Mining Workshops, pp. 1576–1583 (2015)
212.
go back to reference Wang, H.: Semantic Deep Learning, University of Oregon, pp. 1–42 (2015) Wang, H.: Semantic Deep Learning, University of Oregon, pp. 1–42 (2015)
213.
go back to reference Wang, H., Dou, D., Lowd, D.: Ontology-based deep restricted boltzmann machine. In: 27th International Conference on Database and Expert Systems Applications, DEXA 2016, Porto, Portugal, September 5–8, 2016, Proceedings, Part I, pp. 431–445. Springer International Publishing (2016) Wang, H., Dou, D., Lowd, D.: Ontology-based deep restricted boltzmann machine. In: 27th International Conference on Database and Expert Systems Applications, DEXA 2016, Porto, Portugal, September 5–8, 2016, Proceedings, Part I, pp. 431–445. Springer International Publishing (2016)
214.
go back to reference Wang, J., Song, Y., Leung, T., Rosenberg, C., Wang, J., Philbin, J., Chen, B., Wu, Y.: Learning fine-grained image similarity with deep ranking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1386–1393. IEEE (2014) Wang, J., Song, Y., Leung, T., Rosenberg, C., Wang, J., Philbin, J., Chen, B., Wu, Y.: Learning fine-grained image similarity with deep ranking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1386–1393. IEEE (2014)
218.
go back to reference Wei, Y., Liang, X., Chen, Y., Shen, X., Cheng, M.-M., Feng, J., Zhao, Y., Yan, S.: STC: a simple to complex framework for weakly-supervised semantic segmentation. IEEE Trans. Pattern Analy. Mach. Intell. (2015) Wei, Y., Liang, X., Chen, Y., Shen, X., Cheng, M.-M., Feng, J., Zhao, Y., Yan, S.: STC: a simple to complex framework for weakly-supervised semantic segmentation. IEEE Trans. Pattern Analy. Mach. Intell. (2015)
220.
go back to reference Wimalasuriya, D.C., Dou, D.: Ontology-based information extraction: an introduction and a survey of current approaches. J. Inf. Sci. 36(3), 306–323 (2010)CrossRef Wimalasuriya, D.C., Dou, D.: Ontology-based information extraction: an introduction and a survey of current approaches. J. Inf. Sci. 36(3), 306–323 (2010)CrossRef
221.
go back to reference Wu, Y., Schuster, M., Chen, Z., Le, Q.V., Norouzi, M., Macherey, W., Krikun, M., Cao, Y., Gao, Q., Macherey, K., Klingner, J., Shah, A., et al.: Google’s neural machine translation system: bridging the gap between human and machine translation (2016). CoRR: https://arXiv.org/abs/1609.08144 Wu, Y., Schuster, M., Chen, Z., Le, Q.V., Norouzi, M., Macherey, W., Krikun, M., Cao, Y., Gao, Q., Macherey, K., Klingner, J., Shah, A., et al.: Google’s neural machine translation system: bridging the gap between human and machine translation (2016). CoRR: https://​arXiv.​org/​abs/​1609.​08144
223.
go back to reference Yang, Z., Yang, D., Dyer, C., He, X., Smola, A., Hovy, E.: Hierarchical attention networks for document classification. In: HLT-NAACL Proceedings (2016) Yang, Z., Yang, D., Dyer, C., He, X., Smola, A., Hovy, E.: Hierarchical attention networks for document classification. In: HLT-NAACL Proceedings (2016)
225.
go back to reference Yosinski, J., Clune, J., Bengio, Y., Lipson, H.: How transferable are features in deep neural networks? In: Advances in Neural Information Processing Systems 27 (NIPS 14) (2014) Yosinski, J., Clune, J., Bengio, Y., Lipson, H.: How transferable are features in deep neural networks? In: Advances in Neural Information Processing Systems 27 (NIPS 14) (2014)
227.
go back to reference Yua, S., Jiaa, S., Xu, Ch.: Convolutional neural networks for hyperspectral image classification. Neurocomputing 219, 88–98 (2017)CrossRef Yua, S., Jiaa, S., Xu, Ch.: Convolutional neural networks for hyperspectral image classification. Neurocomputing 219, 88–98 (2017)CrossRef
229.
go back to reference Zeiler, M.D., Fergus, R.: Visualizing and Understanding Convolutional Networks, ECCV 2014. Part I, LNCS 8689, 818–833 (2014) Zeiler, M.D., Fergus, R.: Visualizing and Understanding Convolutional Networks, ECCV 2014. Part I, LNCS 8689, 818–833 (2014)
233.
go back to reference Zhang, J., Lin, Z., Brandt, J., Shen, X., Sclarof, S.: Top-down neural attention by excitation backprop. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds) Computer Vision ECCV 2016. Lecture Notes in Computer Science, vol. 9908. Springer, Cham (2016) Zhang, J., Lin, Z., Brandt, J., Shen, X., Sclarof, S.: Top-down neural attention by excitation backprop. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds) Computer Vision ECCV 2016. Lecture Notes in Computer Science, vol. 9908. Springer, Cham (2016)
236.
go back to reference Zhu, J.Y., Wu, J., Xu, Y., Chang, E., Tu, Z.: Unsupervised object class discovery via saliency-guided multiple class learning. IEEE Trans. Pattern Analy. Mach. Intell. 37(4), 862–75 (2015)CrossRef Zhu, J.Y., Wu, J., Xu, Y., Chang, E., Tu, Z.: Unsupervised object class discovery via saliency-guided multiple class learning. IEEE Trans. Pattern Analy. Mach. Intell. 37(4), 862–75 (2015)CrossRef
237.
go back to reference Zhu, S., Shi, Z., Sun, C., Shen, S.: Deep neural network based image annotation. Pattern Recogn. Lett. 65, 103–108 (2015)CrossRef Zhu, S., Shi, Z., Sun, C., Shen, S.: Deep neural network based image annotation. Pattern Recogn. Lett. 65, 103–108 (2015)CrossRef
Metadata
Title: Deep Learning—A New Era in Bridging the Semantic Gap
Authors: Urszula Markowska-Kaczmar, Halina Kwaśnicka
Copyright Year: 2018
DOI: https://doi.org/10.1007/978-3-319-73891-8_7
