
2018 | OriginalPaper | Chapter

7. Deep Learning—A New Era in Bridging the Semantic Gap

Authors : Urszula Markowska-Kaczmar, Halina Kwaśnicka

Published in: Bridging the Semantic Gap in Image and Video Analysis

Publisher: Springer International Publishing


Abstract

This chapter deals with the semantic gap, a well-known phenomenon in the area of vision systems. Despite the significant efforts of researchers, the problem of how to overcome the semantic gap remains a challenge. One of the most popular research areas where this problem is present and hampers good results is the task of image retrieval, and this chapter focuses on that task. As deep learning models gain more and more popularity among researchers and produce increasingly spectacular results, the application of deep learning to bridging the semantic gap in Content-Based Image Retrieval (CBIR) is the central issue of this chapter. The chapter briefly presents the traditional approaches to CBIR, then gives a short introduction to the methods and models of deep learning, and shows the application of deep learning at the particular levels of CBIR: the features level, the common-sense knowledge level, and the level of inference about a scene.
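The "features level" mentioned in the abstract can be illustrated with a minimal retrieval sketch: a deep network maps each image to a fixed-length embedding, and retrieval ranks database images by their similarity to the query embedding. In the sketch below, a random linear projection stands in for a trained CNN (a simplifying assumption; in practice one would use, e.g., the penultimate layer of a pretrained network), and all names (`embed`, `retrieve`) are illustrative, not from the chapter.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed(images: np.ndarray, projection: np.ndarray) -> np.ndarray:
    """Map flattened images to L2-normalised feature vectors."""
    feats = images @ projection  # placeholder for a CNN forward pass
    norms = np.linalg.norm(feats, axis=1, keepdims=True)
    return feats / np.clip(norms, 1e-12, None)

def retrieve(query: np.ndarray, database: np.ndarray, k: int = 3) -> np.ndarray:
    """Return indices of the k database images most similar to the query."""
    sims = database @ query  # cosine similarity, since vectors are unit-length
    return np.argsort(-sims)[:k]

# Toy data: 100 "images" of 32x32 grey pixels, embedded into 64 dimensions.
projection = rng.standard_normal((32 * 32, 64))
images = rng.standard_normal((100, 32 * 32))
db = embed(images, projection)

query_idx = 7
ranked = retrieve(db[query_idx], db)
# The query image is trivially its own nearest neighbour.
```

The deep-learning contribution at this level is precisely the quality of `embed`: a learned embedding places semantically similar images close together, which hand-crafted features often fail to do.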


139.
go back to reference Mezaris, V. Strintzis, M. G.: Object segmentation and ontologies for MPEG-2 video indexing and retrieval. In: International Conference on Image and Video Retrieval, CIVR 2004. Image and Video Retrieval, pp. 573–581 (2004) Mezaris, V. Strintzis, M. G.: Object segmentation and ontologies for MPEG-2 video indexing and retrieval. In: International Conference on Image and Video Retrieval, CIVR 2004. Image and Video Retrieval, pp. 573–581 (2004)
140.
go back to reference Mikolov, T., Corrado, G., Chen, K., Dean, J.: Efficient estimation of word representations in vector space. In: Proceedings of the International Conference on Learning Representations (ICLR 2013), pp. 1–12 (2013) Mikolov, T., Corrado, G., Chen, K., Dean, J.: Efficient estimation of word representations in vector space. In: Proceedings of the International Conference on Learning Representations (ICLR 2013), pp. 1–12 (2013)
141.
go back to reference Mesnil, G., He, X., Deng, L., Bengio, Y.: Investigation of recurrent-neural-network architectures and learning methods for spoken language understanding. In: Interspeech (2013) Mesnil, G., He, X., Deng, L., Bengio, Y.: Investigation of recurrent-neural-network architectures and learning methods for spoken language understanding. In: Interspeech (2013)
142.
go back to reference Mikolajczyk, K., Schmid, C.: A performance evaluation of local descriptors. IEEE Trans. Pattern Analy. Mach. Intell. 27(10), 1615–1630 (2005)CrossRef Mikolajczyk, K., Schmid, C.: A performance evaluation of local descriptors. IEEE Trans. Pattern Analy. Mach. Intell. 27(10), 1615–1630 (2005)CrossRef
143.
go back to reference Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: NIPS Proceedings (2013) Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: NIPS Proceedings (2013)
146.
go back to reference Mohamed, A., Dahl, G.E., Hinton, G.: Acoustic modeling using deep belief networks. IEEE Trans. Audio Speech Lang. Process. 20(1), 14–22 (2012)CrossRef Mohamed, A., Dahl, G.E., Hinton, G.: Acoustic modeling using deep belief networks. IEEE Trans. Audio Speech Lang. Process. 20(1), 14–22 (2012)CrossRef
147.
go back to reference Mosbah, M. Boucheham, B.: Matching measures in the context of CBIR: a comparative study in terms of effectiveness and efficiency. In: World Conference on Information Systems and Technologies. World CIST 2017, pp. 245–258 (2017) Mosbah, M. Boucheham, B.: Matching measures in the context of CBIR: a comparative study in terms of effectiveness and efficiency. In: World Conference on Information Systems and Technologies. World CIST 2017, pp. 245–258 (2017)
148.
go back to reference Mozer, M.C.: A focused backpropagation algorithm for temporal pattern recognition. In: Chauvin, Y., Rumelhart, D. (eds.) Backpropagation: Theory, Architectures, and Applications. Research Gate. Lawrence Erlbaum Associates, Hillsdale, NJ, pp. 137–169 (1995) Mozer, M.C.: A focused backpropagation algorithm for temporal pattern recognition. In: Chauvin, Y., Rumelhart, D. (eds.) Backpropagation: Theory, Architectures, and Applications. Research Gate. Lawrence Erlbaum Associates, Hillsdale, NJ, pp. 137–169 (1995)
151.
go back to reference Ngiam, J., Khosla, A., Kim, M., Nam, J., Lee, H., Ng, A.Y.: Multimodal deep learning. In: Proceedings of the 28th International Conference on Machine Learning (ICML) (2011) Ngiam, J., Khosla, A., Kim, M., Nam, J., Lee, H., Ng, A.Y.: Multimodal deep learning. In: Proceedings of the 28th International Conference on Machine Learning (ICML) (2011)
155.
go back to reference Noh, H., Seo, P.H., Han, B.: Image question answering using convolutional neural network with dynamic parameter prediction. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, pp. 30–38 (2016). https://doi.org/10.1109/CVPR.2016.11 Noh, H., Seo, P.H., Han, B.: Image question answering using convolutional neural network with dynamic parameter prediction. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, pp. 30–38 (2016). https://​doi.​org/​10.​1109/​CVPR.​2016.​11
156.
go back to reference Novotny, D., Larlus, D., Vedaldi, A.: Learning the structure of objects from web supervision. In: Computer Vision ECCV 2016 Workshops. Amsterdam, The Netherlands, Part 3. LNCS 9915, pp. 218–233 (2016) Novotny, D., Larlus, D., Vedaldi, A.: Learning the structure of objects from web supervision. In: Computer Vision ECCV 2016 Workshops. Amsterdam, The Netherlands, Part 3. LNCS 9915, pp. 218–233 (2016)
158.
go back to reference Parikh, A.P., Taeckstroem, O., Das, D., Uszkoreit, J.: Composable attention model for natural language inference. In: EMNLP 2016 (2016) Parikh, A.P., Taeckstroem, O., Das, D., Uszkoreit, J.: Composable attention model for natural language inference. In: EMNLP 2016 (2016)
159.
go back to reference Paulin, M., Douze, M., Harchaoui, Z., Mairal, J., Perronin, F., Schmid, C.: Local convolutional features with unsupervised training for image retrieval. In IEEE International Conference on Computer Vision (ICCV), pp. 91–99 (2015) Paulin, M., Douze, M., Harchaoui, Z., Mairal, J., Perronin, F., Schmid, C.: Local convolutional features with unsupervised training for image retrieval. In IEEE International Conference on Computer Vision (ICCV), pp. 91–99 (2015)
162.
go back to reference Perez-Rey, D., Anguita, A., Crespo, J.: Ontodataclean: ontology-based integration and preprocessing of distributed data. In: Biological and Medical Data Analysis, pp. 262–272. Springer, Heidelberg (2006) Perez-Rey, D., Anguita, A., Crespo, J.: Ontodataclean: ontology-based integration and preprocessing of distributed data. In: Biological and Medical Data Analysis, pp. 262–272. Springer, Heidelberg (2006)
163.
go back to reference Peters, R.J., Iyer, A., Itti, L., Koch, C.: Components of bottom-up gaze allocation in natural images. Vis. Res. 45, 2397–2416 (2005)CrossRef Peters, R.J., Iyer, A., Itti, L., Koch, C.: Components of bottom-up gaze allocation in natural images. Vis. Res. 45, 2397–2416 (2005)CrossRef
164.
go back to reference Petrucci, G., Ghidini, C., Rospocher, M.: Ontology learning in the deep. In: European Knowledge Acquisition Workshop EKAW 2016: Knowledge Engineering and Knowledge Management, pp. 480–495 (2016) Petrucci, G., Ghidini, C., Rospocher, M.: Ontology learning in the deep. In: European Knowledge Acquisition Workshop EKAW 2016: Knowledge Engineering and Knowledge Management, pp. 480–495 (2016)
165.
go back to reference Piras, L., Giacinto, G.: Information fusion in content based image retrieval: a comprehensive overview. J. Inf. Fusion. 37(C), 50–60 (2017) Piras, L., Giacinto, G.: Information fusion in content based image retrieval: a comprehensive overview. J. Inf. Fusion. 37(C), 50–60 (2017)
166.
go back to reference Porello, D., Cristani, M., Ferrario, R.: Integrating ontologies and computer vision for classification of objects in images. In: Proceedings of the Workshop on Neural-Cognitive Integration in German Conference on Artificial Intelligence, pp. 1–15 (2013) Porello, D., Cristani, M., Ferrario, R.: Integrating ontologies and computer vision for classification of objects in images. In: Proceedings of the Workshop on Neural-Cognitive Integration in German Conference on Artificial Intelligence, pp. 1–15 (2013)
167.
go back to reference Pyykko, J., Glowacka, D.: Interactive content-based image retrieval with deep neural networks. In: International Workshop on Symbiotic Interaction, pp. 77–88 (2016) Pyykko, J., Glowacka, D.: Interactive content-based image retrieval with deep neural networks. In: International Workshop on Symbiotic Interaction, pp. 77–88 (2016)
169.
go back to reference Reed, S., Akata, Z., Yan, X., Logeswaran, L., Schiele, B., Lee, H.: Generative adversarial text to image synthesis (2016). arXiv:1605.05396v2 [cs.NE]. Accessed on 5 Jun 2016 Reed, S., Akata, Z., Yan, X., Logeswaran, L., Schiele, B., Lee, H.: Generative adversarial text to image synthesis (2016). arXiv:​1605.​05396v2 [cs.NE]. Accessed on 5 Jun 2016
173.
go back to reference Ribeiro, R., Uhl, A., Wimmer, G., Haefner, M.: Exploring deep learning and transfer learning for colonic polyp classification. Comput. Math. Methods Med. (2016) Ribeiro, R., Uhl, A., Wimmer, G., Haefner, M.: Exploring deep learning and transfer learning for colonic polyp classification. Comput. Math. Methods Med. (2016)
174.
go back to reference Riloff, E.: Automatically generating extraction patterns from untagged text. Proc. Nat. Conf. Arti. Intell. 2, 1044–1049 (1996) Riloff, E.: Automatically generating extraction patterns from untagged text. Proc. Nat. Conf. Arti. Intell. 2, 1044–1049 (1996)
175.
go back to reference Rosten, E., Drummond, T.: Machine learning for high-speed corner detection. In: Proceeding of European Conference on Computer Vision (ECCV 2006), pp. 430–443 (2006) Rosten, E., Drummond, T.: Machine learning for high-speed corner detection. In: Proceeding of European Conference on Computer Vision (ECCV 2006), pp. 430–443 (2006)
176.
go back to reference Saenko, K., Darrell, T.: Unsupervised learning of visual sense models for polysemous word. In: Proceedings of the 22nd Annual Conference on Neural Information Processing Systems. Vancouver, Canada, pp. 1393–1400 (2008) Saenko, K., Darrell, T.: Unsupervised learning of visual sense models for polysemous word. In: Proceedings of the 22nd Annual Conference on Neural Information Processing Systems. Vancouver, Canada, pp. 1393–1400 (2008)
178.
go back to reference Salakhutinov, R.: Learning deep generative models. Ann. Rev. Stat. Appl. 2015(2), 361–385 (2015)CrossRef Salakhutinov, R.: Learning deep generative models. Ann. Rev. Stat. Appl. 2015(2), 361–385 (2015)CrossRef
181.
go back to reference Sarikaya, R., Hinton, G.E., Deoras, A.: Application of deep belief networks for natural language understanding. IEEE/ACM Trans. Audio Speech Lang. Process. 22(4), 778–784 (2014)CrossRef Sarikaya, R., Hinton, G.E., Deoras, A.: Application of deep belief networks for natural language understanding. IEEE/ACM Trans. Audio Speech Lang. Process. 22(4), 778–784 (2014)CrossRef
182.
go back to reference Schawinski, K., Zhang, C., Zhang, H., Fowler, L., Santhanam, G.K.: Generative adversarial networks recover features in astrophysical images of galaxies beyond the deconvolution limit. Monthly Notices of the Royal Astronomical Society: Letters: slx008. https://arXiv.org/pdf/1702.00403.pdf Schawinski, K., Zhang, C., Zhang, H., Fowler, L., Santhanam, G.K.: Generative adversarial networks recover features in astrophysical images of galaxies beyond the deconvolution limit. Monthly Notices of the Royal Astronomical Society: Letters: slx008. https://​arXiv.​org/​pdf/​1702.​00403.​pdf
183.
go back to reference Schuster, S., Krishna, R., Chang, A., Fei-Fei, L., Manning, C.D.: Generating semantically precise scene graphs from textual descriptions for improved image retrieval. In: Proceedings of the Fourth Workshop on Vision and Language, pp. 70–80 (2015) Schuster, S., Krishna, R., Chang, A., Fei-Fei, L., Manning, C.D.: Generating semantically precise scene graphs from textual descriptions for improved image retrieval. In: Proceedings of the Fourth Workshop on Vision and Language, pp. 70–80 (2015)
184.
go back to reference Shench, C., Song, M., Zhao, Q.: Learning high-level concepts by training a deep network on eye fixations. In: NIPS Deep Learning and Unsup Feat Learn Workshop (2012) Shench, C., Song, M., Zhao, Q.: Learning high-level concepts by training a deep network on eye fixations. In: NIPS Deep Learning and Unsup Feat Learn Workshop (2012)
185.
go back to reference Shen, C., Zhao, Q.: Learning to predict eye fixations for semantic contents using multi-layer sparse network. Neurocomputing 138, 61–68 (2014)CrossRef Shen, C., Zhao, Q.: Learning to predict eye fixations for semantic contents using multi-layer sparse network. Neurocomputing 138, 61–68 (2014)CrossRef
186.
go back to reference Shi, J., Tomasi, C.: Good features to track. In: Proceedings of 1994 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, (CVPR, 1994), pp. 593–600 (1994) Shi, J., Tomasi, C.: Good features to track. In: Proceedings of 1994 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, (CVPR, 1994), pp. 593–600 (1994)
187.
go back to reference Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: ICLR (2015) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: ICLR (2015)
188.
go back to reference Singh, M.D., Lee, M.: Temporal hierarchies in multilayer gated recurrent neural networks for language models. In: International Joint Conference on Neural Networks (IJCNN) (2017) Singh, M.D., Lee, M.: Temporal hierarchies in multilayer gated recurrent neural networks for language models. In: International Joint Conference on Neural Networks (IJCNN) (2017)
189.
go back to reference Smeulders, A.W.M., Worring, M., Santini, S., Gupta, A., Jain, R.: Content-based image retrieval at the end of the early years. IEEE Trans. Pattern Anal. Mach. Intell. 22(12), 1349–80 (2000)CrossRef Smeulders, A.W.M., Worring, M., Santini, S., Gupta, A., Jain, R.: Content-based image retrieval at the end of the early years. IEEE Trans. Pattern Anal. Mach. Intell. 22(12), 1349–80 (2000)CrossRef
190.
go back to reference Snoek, C.G.M., Smeulders, A.W.M.: Visual-concept search solved? IEEE Comput. 43(6), 76–78 (2010)CrossRef Snoek, C.G.M., Smeulders, A.W.M.: Visual-concept search solved? IEEE Comput. 43(6), 76–78 (2010)CrossRef
191.
go back to reference Srivastava, N., Salakhutdinov, R.: Multimodal learning with deep Boltzmann machines. In: NIPS 2012 (2012) Srivastava, N., Salakhutdinov, R.: Multimodal learning with deep Boltzmann machines. In: NIPS 2012 (2012)
192.
go back to reference Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014)MathSciNetMATH Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014)MathSciNetMATH
193.
go back to reference Sun, X., Huang, Z., Yin, H., Shen, H.T.: An integrated model for effective saliency prediction. In: Proceedings of Thirty-First AAAI Conference on Artificial Intelligence (2017) Sun, X., Huang, Z., Yin, H., Shen, H.T.: An integrated model for effective saliency prediction. In: Proceedings of Thirty-First AAAI Conference on Artificial Intelligence (2017)
194.
go back to reference Sundermeyer, M., Schluter, R., Ney, H.: LSTM neural networks for language modeling. In: Proceedings of Interspeech (2012) Sundermeyer, M., Schluter, R., Ney, H.: LSTM neural networks for language modeling. In: Proceedings of Interspeech (2012)
196.
go back to reference Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: CVPR (2015) Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: CVPR (2015)
199.
go back to reference Tousch, A.-M., Herbin, S., Audibert, J.-Y.: Semantic hierarchies for image annotation: a survey. Pattern Recogn. 45(1), 333–345 (2012)CrossRef Tousch, A.-M., Herbin, S., Audibert, J.-Y.: Semantic hierarchies for image annotation: a survey. Pattern Recogn. 45(1), 333–345 (2012)CrossRef
200.
go back to reference Town, Ch.: Ontological inference for image and video analysis. Mach. Vis. Appl. 17(2), 94–115 (2006)CrossRef Town, Ch.: Ontological inference for image and video analysis. Mach. Vis. Appl. 17(2), 94–115 (2006)CrossRef
201.
go back to reference Traina, A., Marques, J., Traina, C.: Fighting the semantic gap on CBIR systems through new relevance feedback techniques. In: Proceedings of the 19th IEEE Symposium on Computer-Based Medical Systems, pp. 881–886 (2006) Traina, A., Marques, J., Traina, C.: Fighting the semantic gap on CBIR systems through new relevance feedback techniques. In: Proceedings of the 19th IEEE Symposium on Computer-Based Medical Systems, pp. 881–886 (2006)
202.
go back to reference Valle, E., Cord, M.: Advanced techniques in CBIR local descriptors, visual dictionaries and bags of features. In: Tutorials of the XXII Brazilian Symposium on Computer Graphics and Image Processing (SIBGRAPI TUTORIALS), pp. 72–78 (2009) Valle, E., Cord, M.: Advanced techniques in CBIR local descriptors, visual dictionaries and bags of features. In: Tutorials of the XXII Brazilian Symposium on Computer Graphics and Image Processing (SIBGRAPI TUTORIALS), pp. 72–78 (2009)
203.
go back to reference Venugopalan, S., Xu, H., Donahue, J., Rohrbach, M., Mooney, R., Saenko. K.: Translating videos to natural language using deep recurrent neural networks. arXiv preprint arXiv:1412.4729 (2014) Venugopalan, S., Xu, H., Donahue, J., Rohrbach, M., Mooney, R., Saenko. K.: Translating videos to natural language using deep recurrent neural networks. arXiv preprint arXiv:​1412.​4729 (2014)
204.
go back to reference Vig, E., Dorr, M., Cox, D.: Large-scale optimization of hierarchical features for saliency prediction in natural images. In: CVPR (2014) Vig, E., Dorr, M., Cox, D.: Large-scale optimization of hierarchical features for saliency prediction in natural images. In: CVPR (2014)
205.
go back to reference Vincent, P., Larochelle, H., Bengio, Y., Manzagol, P.A.: Extracting and composing robust features with denoising autoencoders. In: Proceedings of the Twenty-fifth International Conference on Machine Learning (ICML08), pp 1096–1103. ACM (2008) Vincent, P., Larochelle, H., Bengio, Y., Manzagol, P.A.: Extracting and composing robust features with denoising autoencoders. In: Proceedings of the Twenty-fifth International Conference on Machine Learning (ICML08), pp 1096–1103. ACM (2008)
206.
go back to reference Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y., Manzagol, P.-A.: Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. J. Mach. Learn. Res. 11, 3371–3408 (2010)MathSciNetMATH Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y., Manzagol, P.-A.: Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. J. Mach. Learn. Res. 11, 3371–3408 (2010)MathSciNetMATH
208.
go back to reference Vinyals, O., Toshev, A., Bengio, S., Erhan, D.: Show and tell: a neural image caption generator. In: CVPR 2015 (2015) Vinyals, O., Toshev, A., Bengio, S., Erhan, D.: Show and tell: a neural image caption generator. In: CVPR 2015 (2015)
209.
go back to reference Wan, J., Wang, D., Hoi, S.C.H., Wu, P., Zhu, J., Zhang, Y., Li, J.: Deep learning for content-based image retrieval: a comprehensive study. In: ACM International Conference on Multimedia (MM), pp. 157–166. ACM (2014) Wan, J., Wang, D., Hoi, S.C.H., Wu, P., Zhu, J., Zhang, Y., Li, J.: Deep learning for content-based image retrieval: a comprehensive study. In: ACM International Conference on Multimedia (MM), pp. 157–166. ACM (2014)
210.
go back to reference Wang, C., Zhang, L., Zhang, H.: Learning to reduce the semantic gap in web image retrieval and annotation. In: SIGIR08, Singapore (2008) Wang, C., Zhang, L., Zhang, H.: Learning to reduce the semantic gap in web image retrieval and annotation. In: SIGIR08, Singapore (2008)
211.
go back to reference Wang, H., Cai, Y., Zhang, Y., Pan, H., Lv, W., Han H.: Deep learning for image retrieval: what works and what doesnt. In: IEEE 15th International Conference on Data Mining Workshops, pp. 1576–1583 (2015) Wang, H., Cai, Y., Zhang, Y., Pan, H., Lv, W., Han H.: Deep learning for image retrieval: what works and what doesnt. In: IEEE 15th International Conference on Data Mining Workshops, pp. 1576–1583 (2015)
212.
go back to reference Wang, H.: Semantic Deep Learning, University of Oregon, pp. 1–42 (2015) Wang, H.: Semantic Deep Learning, University of Oregon, pp. 1–42 (2015)
213.
go back to reference Wang, H., Dou, D., Lowd, D.: Ontology-based deep restricted boltzmann machine. In: 27th International Conference on Database and Expert Systems Applications, DEXA 2016, Porto, Portugal, September 5–8, 2016, Proceedings, Part I, pp. 431–445. Springer International Publishing (2016) Wang, H., Dou, D., Lowd, D.: Ontology-based deep restricted boltzmann machine. In: 27th International Conference on Database and Expert Systems Applications, DEXA 2016, Porto, Portugal, September 5–8, 2016, Proceedings, Part I, pp. 431–445. Springer International Publishing (2016)
214.
go back to reference Wang, J., Song, Y., Leung, T., Rosenberg, C., Wang, J., Philbin, J., Chen, B., Wu, Y.: Learning fine-grained image similarity with deep ranking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1386–1393. IEEE (2014) Wang, J., Song, Y., Leung, T., Rosenberg, C., Wang, J., Philbin, J., Chen, B., Wu, Y.: Learning fine-grained image similarity with deep ranking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1386–1393. IEEE (2014)
218.
go back to reference Wei, Y., Liang, X., Chen, Y., Shen, X., Cheng, M.-M., Feng, J., Zhao, Y., Yan, S.: STC: a simple to complex framework for weakly-supervised semantic segmentation. IEEE Trans. Pattern Analy. Mach. Intell. (2015) Wei, Y., Liang, X., Chen, Y., Shen, X., Cheng, M.-M., Feng, J., Zhao, Y., Yan, S.: STC: a simple to complex framework for weakly-supervised semantic segmentation. IEEE Trans. Pattern Analy. Mach. Intell. (2015)
220.
go back to reference Wimalasuriya, D.C., Dou, D.: Ontology-based information extraction: an introduction and a survey of current approaches. J. Inf. Sci. 36(3), 306–323 (2010)CrossRef Wimalasuriya, D.C., Dou, D.: Ontology-based information extraction: an introduction and a survey of current approaches. J. Inf. Sci. 36(3), 306–323 (2010)CrossRef
221.
go back to reference Wu, Y., Schuster, M., Chen, Z., Le, Q.V., Norouzi, M., Macherey, W., Krikun, M., Cao, Y., Gao, Q., Macherey, K., Klingner, J., Shah, A., et al.: Google’s neural machine translation system: bridging the gap between human and machine translation (2016). CoRR: https://arXiv.org/abs/1609.08144 Wu, Y., Schuster, M., Chen, Z., Le, Q.V., Norouzi, M., Macherey, W., Krikun, M., Cao, Y., Gao, Q., Macherey, K., Klingner, J., Shah, A., et al.: Google’s neural machine translation system: bridging the gap between human and machine translation (2016). CoRR: https://​arXiv.​org/​abs/​1609.​08144
223.
go back to reference Yang, Z., Yang, D., Dyer, C., He, X., Smola, A., Hovy, E.: Hierarchical attention networks for document classification. In: HLT-NAACL Proceedings (2016) Yang, Z., Yang, D., Dyer, C., He, X., Smola, A., Hovy, E.: Hierarchical attention networks for document classification. In: HLT-NAACL Proceedings (2016)
225.
go back to reference Yosinski, J., Clune, J., Bengio, Y., Lipson, H.: How transferable are features in deep neural networks? In: Advances in Neural Information Processing Systems 27 (NIPS 14) (2014) Yosinski, J., Clune, J., Bengio, Y., Lipson, H.: How transferable are features in deep neural networks? In: Advances in Neural Information Processing Systems 27 (NIPS 14) (2014)
227.
go back to reference Yua, S., Jiaa, S., Xu, Ch.: Convolutional neural networks for hyperspectral image classification. Neurocomputing 219, 88–98 (2017)CrossRef Yua, S., Jiaa, S., Xu, Ch.: Convolutional neural networks for hyperspectral image classification. Neurocomputing 219, 88–98 (2017)CrossRef
229.
go back to reference Zeiler, M.D., Fergus, R.: Visualizing and Understanding Convolutional Networks, ECCV 2014. Part I, LNCS 8689, 818–833 (2014) Zeiler, M.D., Fergus, R.: Visualizing and Understanding Convolutional Networks, ECCV 2014. Part I, LNCS 8689, 818–833 (2014)
233.
go back to reference Zhang, J., Lin, Z., Brandt, J., Shen, X., Sclarof, S.: Top-down neural attention by excitation backprop. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds) Computer Vision ECCV 2016. Lecture Notes in Computer Science, vol. 9908. Springer, Cham (2016) Zhang, J., Lin, Z., Brandt, J., Shen, X., Sclarof, S.: Top-down neural attention by excitation backprop. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds) Computer Vision ECCV 2016. Lecture Notes in Computer Science, vol. 9908. Springer, Cham (2016)
236.
go back to reference Zhu, J.Y., Wu, J., Xu, Y., Chang, E., Tu, Z.: Unsupervised object class discovery via saliency-guided multiple class learning. IEEE Trans. Pattern Analy. Mach. Intell. 37(4), 862–75 (2015)CrossRef Zhu, J.Y., Wu, J., Xu, Y., Chang, E., Tu, Z.: Unsupervised object class discovery via saliency-guided multiple class learning. IEEE Trans. Pattern Analy. Mach. Intell. 37(4), 862–75 (2015)CrossRef
237.
go back to reference Zhu, S., Shi, Z., Sun, C., Shen, S.: Deep neural network based image annotation. Pattern Recogn. Lett. 65, 103–108 (2015)CrossRef Zhu, S., Shi, Z., Sun, C., Shen, S.: Deep neural network based image annotation. Pattern Recogn. Lett. 65, 103–108 (2015)CrossRef
Metadata
Title: Deep Learning—A New Era in Bridging the Semantic Gap
Authors: Urszula Markowska-Kaczmar, Halina Kwaśnicka
Copyright Year: 2018
DOI: https://doi.org/10.1007/978-3-319-73891-8_7
