Abstract
Ontologies have been intensively applied to improve multimedia search and retrieval by providing explicit meaning to visual content. Several multimedia ontologies have recently been proposed as knowledge models suitable for narrowing the well-known semantic gap and for enabling the semantic interpretation of images. Since these ontologies have been created in different application contexts, establishing links between them, a task known as ontology matching, promises to fully unlock their potential in support of multimedia search and retrieval. This paper proposes and empirically compares two extensional ontology matching techniques applied to an important semantic image retrieval issue: automatically associating common-sense knowledge with multimedia concepts. First, we extend a previously introduced textual concept matching approach to use both the textual and the visual representations of images. Second, we propose a novel matching technique based on a multi-modal graph. We argue that, in order to improve the efficiency of ontology matching in the multimedia domain, the textual and visual modalities have to be seen as complementary rather than as mutually exclusive sources of extensional information. An experimental evaluation is included in the paper.
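To make the idea of using both modalities as complementary extensional evidence concrete, the following is a minimal illustrative sketch, not the authors' method. It assumes that each concept's extension is a set of images, that each image is described by a textual and a visual feature vector, and that the two per-modality similarities are combined by simple weighted late fusion. The function names and the weight `alpha` are hypothetical.

```python
import math

def centroid(vectors):
    """Mean of equal-length feature vectors (one concept's extension in one modality)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def concept_similarity(ext_a, ext_b, alpha=0.5):
    """Hypothetical late fusion of textual and visual similarity between two concepts.

    ext_a, ext_b: dicts with keys "text" and "visual", each holding a list of
    per-image feature vectors (an assumed representation).
    alpha: assumed weight of the textual modality; 1 - alpha weights the visual one.
    """
    s_text = cosine(centroid(ext_a["text"]), centroid(ext_b["text"]))
    s_vis = cosine(centroid(ext_a["visual"]), centroid(ext_b["visual"]))
    return alpha * s_text + (1 - alpha) * s_vis

# Toy usage with made-up feature vectors for two concepts.
car = {"text": [[1, 0, 1], [1, 0, 0]], "visual": [[0.2, 0.8], [0.4, 0.6]]}
automobile = {"text": [[1, 0, 1]], "visual": [[0.3, 0.7]]}
sim = concept_similarity(car, automobile)
```

Setting `alpha` to 1 or 0 recovers a purely textual or purely visual matcher. Intermediate values let one modality compensate where the other is weak, which is the intuition behind treating the modalities as complementary.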
Notes
The temporal aspect of applying ontologies to multimedia falls outside the scope of the current study. We refer the interested reader to the project www.vidivideo.info and the publications found there.
This constraint will be lifted for the graph-based approach.
Pearson’s measure, also discussed in [27], proved to be a close competitor to Spearman’s.
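For readers comparing the two measures, here is a minimal sketch of both coefficients. The note does not say which quantities they are applied to, so the sketch assumes two equal-length score vectors, for example scores assigned to the same set of variables. Spearman's coefficient is computed as Pearson's coefficient on average ranks, which handles ties. Function names are illustrative.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (undefined if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
rho = spearman(x, y)  # monotone relation, so rho is 1
r = pearson(x, y)     # non-linear relation, so r is slightly below 1
```

The example shows the practical difference. Spearman's coefficient only rewards agreement in ordering, while Pearson's also penalizes departures from linearity. When the scores being compared are roughly linearly related, the two measures give close results.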
A good heuristic is to set that threshold to the number of concepts to be kept, k′.
The VSBM approaches have also been tested on this larger concept collection and vocabulary. The results are not reported here, as they did not differ significantly from those obtained at the smaller scale.
References
Athanasiadis T, Tzouvaras V, Petridis K, Precioso F, Avrithis Y, Kompatsiaris Y (2005) Using a multimedia ontology infrastructure for semantic annotation of multimedia content. In: SemAnnot’05
Dasiopoulou S, Kompatsiaris I, Strintzis M (2008) Using fuzzy DLs to enhance semantic image analysis. In: Semantic multimedia. Springer, pp 31–46
Dasiopoulou S, Tzouvaras V, Kompatsiaris I, Strintzis M (2010) Enquiring MPEG-7 based multimedia ontologies. Multimed Tools Appl 46(2):1–40
Deng J, Dong W, Socher R, Li L, Li K, Fei-Fei L (2009) ImageNet: a large-scale hierarchical image database. In: CVPR, pp 710–719
Doan A, Madhavan J, Domingos P, Halevy A (2002) Learning to map between ontologies on the semantic web. In: WWW’02. ACM Press, pp 662–673
Euzenat J, Shvaiko P (2007) Ontology matching, 1st edn. Springer
Fan J, Luo H, Shen Y, Yang C (2009) Integrating visual and semantic contexts for topic network generation and word sense disambiguation. In: ACM CIVR’09, pp 1–8
Guyon I, Elisseeff A (2003) An introduction to variable and feature selection. J Mach Learn Res 3:1157–1182
Haveliwala T (2003) Topic-sensitive pagerank: a context-sensitive ranking algorithm for web search. IEEE Trans Knowl Data Eng 15(4):784–796
Hudelot C, Atif J, Bloch I (2008) Fuzzy spatial relation ontology for image interpretation. Fuzzy Sets Syst 159:1929–1951
Hudelot C, Maillot N, Thonnat M (2005) Symbol grounding for semantic image interpretation: from image data to semantics. In: SKCV-workshop, ICCV
Inoue M (2004) On the need for annotation-based image retrieval. In: Proceedings of the workshop on information retrieval in context (IRiX), Sheffield, UK, pp 44–46
James N, Todorov K, Hudelot C (2010) Ontology matching for the semantic annotation of images. In: FUZZ-IEEE. IEEE Computer Society Press, Los Alamitos
Koskela M, Smeaton A (2007) An empirical study of inter-concept similarities in multimedia ontologies. In: CIVR’07. ACM, pp 464–471
Lacher MS, Groh G (2001) Facilitating the exchange of explicit knowledge through ontology mappings. In: Proceedings of the 14th int FLAIRS conference. AAAI Press, pp 305–309
Lowe DG (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vis 60:91–110. doi:10.1023/B:VISI.0000029664.99615.94
Mihalcea R, Tarau P, Figa E (2004) Pagerank on semantic networks, with application to word sense disambiguation. In: ICCL. Association for Computational Linguistics, p 1126
Miller G (1995) WordNet: a lexical database for English. Commun ACM 38(11):39–41
Pan J, Yang H, Faloutsos C, Duygulu P (2004) Automatic multimedia cross-modal correlation discovery. In: ACM SIGKDD. ACM, p 658
Peraldi ISE, Kaya A, Möller R (2009) Formalizing multimedia interpretation based on abduction over description logic aboxes. In: Description logics
Russell B, Torralba A, Murphy K, Freeman W (2008) LabelMe: a database and web-based tool for image annotation. IJCV 77(1):157–173
Smeulders A, Worring M, Santini S, Gupta A, Jain R (2000) Content-based image retrieval at the end of the early years. IEEE Trans Pattern Anal Mach Intell 22(12):1349–1380
Smith J, Chang S (2006) Large-scale concept ontology for multimedia. IEEE Multimed 13(3):86–91
Snoek C, Huurnink B, Hollink L, De Rijke M, Schreiber G, Worring M (2007) Adding semantics to detectors for video retrieval. IEEE Trans Multimedia 9(5):975–986
Stumme G, Maedche A (2001) FCA-Merge: bottom-up merging of ontologies. In: International joint conference on artificial intelligence, pp 225–230
Tansley R (1998) The multimedia thesaurus: an aid for multimedia information retrieval and navigation. Master’s thesis
Todorov K, Geibel P, Kühnberger K-U (2010) Extensional ontology matching with variable selection for support vector machines. In: CISIS. IEEE Computer Society Press, Los Alamitos, pp 962–968
Tong H, Faloutsos C, Pan J-Y (2006) Fast random walk with restart and its applications. In: Industrial conference on data mining 06. IEEE Computer Society, Washington, pp 613–622
Wang C, Jing F, Zhang L, Zhang H (2006) Image annotation refinement using random walk with restarts. In: ACM multimedia, p 650
Wu L, Hua X-S, Yu N, Ma W-Y, Li S (2008) Flickr distance. In: Multimedia 08. ACM, pp 31–40
Yang Y, Pedersen JO (1997) A comparative study on feature selection in text categorization. In: Fourteenth ICML. Morgan Kaufmann, San Mateo, pp 412–420
Yao BZ, Yang X, Lin L, Lee MW, Zhu S-C (2010) I2T: image parsing to text description. Proc IEEE 98(8):1485–1508
Cite this article
Todorov, K., James, N. & Hudelot, C. Multimedia ontology matching by using visual and textual modalities. Multimed Tools Appl 62, 401–425 (2013). https://doi.org/10.1007/s11042-011-0912-0