ABSTRACT
Late fusion of independent retrieval methods is a simple and widely used approach for combining visual and textual information in the search process. Usually each retrieval method is based on a single modality; even when several methods are considered per modality, all of them use the same information for indexing/querying. This reduces the diversity and complementarity of the documents considered for fusion and, as a consequence, the performance of the fusion approach is poor.
In this paper we study the combination of multiple heterogeneous methods for image retrieval in annotated collections. Heterogeneity is considered in terms of i) the modality on which the methods are based, ii) the information they use for indexing/querying and iii) the individual performance of the methods. Different settings for the fusion are considered, including weighted, global, per-modality and hierarchical. We report experimental results on an image retrieval benchmark showing that the proposed combination significantly outperforms any of the individual methods we consider. Retrieval performance is comparable to the best performance obtained in the context of ImageCLEF2007. An interesting result is that even methods that perform poorly individually proved very useful to the fusion strategy. Furthermore, contrary to work reported in the literature, better results were obtained by assigning a low weight to text-based methods. The main contribution of this paper is experimental: several interesting findings are reported that motivate further research on diverse subjects.
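To make the weighted setting concrete, the following is a minimal sketch of weighted late fusion over the ranked lists returned by independent retrieval methods. The abstract does not give the paper's scoring formula, so everything here is an illustrative assumption: the min-max normalization, the weighted sum of scores, the function names `min_max_normalize` and `weighted_late_fusion`, the method names, the scores and the weights are all hypothetical.

```python
from collections import defaultdict


def min_max_normalize(scores):
    """Rescale one method's scores to [0, 1] so that lists are comparable.

    Assumption: min-max scaling is used here only for illustration.
    """
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All documents scored equally; treat them as equally relevant.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def weighted_late_fusion(result_lists, weights):
    """Fuse per-method result lists into a single ranking.

    result_lists: {method_name: {doc_id: raw_score}}
    weights:      {method_name: weight}; methods without an entry get 1.0
    Returns (doc_id, fused_score) pairs, best first, ties broken by doc_id.
    """
    fused = defaultdict(float)
    for method, scores in result_lists.items():
        w = weights.get(method, 1.0)
        for doc, s in min_max_normalize(scores).items():
            # A document absent from a method's list contributes nothing.
            fused[doc] += w * s
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical scores from one text-based and one visual method.
results = {
    "text": {"img1": 12.0, "img2": 8.0, "img3": 4.0},
    "visual": {"img2": 10.0, "img4": 7.5, "img3": 0.0},
}
# Hypothetical weights: the text-based method is given the lower weight.
ranking = weighted_late_fusion(results, {"text": 1.0, "visual": 2.0})
for doc, score in ranking:
    print(doc, score)
```

Under these assumptions, a hierarchical or per-modality setting could be sketched by calling `weighted_late_fusion` once per modality and then fusing the resulting lists with a second call.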