research-article

Late fusion of heterogeneous methods for multimedia image retrieval

Authors:
Hugo Jair Escalante

National Institute of Astrophysics, Optics and Electronics, Puebla, Mexico

Carlos A. Hernández

National Institute of Astrophysics, Optics and Electronics, Puebla, Mexico

Luis Enrique Sucar

National Institute of Astrophysics, Optics and Electronics, Puebla, Mexico

Manuel Montes

National Institute of Astrophysics, Optics and Electronics, Puebla, Mexico


MIR '08: Proceedings of the 1st ACM international conference on Multimedia information retrieval, October 2008, Pages 172–179, https://doi.org/10.1145/1460096.1460125

Published: 30 October 2008

ABSTRACT

Late fusion of independent retrieval methods is a simple and widely used approach for combining visual and textual information in the search process. Usually each retrieval method is based on a single modality, and even when several methods are considered per modality, all of them use the same information for indexing and querying. This reduces the diversity and complementarity of the documents considered for the fusion; as a consequence, the performance of the fusion approach is poor.

In this paper we study the combination of multiple heterogeneous methods for image retrieval in annotated collections. Heterogeneity is considered in terms of i) the modality on which the methods are based, ii) the information they use for indexing/querying, and iii) the individual performance of the methods. Different settings for the fusion are considered, including weighted, global, per-modality and hierarchical. We report experimental results on an image retrieval benchmark showing that the proposed combination significantly outperforms any of the individual methods we consider. Retrieval performance is comparable to the best performance obtained in the context of ImageCLEF2007. An interesting result is that even methods that perform poorly individually proved very useful to the fusion strategy. Furthermore, in contrast to work reported in the literature, better results were obtained by assigning a low weight to text-based methods. The main contribution of this paper is experimental: several interesting findings are reported that motivate further research on diverse subjects.
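
As a rough illustration of the weighted fusion setting mentioned above, the following Python sketch merges the result lists of several retrieval methods by summing weighted, min-max normalized scores. The abstract does not give the fusion formula, so this is only one plausible reading: the function names `min_max_normalize` and `late_fusion`, the input format, the normalization step, the document identifiers and the weights are all assumptions made for illustration, not the authors' method.

```python
from collections import defaultdict


def min_max_normalize(scores):
    """Rescale a {doc_id: score} mapping to [0, 1]; constant lists map to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def late_fusion(runs, weights=None):
    """Fuse per-method result lists into a single ranking.

    runs    -- {method_name: {doc_id: score}} (hypothetical input format)
    weights -- {method_name: weight}; methods not listed get weight 1.0
    """
    weights = weights or {}
    fused = defaultdict(float)
    for method, scores in runs.items():
        w = weights.get(method, 1.0)
        for doc, s in min_max_normalize(scores).items():
            fused[doc] += w * s
    # Highest fused score first; ties are broken by doc id for determinism.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


# Toy example with made-up scores: one textual and one visual method.
runs = {
    "text": {"img1": 12.0, "img2": 8.0, "img3": 4.0},
    "visual": {"img2": 0.9, "img3": 0.7, "img4": 0.1},
}
# The text-based method is given the lower weight here, purely as an example.
ranking = late_fusion(runs, weights={"text": 0.5, "visual": 1.0})
for doc, score in ranking:
    print(doc, round(score, 3))
```

In this toy run `img2` ranks first because both methods retrieve it, which is the kind of complementarity a late fusion scheme relies on. A per-modality or hierarchical variant could plausibly be built by calling `late_fusion` once per modality and then fusing the resulting lists, though that too is an assumption about how such settings might be realized.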

Index Terms

Late fusion of heterogeneous methods for multimedia image retrieval
1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking
    2. Retrieval tasks and goals
      1. Document filtering
      2. Information extraction

Published in
MIR '08: Proceedings of the 1st ACM international conference on Multimedia information retrieval
October 2008
506 pages
ISBN: 9781605583129
DOI: 10.1145/1460096
General Chair: Michael S. Lew (Leiden University, The Netherlands)
Program Chairs: Alberto del Bimbo (University of Florence, Italy), Erwin M. Bakker (Leiden University, The Netherlands)
Copyright © 2008 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
image retrieval
late fusion