Published in: International Journal of Multimedia Information Retrieval

01.04.2012 | Invited Paper

Multimodal Image Retrieval

Fusing modalities with multilayer multimodal pLSA

Authors: Stefan Romberg, Rainer Lienhart, Eva Hörster

Published in: International Journal of Multimedia Information Retrieval | Issue 1/2012

Abstract

In this work, we extend the standard single-layer probabilistic Latent Semantic Analysis (pLSA) (Hofmann in Mach Learn 42(1–2):177–196, 2001) to multiple layers. As multiple layers should naturally handle multiple modalities and a hierarchy of abstractions, we call this new approach multilayer multimodal probabilistic Latent Semantic Analysis (mm-pLSA). We derive the training and inference rules for the smallest possible non-degenerate mm-pLSA model: a model with two leaf-pLSAs and a single top-level pLSA node merging the two leaf-pLSAs. We evaluate this approach on two pairs of modalities: SIFT features combined with image annotations (tags), and SIFT features combined with HOG features. We also propose a fast, strictly stepwise forward procedure that initializes the mm-pLSA model bottom-up; the initialized model can then be post-optimized by the general mm-pLSA learning algorithm. The proposed approach is evaluated in a query-by-example retrieval task in which several variants of our mm-pLSA system are compared to systems relying on a single modality and to other ad-hoc combinations of feature histograms. We further describe possible pitfalls of mm-pLSA training and analyze the resulting model, yielding an intuitive explanation of its behaviour.
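To make the building blocks of the abstract concrete, the following is a minimal sketch, not the authors' implementation. The function `plsa` runs the standard EM updates for single-layer pLSA on a count matrix. The function `stepwise_init` is a hypothetical illustration of a stepwise, bottom-up initialization: it assumes that the topic mixtures of the two leaf pLSAs are concatenated, scaled to pseudo-counts, and used as observations for the top-level pLSA. That construction, the `scale` parameter, and all names are assumptions for illustration; the joint mm-pLSA update rules are given in the derivation referenced by the paper and are not reproduced here.

```python
import numpy as np

def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit single-layer pLSA by EM.

    counts: (n_docs, n_words) matrix of co-occurrence counts n(d, w).
    Returns P(z|d) with shape (n_docs, n_topics) and
            P(w|z) with shape (n_topics, n_words).
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    eps = 1e-12  # guards against division by zero

    # Random, row-normalized initialization of both distributions.
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z).
        post = p_z_d[:, :, None] * p_w_z[None, :, :]   # (docs, topics, words)
        post /= post.sum(axis=1, keepdims=True) + eps

        # M-step: re-estimate from expected counts n(d,w) * P(z|d,w).
        weighted = counts[:, None, :] * post
        p_w_z = weighted.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + eps
        p_z_d = weighted.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + eps

    return p_z_d, p_w_z

def stepwise_init(counts_a, counts_b, k_a, k_b, k_top, scale=100.0):
    """Hypothetical bottom-up initialization for a two-leaf model.

    Assumption: the leaf topic mixtures, concatenated and scaled to
    pseudo-counts, act as the 'words' of the top-level pLSA.
    """
    theta_a, _ = plsa(counts_a, k_a)   # leaf pLSA for modality A
    theta_b, _ = plsa(counts_b, k_b)   # leaf pLSA for modality B
    top_input = np.hstack([theta_a, theta_b]) * scale
    theta_top, _ = plsa(top_input, k_top)
    return theta_a, theta_b, theta_top
```

In this sketch, `counts_a` and `counts_b` would be per-image histograms of the two modalities (for example visual words and tags), and the rows of `theta_top` could serve as image descriptors for query-by-example retrieval. The dense three-dimensional arrays keep the code short but only suit small toy inputs.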

A complete derivation of the EM update equations for this multilayer multimodal pLSA model can be found at http://www.multimedia-computing.de/wiki/mm-pLSA

The dataset and additional material are available at http://www.multimedia-computing.de/wiki/Flickr-10M

Barnard K, Duygulu P, Forsyth D, Blei DM, Hofmann T, Poggio T, Shawe-Taylor J (2003) Matching words and pictures. J Mach Learn Res 3:1107–1135

Bay H, Ess A, Tuytelaars T, Van Gool L (2008) SURF: speeded up robust features. Comput Vis Imag Underst 110(3):346–359

Berg AC, Berg TL, Malik J (2005) Shape matching and object recognition using low distortion correspondences. In: IEEE conference on computer vision and pattern recognition (CVPR’05), vol 1. Washington, DC, pp 26–33

Blei D, Lafferty J (2006) Correlated topic models. In: Advances in neural information processing systems, vol 18, pp 147–154

Blei DM, Jordan MI (2003) Modeling annotated data. In: ACM SIGIR conference on research and development in information retrieval (SIGIR’03), pp 127–134

Blei DM, Ng AY, Jordan MI (2003) Latent Dirichlet allocation. J Mach Learn Res 3:993–1022

Bosch A, Zisserman A, Muñoz X (2006) Scene classification via pLSA. Eur Confer Comput Vis (ECCV’06) 3954:517–530

Dempster A, Laird N, Rubin D (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc 39(1):1–38

Deng J, Dong W, Socher R, Li L, Li K, Fei-Fei L (2009) ImageNet: a large-scale hierarchical image database. In: IEEE conference on computer vision and pattern recognition (CVPR’09)

Everingham M, Van Gool L, Williams C, Winn J, Zisserman A (2009) The PASCAL visual object classes (VOC) challenge. Int J Comput Vis (IJCV’04) 88(2):303–338

Fellbaum C (1998) WordNet: an electronic lexical database. MIT Press, Cambridge

Felzenszwalb P, Girshick R, McAllester D, Ramanan D (2010) Object detection with discriminatively trained part-based models. IEEE Trans Pattern Anal Mach Intell (PAMI’10), 32(9)

Greif T, Hörster E, Lienhart R (2008) Correlated topic models for image retrieval. Technical Report TR2008–09, University of Augsburg

Hawkins J, Blakeslee S (2004) On intelligence. Times Books, New York

Hinton GE, Salakhutdinov RR (2006) Reducing the dimensionality of data with neural networks. Science 313(5786):504–507

Hofmann T (2001) Unsupervised learning by probabilistic latent semantic analysis. Mach Learn 42(1–2):177–196

Hörster E, Lienhart R (2008) Deep networks for image retrieval on large-scale databases. In: ACM international conference on multimedia (MM’08), New York, pp 643–646

Hörster E, Lienhart R, Slaney M (2007) Image retrieval on large-scale image databases. In: ACM international conference on content-based image and video retrieval (CIVR’07), pp 17–24

Hörster E, Lienhart R, Slaney M (2008) Continuous visual vocabulary models for pLSA-based scene recognition. In: ACM international conference on content-based image and video retrieval (CIVR’08), New York, pp 319–328

Kennedy L, Naaman M, Ahern S, Nair R, Rattenbury T (2007) How Flickr helps us make sense of the world: context and content in community-contributed media collections. In: ACM international conference on multimedia (MM’07), New York, pp 631–640

Lienhart R, Romberg S, Hörster E (2009) Multilayer pLSA for multimodal image retrieval. In: ACM international conference on image and video retrieval (CIVR’09), vol 14

Lienhart R, Slaney M (2007) pLSA on large scale image databases. IEEE Int Confer Acoust Speech Signal Process (ICASSP’07) IV:1217–1220

Lowe DG (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vis (IJCV’04) 60(2):91–110

Monay F, Gatica-Perez D (2004) pLSA-based image auto-annotation: constraining the latent space. In: ACM international conference on multimedia (MM’04), New York, pp 348–351

Nister D, Stewenius H (2006) Scalable recognition with a vocabulary tree. IEEE Confer Comput Vis Pattern Recogn (CVPR’06) 2:2161–2168

Philbin J, Chum O, Isard M, Sivic J, Zisserman A (2007) Object retrieval with large vocabularies and fast spatial matching. IEEE Confer Comput Vis Pattern Recogn (CVPR’07) 3613:1575–1589

Romberg S, Hörster E, Lienhart R (2009) Multimodal pLSA on visual features and tags. In: IEEE international conference on multimedia and expo (ICME’09), pp 414–417

Shechtman E, Irani M (2007) Matching local self-similarities across images and videos. In: IEEE conference on computer vision and pattern recognition (CVPR’07)

Sivic J, Russell BC, Zisserman A, Freeman WT, Efros AA (2008) Unsupervised discovery of visual object class hierarchies. In: IEEE conference on computer vision and pattern recognition (CVPR’08)

Sivic J, Zisserman A (2003) Video Google: a text retrieval approach to object matching in videos. In: International conference on computer vision (ICCV’03)

Xiao J, Hays J, Ehinger K, Oliva A, Torralba A (2010) SUN database: large-scale scene recognition from abbey to zoo. In: IEEE conference on computer vision and pattern recognition (CVPR’10)

Zhang L, Wang X-j (2011) Multi-Feature pLSA for combining visual features in image annotation. In: ACM international conference on multimedia (MM’11), Scottsdale, Arizona, pp 1513–1516

Title: Multimodal Image Retrieval
Fusing modalities with multilayer multimodal pLSA
Authors: Stefan Romberg, Rainer Lienhart, Eva Hörster
Publication date: 01.04.2012
Publisher: Springer-Verlag
Published in: International Journal of Multimedia Information Retrieval / Issue 1/2012
Print ISSN: 2192-6611
Electronic ISSN: 2192-662X
DOI: https://doi.org/10.1007/s13735-012-0006-4
