Published in: International Journal of Multimedia Information Retrieval 1/2012

01.04.2012 | Invited Paper

Multimodal Image Retrieval

Fusing modalities with multilayer multimodal pLSA

Authors: Stefan Romberg, Rainer Lienhart, Eva Hörster

Abstract

In this work, we extend the standard single-layer probabilistic Latent Semantic Analysis (pLSA) (Hofmann in Mach Learn 42(1–2):177–196, 2001) to multiple layers. As multiple layers should naturally handle multiple modalities and a hierarchy of abstractions, we denote this new approach multilayer multimodal probabilistic Latent Semantic Analysis (mm-pLSA). We derive the training and inference rules for the smallest possible non-degenerate mm-pLSA model: a model with two leaf-pLSAs and a single top-level pLSA node merging the two leaf-pLSAs. We evaluate this approach on two pairs of modalities: SIFT features combined with image annotations (tags), and SIFT features combined with HOG features. We also propose a fast, strictly stepwise forward procedure to initialize the bottom-up mm-pLSA model, which can then be post-optimized by the general mm-pLSA learning algorithm. The proposed approach is evaluated in a query-by-example retrieval task, where several variants of our mm-pLSA system are compared with systems relying on a single modality and with other ad hoc combinations of feature histograms. We further describe possible pitfalls of mm-pLSA training and analyze the resulting model, which yields an intuitive explanation of its behaviour.
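The abstract names two procedures: two leaf pLSAs whose outputs are merged by a single top-level pLSA, and a fast, strictly stepwise forward initialization of that two-layer model. The Python sketch below illustrates the stepwise idea on toy data and then ranks images for a query-by-example search. Only the per-node EM updates are the standard pLSA updates (Hofmann 2001). Feeding the concatenated leaf topic mixtures P(z|d) to the top node as pseudo-counts, the cosine similarity used for ranking, the toy data, and the helper names `plsa`, `stepwise_two_leaf` and `query_by_example` are illustrative assumptions, not the authors' published rules. The sketch also omits the joint post-optimization by the full mm-pLSA EM algorithm.

```python
import math
import random


def normalize(vec):
    """Scale a non-negative vector to sum to 1 (uniform if it is all zeros)."""
    total = sum(vec)
    if total <= 0:
        return [1.0 / len(vec)] * len(vec)
    return [v / total for v in vec]


def plsa(counts, n_topics, n_iter=100, seed=0):
    """Fit a single-layer pLSA by EM on a document-by-word count matrix.

    counts[d][w] = n(d, w). Returns (p_z_d, p_w_z) with
    p_z_d[d][z] = P(z|d) and p_w_z[z][w] = P(w|z).
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    p_z_d = [normalize([rng.random() for _ in range(n_topics)]) for _ in range(n_docs)]
    p_w_z = [normalize([rng.random() for _ in range(n_words)]) for _ in range(n_topics)]

    for _ in range(n_iter):
        exp_w_z = [[0.0] * n_words for _ in range(n_topics)]
        exp_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z)
                post = normalize([p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)])
                # Accumulate expected counts n(d,w) * P(z|d,w)
                for z in range(n_topics):
                    exp_w_z[z][w] += n * post[z]
                    exp_z_d[d][z] += n * post[z]
        # M-step: renormalize the expected counts
        p_w_z = [normalize(row) for row in exp_w_z]
        p_z_d = [normalize(row) for row in exp_z_d]
    return p_z_d, p_w_z


def stepwise_two_leaf(counts_a, counts_b, k_a, k_b, k_top, n_iter=100):
    """Hypothetical forward pass: two leaf pLSAs, then one top-level pLSA.

    ASSUMPTION: the top node sees the concatenated leaf mixtures
    [P(z_a|d), P(z_b|d)] as pseudo-counts, so each modality gets equal weight.
    """
    theta_a, _ = plsa(counts_a, k_a, n_iter, seed=1)  # leaf 1, e.g. visual words
    theta_b, _ = plsa(counts_b, k_b, n_iter, seed=2)  # leaf 2, e.g. tags
    pseudo = [ta + tb for ta, tb in zip(theta_a, theta_b)]  # list concatenation
    theta_top, _ = plsa(pseudo, k_top, n_iter, seed=3)
    return theta_top


def cosine(u, v):
    """Cosine similarity (an assumed choice of similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def query_by_example(theta, query):
    """Rank all other images by similarity of their top-level topic mixtures."""
    scored = [(cosine(theta[query], t), i) for i, t in enumerate(theta) if i != query]
    return [i for _, i in sorted(scored, reverse=True)]


# Hypothetical toy data: images 0/1 share visual words 0-1 and tag 0,
# images 2/3 share visual words 2-3 and tag 1.
visual = [[5, 4, 0, 0], [4, 5, 0, 0], [0, 0, 5, 4], [0, 0, 4, 5]]
tags = [[3, 0], [3, 0], [0, 3], [0, 3]]
theta_top = stepwise_two_leaf(visual, tags, k_a=2, k_b=2, k_top=2)
ranking = query_by_example(theta_top, query=0)
print(ranking)
```

With this cleanly separated toy data, image 1 (same visual words and same tag as image 0) would be expected to rank first. Each node is trained once, bottom-up, which keeps the pass cheap; in the scheme described in the abstract, its result would only serve as a starting point for the joint mm-pLSA optimization.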

Footnotes
1
A complete derivation of the EM update equations for this multilayer multimodal pLSA model can be found at http://www.multimedia-computing.de/wiki/mm-pLSA
 
2
The dataset and additional material are available at http://www.multimedia-computing.de/wiki/Flickr-10M
 
References
1.
Barnard K, Duygulu P, Forsyth D, Blei DM, Hofmann T, Poggio T, Shawe-Taylor J (2003) Matching words and pictures. J Mach Learn Res 3:1107–1135
2.
Bay H, Ess A, Tuytelaars T, Van Gool L (2008) SURF: speeded up robust features. Comput Vis Image Underst 110(3):346–359
3.
Berg AC, Berg TL, Malik J (2005) Shape matching and object recognition using low distortion correspondences. In: IEEE conference on computer vision and pattern recognition (CVPR’05), vol 1. Washington, DC, pp 26–33
4.
Blei D, Lafferty J (2006) Correlated topic models. In: Advances in neural information processing systems, vol 18, pp 147–154
5.
Blei DM, Jordan MI (2003) Modeling annotated data. In: ACM SIGIR conference on research and development in information retrieval (SIGIR’03), pp 127–134
6.
Blei DM, Ng AY, Jordan MI (2003) Latent Dirichlet allocation. J Mach Learn Res 3:993–1022
7.
Bosch A, Zisserman A, Muñoz X (2006) Scene classification via pLSA. Eur Confer Comput Vis (ECCV’06) 3954:517–530
8.
Dempster A, Laird N, Rubin D (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc 39(1):1–38
9.
Deng J, Dong W, Socher R, Li L, Li K, Fei-Fei L (2009) ImageNet: a large-scale hierarchical image database. In: IEEE conference on computer vision and pattern recognition (CVPR’09)
10.
Everingham M, Van Gool L, Williams C, Winn J, Zisserman A (2009) The PASCAL visual object classes (VOC) challenge. Int J Comput Vis 88(2):303–338
11.
Fellbaum C (1998) WordNet: an electronic lexical database. MIT Press, Cambridge
12.
Felzenszwalb P, Girshick R, McAllester D, Ramanan D (2010) Object detection with discriminatively trained part-based models. IEEE Trans Pattern Anal Mach Intell (PAMI’10) 32(9)
13.
Greif T, Hörster E, Lienhart R (2008) Correlated topic models for image retrieval. Technical Report TR2008–09, University of Augsburg
14.
Hawkins J, Blakeslee S (2004) On intelligence. Times Books, New York
15.
16.
Hofmann T (2001) Unsupervised learning by probabilistic latent semantic analysis. Mach Learn 42(1–2):177–196
17.
Hörster E, Lienhart R (2008) Deep networks for image retrieval on large-scale databases. In: ACM international conference on multimedia (MM’08), New York, pp 643–646
18.
Hörster E, Lienhart R, Slaney M (2007) Image retrieval on large-scale image databases. In: ACM international conference on content-based image and video retrieval (CIVR’07), pp 17–24
19.
Hörster E, Lienhart R, Slaney M (2008) Continuous visual vocabulary models for pLSA-based scene recognition. In: ACM international conference on content-based image and video retrieval (CIVR’08), New York, pp 319–328
20.
Kennedy L, Naaman M, Ahern S, Nair R, Rattenbury T (2007) How Flickr helps us make sense of the world: context and content in community-contributed media collections. In: ACM international conference on multimedia (MM’07), New York, pp 631–640
21.
Lienhart R, Romberg S, Hörster E (2009) Multilayer pLSA for multimodal image retrieval. In: ACM international conference on image and video retrieval (CIVR’09), vol 14
22.
Lienhart R, Slaney M (2007) pLSA on large scale image databases. IEEE Int Confer Acoust Speech Signal Process (ICASSP’07) IV:1217–1220
23.
Lowe DG (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vis (IJCV’04) 60(2):91–110
24.
Monay F, Gatica-Perez D (2004) pLSA-based image auto-annotation: constraining the latent space. In: ACM international conference on multimedia (MM’04), New York, pp 348–351
25.
Nister D, Stewenius H (2006) Scalable recognition with a vocabulary tree. IEEE Confer Comput Vis Pattern Recogn (CVPR’06) 2:2161–2168
26.
Philbin J, Chum O, Isard M, Sivic J, Zisserman A (2007) Object retrieval with large vocabularies and fast spatial matching. IEEE Confer Comput Vis Pattern Recogn (CVPR’07) 3613:1575–1589
27.
Romberg S, Hörster E, Lienhart R (2009) Multimodal pLSA on visual features and tags. In: IEEE international conference on multimedia and expo (ICME’09), pp 414–417
28.
Shechtman E, Irani M (2007) Matching local self-similarities across images and videos. In: IEEE conference on computer vision and pattern recognition (CVPR’07)
29.
Sivic J, Russell BC, Zisserman A, Freeman WT, Efros AA (2008) Unsupervised discovery of visual object class hierarchies. In: IEEE conference on computer vision and pattern recognition (CVPR’08)
30.
Sivic J, Zisserman A (2003) Video Google: a text retrieval approach to object matching in videos. In: International conference on computer vision (ICCV’03)
31.
Xiao J, Hays J, Ehinger K, Oliva A, Torralba A (2010) SUN database: large-scale scene recognition from abbey to zoo. In: IEEE conference on computer vision and pattern recognition (CVPR’10)
32.
Zhang L, Wang X-J (2011) Multi-feature pLSA for combining visual features in image annotation. In: ACM international conference on multimedia (MM’11), Scottsdale, Arizona, pp 1513–1516
Metadata
Title
Multimodal Image Retrieval
Fusing modalities with multilayer multimodal pLSA
Authors
Stefan Romberg
Rainer Lienhart
Eva Hörster
Publication date
01.04.2012
Publisher
Springer-Verlag
Published in
International Journal of Multimedia Information Retrieval / Issue 1/2012
Print ISSN: 2192-6611
Electronic ISSN: 2192-662X
DOI
https://doi.org/10.1007/s13735-012-0006-4
