Published in: International Journal of Multimedia Information Retrieval 3/2013

1 September 2013 | Regular Paper

Mobile video concept classification

Author: Wei Jiang

Abstract

Mobile content-based multimedia analysis has attracted much attention with the growing popularity of high-end mobile devices. Most previous systems focus on mobile visual search, i.e., searching for images that contain visually duplicate or near-duplicate objects (e.g., products and landmarks). There remains a strong need for effective mobile video classification solutions that can identify videos which are not visually duplicate or near-duplicate but belong to similar high-level semantic categories. In this work, we develop a mobile video classification system based on multi-modal analysis. On the mobile side, both visual and audio features are extracted from the input video, and these features are compressed into compact hash bits for efficient transmission. On the server side, the received hash bits are used to compute the audio and visual Bag-of-Words representations for multi-modal concept classification. We propose a novel method in which hash functions are learned from the multi-modal information carried by the visual and audio codewords. Compared with the traditional approach of computing visual and audio hash functions separately from raw visual and audio local features, our method exploits the co-occurrences of audio and visual codewords as augmenting information and significantly improves the classification performance. The cost budget of our system for mobile data storage, computation, and transmission is similar to that of state-of-the-art mobile visual search systems. Extensive experiments over 10,000 YouTube videos show that our system achieves classification accuracy similar to that of conventional server-based video classification systems using uncompressed raw descriptors.
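
The mobile/server split described in the abstract can be illustrated with a minimal sketch. It is not the method proposed in the paper: the learned multi-modal hash functions are replaced here by plain random-hyperplane hashing, and the server is assumed to assign each received hash code to its nearest codeword by Hamming distance. All names (make_hyperplanes, hash_descriptor, bag_of_words), the descriptor dimensions, the codebook sizes, and the random toy data are hypothetical.

```python
import random
from collections import Counter

def make_hyperplanes(dim, n_bits, seed):
    """Draw n_bits random hyperplanes (assumed stand-in for learned hash functions)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def hash_descriptor(desc, hyperplanes):
    """Compress one local descriptor into an integer holding one bit per hyperplane."""
    code = 0
    for w in hyperplanes:
        dot = sum(wi * xi for wi, xi in zip(w, desc))
        code = (code << 1) | (1 if dot >= 0.0 else 0)
    return code

def hamming(a, b):
    """Number of differing bits between two hash codes."""
    return bin(a ^ b).count("1")

def bag_of_words(codes, codeword_codes):
    """Normalized histogram of nearest codewords, measured in Hamming distance."""
    def nearest(code):
        return min(range(len(codeword_codes)),
                   key=lambda k: hamming(code, codeword_codes[k]))
    counts = Counter(nearest(c) for c in codes)
    total = max(len(codes), 1)
    return [counts[k] / total for k in range(len(codeword_codes))]

def random_vectors(rng, n, dim):
    """Toy data generator standing in for real descriptors and codewords."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]

rng = random.Random(1)
N_BITS = 16                      # hypothetical code length per descriptor
VISUAL_DIM, AUDIO_DIM = 16, 8    # hypothetical descriptor dimensions

visual_planes = make_hyperplanes(VISUAL_DIM, N_BITS, seed=10)
audio_planes = make_hyperplanes(AUDIO_DIM, N_BITS, seed=20)
visual_codebook = random_vectors(rng, 5, VISUAL_DIM)   # hypothetical codewords
audio_codebook = random_vectors(rng, 3, AUDIO_DIM)

# Mobile side: hash the local descriptors; only the codes would be transmitted.
visual_codes = [hash_descriptor(d, visual_planes)
                for d in random_vectors(rng, 30, VISUAL_DIM)]
audio_codes = [hash_descriptor(d, audio_planes)
               for d in random_vectors(rng, 20, AUDIO_DIM)]

# Server side: build one Bag-of-Words histogram per modality from the codes.
visual_bow = bag_of_words(
    visual_codes, [hash_descriptor(c, visual_planes) for c in visual_codebook])
audio_bow = bag_of_words(
    audio_codes, [hash_descriptor(c, audio_planes) for c in audio_codebook])

# Assumed early fusion: concatenate both histograms into one classifier input.
video_feature = visual_bow + audio_bow
print(len(video_feature))
```

In this sketch each descriptor costs N_BITS bits to transmit, regardless of its original dimension. The concatenated histogram could then be passed to any classifier; the fusion scheme and classifier actually used in the paper are not stated in the abstract.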

Footnotes
1
Information such as text or metadata is not generally used because its existence is not guaranteed.

Metadata
Title
Mobile video concept classification
Author
Wei Jiang
Publication date
1 September 2013
Publisher
Springer London
Published in
International Journal of Multimedia Information Retrieval / Issue 3/2013
Print ISSN: 2192-6611
Electronic ISSN: 2192-662X
DOI
https://doi.org/10.1007/s13735-012-0027-z
