International Journal of Multimedia Information Retrieval

01.09.2013 | Regular Paper

Mobile video concept classification

Author: Wei Jiang

Published in: International Journal of Multimedia Information Retrieval | Issue 3/2013

Abstract

Mobile content-based multimedia analysis has attracted much attention with the growing popularity of high-end mobile devices. Most previous systems focus on mobile visual search, i.e., searching for images that contain visually duplicate or near-duplicate objects (e.g., products and landmarks). There remains a strong need for effective mobile video classification solutions that can identify videos which are not visually duplicate or near-duplicate but belong to similar high-level semantic categories. In this work, we develop a mobile video classification system based on multi-modal analysis. On the mobile side, both visual and audio features are extracted from the input video, and these features are then compressed into compact hash bits for efficient transmission. On the server side, the received hash bits are used to compute the audio and visual Bag-of-Words representations for multi-modal concept classification. We propose a novel method in which hash functions are learned from the multi-modal information in the visual and audio codewords. Compared with the traditional approach of computing visual-based and audio-based hash functions separately from raw visual and audio local features, our method exploits the co-occurrences of audio and visual codewords as augmenting information and significantly improves classification performance. The cost budget of our system for mobile data storage, computation, and transmission is similar to that of state-of-the-art mobile visual search systems. Extensive experiments over 10,000 YouTube videos show that our system achieves classification accuracy similar to that of conventional server-based video classification systems that use uncompressed raw descriptors.
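The client/server split described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's method: the learned, co-occurrence-aware hash functions are replaced here by plain random-hyperplane sign hashing, and every name (`make_hash_planes`, `hash_descriptor`, `bow_from_codes`), the 8-dimensional descriptors, the 16-bit code length, and the 4-word codebook are assumptions chosen only for illustration. The sketch shows local descriptors being compressed to hash bits on the device and a normalized Bag-of-Words histogram being built on the server from Hamming distances to hashed codewords.

```python
import random


def make_hash_planes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplanes (a stand-in for learned hash functions)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_descriptor(desc, planes):
    """Mobile side: compress one local descriptor into len(planes) sign bits."""
    code = 0
    for plane in planes:
        dot = sum(w * x for w, x in zip(plane, desc))
        code = (code << 1) | (1 if dot >= 0 else 0)
    return code


def hamming(a, b):
    """Number of differing bits between two integer hash codes."""
    return bin(a ^ b).count("1")


def bow_from_codes(codes, codeword_codes):
    """Server side: assign each received code to the codeword that is nearest
    in Hamming distance and return the normalized histogram."""
    hist = [0] * len(codeword_codes)
    for c in codes:
        nearest = min(range(len(codeword_codes)),
                      key=lambda k: hamming(c, codeword_codes[k]))
        hist[nearest] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist


# Hypothetical toy data: 4 codewords and 20 descriptors, all 8-dimensional.
rng = random.Random(1)
planes = make_hash_planes(dim=8, n_bits=16)
codewords = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(4)]
codeword_codes = [hash_descriptor(c, planes) for c in codewords]

descriptors = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(20)]
codes = [hash_descriptor(d, planes) for d in descriptors]  # sent to the server
bow = bow_from_codes(codes, codeword_codes)
```

In a two-modality setting, the same routine could be run once for visual codes and once for audio codes, with the two histograms concatenated before classification. How the paper actually fuses the modalities and learns its hash functions is not specified on this page, so that part is left out of the sketch.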

Information such as text or metadata is generally not used because its existence is not guaranteed.

Title: Mobile video concept classification
Author: Wei Jiang
Publication date: 01.09.2013
Publisher: Springer London
Published in: International Journal of Multimedia Information Retrieval / Issue 3/2013
Print ISSN: 2192-6611
Electronic ISSN: 2192-662X
DOI: https://doi.org/10.1007/s13735-012-0027-z

Other articles in Issue 3/2013

An intelligent content-based image retrieval system for clinical decision support in brain tumor diagnosis

Searching for images by video

Multimodal biomedical image retrieval using hierarchical classification and modality fusion

A geometrical distance measure for determining the similarity of musical harmony