
International Journal of Multimedia Information Retrieval


01.06.2016 | Regular Paper

Automatic environmental sound concepts discovery for video retrieval

Authors: Issam Feki, Anis Ben Ammar, Adel M. Alimi

Published in: International Journal of Multimedia Information Retrieval | Issue 2/2016


Abstract

This paper presents a new method for video soundtrack retrieval based on environmental sounds. A set of 26 semantic audio concepts is employed, chosen for its usefulness to users when browsing video, and a set of 2000 videos has been annotated with these concepts. To improve the subsequent signal processing, we start by separating the audio sources. Then, representing the audio signal as a sequence of Mel Frequency Cepstral Coefficients (MFCCs), we carry out experiments with three classifiers: the Support Vector Machine, the Gaussian Mixture Model and the Hidden Markov Model. Based on the experimental results, we retain the Gaussian Mixture Model classifier with the Kullback–Leibler distance measure and integrate this audio concept classification into a video retrieval system. The results obtained demonstrate the effectiveness of our approach in distinguishing environmental sounds and retrieving video.
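The abstract names a Gaussian Mixture Model classifier compared with the Kullback–Leibler distance, but it does not say how that distance is computed. The KL divergence between two mixtures has no closed form, so the sketch below assumes a Monte Carlo estimate over diagonal-covariance mixtures and a symmetrised distance; this is one common choice, not necessarily the authors' procedure. The function names, the `(weights, means, variances)` model layout, and the concept labels "rain" and "applause" are hypothetical illustrations that do not come from the paper.

```python
import math
import random

# A model is a tuple (weights, means, variances) describing a
# diagonal-covariance GMM. This layout is an assumption of the sketch.

def gmm_logpdf(x, weights, means, variances):
    """Log-density of a diagonal-covariance GMM at the point x."""
    comp_logs = []
    for w, mu, var in zip(weights, means, variances):
        log_norm = -0.5 * sum(math.log(2 * math.pi * v) for v in var)
        log_exp = -0.5 * sum((xi - m) ** 2 / v for xi, m, v in zip(x, mu, var))
        comp_logs.append(math.log(w) + log_norm + log_exp)
    top = max(comp_logs)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(c - top) for c in comp_logs))

def gmm_sample(rng, weights, means, variances):
    """Draw one point: pick a component by weight, then sample each dimension."""
    k = rng.choices(range(len(weights)), weights=weights)[0]
    return [rng.gauss(m, math.sqrt(v)) for m, v in zip(means[k], variances[k])]

def kl_monte_carlo(p, q, n_samples=5000, seed=0):
    """Estimate KL(p || q) as the mean of log p(x) - log q(x) for x drawn from p."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = gmm_sample(rng, *p)
        total += gmm_logpdf(x, *p) - gmm_logpdf(x, *q)
    return total / n_samples

def symmetric_kl(p, q):
    """Symmetrised KL, usable as a distance-like score between two models."""
    return kl_monte_carlo(p, q) + kl_monte_carlo(q, p)

def classify(query_model, concept_models):
    """Return the concept whose model is closest to the query model."""
    return min(concept_models,
               key=lambda name: symmetric_kl(query_model, concept_models[name]))

# Hypothetical 2-D models standing in for GMMs fitted on MFCC frames.
rain = ([0.5, 0.5], [[0.0, 0.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]])
applause = ([1.0], [[5.0, 5.0]], [[1.0, 1.0]])
query = ([1.0], [[0.5, 0.5]], [[1.0, 1.0]])

label = classify(query, {"rain": rain, "applause": applause})
```

In a retrieval setting of the kind the abstract describes, each soundtrack would first be summarised by a mixture fitted to its MFCC frames, and the same distance could then rank videos against a query sound. The sample count trades accuracy for speed and is fixed here only to keep the example deterministic.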



Title: Automatic environmental sound concepts discovery for video retrieval
Authors: Issam Feki, Anis Ben Ammar, Adel M. Alimi
Publication date: 01.06.2016
Publisher: Springer London
Published in: International Journal of Multimedia Information Retrieval / Issue 2/2016
Print ISSN: 2192-6611
Electronic ISSN: 2192-662X
DOI: https://doi.org/10.1007/s13735-016-0096-5

