
International Journal of Speech Technology

Published: 01-06-2013

Multiclass support vector machines for environmental sounds classification in visual domain based on log-Gabor filters

Authors: Souli Sameh, Zied Lachiri

Published in: International Journal of Speech Technology | Issue 2/2013


Abstract

This paper presents an approach aimed at recognizing environmental sounds for surveillance and security applications.

We propose a robust environmental sound classification approach based on spectrogram features derived from log-Gabor filters. The approach comprises three methods. In the first two, the spectrograms are passed through an appropriate log-Gabor filter bank, and the outputs are averaged and then undergo an optimal feature selection procedure based on a mutual information criterion. The third method applies the same steps only to three patches extracted from each spectrogram.
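
The processing chain of the first two methods (filter the spectrogram image with a log-Gabor bank, average the outputs, select features by mutual information) can be sketched in Python/NumPy as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The filter-bank parameters (`n_scales`, `n_orients`, `min_wavelength`, `mult`, `sigma_on_f`) are common log-Gabor defaults, not values from the paper. Averaging the response magnitudes over all filters and then block-averaging onto an 8 by 8 grid is one possible reading of "the outputs are averaged". A simple per-feature mutual-information ranking stands in for the paper's optimal selection procedure, whose details the abstract does not give. All function names are hypothetical.

```python
import numpy as np


def log_gabor_bank(shape, n_scales=4, n_orients=6, min_wavelength=3.0,
                   mult=2.0, sigma_on_f=0.55):
    """Frequency-domain log-Gabor filters: radial log-Gaussian times angular Gaussian.

    Parameter values are assumed common defaults, not taken from the paper.
    """
    rows, cols = shape
    # Frequency grids (cycles per pixel) laid out to match np.fft.fft2 output
    fy = np.fft.fftfreq(rows)[:, None]
    fx = np.fft.fftfreq(cols)[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)
    radius[0, 0] = 1.0                      # avoid log(0); DC gain is zeroed below
    theta = np.arctan2(-fy, fx)
    theta_sigma = np.pi / n_orients / 1.5   # angular spread (a common default)
    bank = []
    for s in range(n_scales):
        f0 = 1.0 / (min_wavelength * mult ** s)   # centre frequency of scale s
        radial = np.exp(-np.log(radius / f0) ** 2 / (2 * np.log(sigma_on_f) ** 2))
        radial[0, 0] = 0.0                  # log-Gabor filters have no DC response
        for o in range(n_orients):
            angle = o * np.pi / n_orients
            # Angular distance to the filter orientation, wrapped to [-pi, pi]
            d_theta = np.arctan2(np.sin(theta - angle), np.cos(theta - angle))
            bank.append(radial * np.exp(-d_theta ** 2 / (2 * theta_sigma ** 2)))
    return bank


def averaged_response_features(spectrogram, bank, grid=(8, 8)):
    """Average the response magnitudes of all filters into one image, then block-average it.

    Hypothetical reading of "the outputs are averaged"; grid size is an assumption.
    """
    spec_f = np.fft.fft2(spectrogram)
    # Each filter covers one side of the spectrum, so the response is complex;
    # its magnitude is the local energy at that scale and orientation.
    avg = np.mean([np.abs(np.fft.ifft2(spec_f * g)) for g in bank], axis=0)
    gr, gc = grid
    r, c = avg.shape[0] // gr, avg.shape[1] // gc
    avg = avg[:r * gr, :c * gc]             # crop so the grid divides evenly
    return avg.reshape(gr, r, gc, c).mean(axis=(1, 3)).ravel()


def mutual_information(feature, labels, n_bins=8):
    """Histogram estimate of I(feature; label) in nats."""
    edges = np.histogram_bin_edges(feature, bins=n_bins)
    binned = np.digitize(feature, edges[1:-1])          # bin index 0..n_bins-1
    classes, y = np.unique(labels, return_inverse=True)
    joint = np.zeros((n_bins, len(classes)))
    np.add.at(joint, (binned, y), 1.0)
    joint /= joint.sum()
    expected = joint.sum(axis=1, keepdims=True) * joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / expected[nz])))


def select_features(X, labels, k):
    """Rank columns by mutual information with the label and keep the top k indices."""
    scores = np.array([mutual_information(X[:, j], labels) for j in range(X.shape[1])])
    return np.argsort(scores)[::-1][:k]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    bank = log_gabor_bank((64, 64))
    # Hypothetical stand-ins for real spectrogram images (20 clips, 2 classes)
    specs = rng.random((20, 64, 64))
    labels = np.repeat([0, 1], 10)
    X = np.array([averaged_response_features(s, bank) for s in specs])
    print(X.shape, select_features(X, labels, k=5))
```

Ranking features one at a time is the simplest mutual-information filter; it ignores redundancy between selected features, which a greedy criterion that also penalises overlap with already-chosen features would address.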

To evaluate the accuracy of the proposed methods, we conduct experiments on a large database containing 10 environmental sound classes. Classification results obtained with multiclass support vector machines show that the second method is the most efficient, with an average classification accuracy of 89.62%.
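
The classification stage can be illustrated with a one-vs-one multiclass SVM: one binary SVM is trained for each pair of classes, and each test sample is assigned to the class that wins the most pairwise votes. This is a hedged sketch. The abstract does not state the kernel or the multiclass strategy used in the paper, so a linear kernel trained with the Pegasos stochastic subgradient method is swapped in to keep the example NumPy-only, and `OneVsOneSVM`, `train_linear_svm`, `lam`, and `epochs` are hypothetical names. `average_class_accuracy` computes the mean of per-class recalls, which is one possible reading of "average classification accuracy"; the abstract does not define the term.

```python
import numpy as np
from itertools import combinations


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos stochastic subgradient descent on the L2-regularised hinge loss.

    y must be in {-1, +1}; a bias is learned by appending a constant column to X.
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            t += 1
            eta = 1.0 / (lam * t)                 # Pegasos step size
            violates = y[i] * (X[i] @ w) < 1      # inside the margin or misclassified
            w *= 1.0 - eta * lam                  # shrink: gradient of the regulariser
            if violates:
                w += eta * y[i] * X[i]            # hinge-loss subgradient step
    return w


class OneVsOneSVM:
    """One binary linear SVM per pair of classes; prediction by majority vote."""

    def __init__(self, lam=0.01, epochs=50):
        self.lam, self.epochs = lam, epochs

    @staticmethod
    def _augment(X):
        # Constant column so the last weight acts as a (regularised) bias
        return np.hstack([X, np.ones((len(X), 1))])

    def fit(self, X, y):
        Xa = self._augment(np.asarray(X, dtype=float))
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        self.models_ = {}
        for a, b in combinations(range(len(self.classes_)), 2):
            mask = (y == self.classes_[a]) | (y == self.classes_[b])
            target = np.where(y[mask] == self.classes_[a], 1.0, -1.0)
            self.models_[(a, b)] = train_linear_svm(Xa[mask], target, self.lam, self.epochs)
        return self

    def predict(self, X):
        Xa = self._augment(np.asarray(X, dtype=float))
        votes = np.zeros((len(Xa), len(self.classes_)), dtype=int)
        for (a, b), w in self.models_.items():
            pos = Xa @ w > 0                      # True -> vote for class a
            votes[pos, a] += 1
            votes[~pos, b] += 1
        return self.classes_[votes.argmax(axis=1)]


def average_class_accuracy(y_true, y_pred):
    """Mean of per-class recalls (accuracy within each true class)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean([np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]))


if __name__ == "__main__":
    # Hypothetical toy data: three well-separated 2-D clusters
    rng = np.random.default_rng(1)
    centers = np.array([[-4.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
    X = np.vstack([c + 0.5 * rng.standard_normal((30, 2)) for c in centers])
    y = np.repeat([0, 1, 2], 30)
    clf = OneVsOneSVM().fit(X, y)
    print(average_class_accuracy(y, clf.predict(X)))
```

With 10 sound classes, the one-vs-one scheme trains 45 binary classifiers. In practice the features would typically be standardised first, and a kernel SVM library such as LIBSVM would replace the linear Pegasos solver used here for brevity.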

Publisher: Springer US
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI: https://doi.org/10.1007/s10772-012-9174-0

Other articles of this Issue 2/2013

Robust emotional speech classification in the presence of babble noise

Gender-dependent emotion recognition based on HMMs and SPHMMs

Expressive speech synthesis: a review

An efficient lattice-based phonetic search method for accelerating keyword spotting in large speech databases

Emotion recognition from speech using global and local prosodic features

The optimized wavelet filters for speech compression