Published in: International Journal of Speech Technology 2/2013

01-06-2013

Multiclass support vector machines for environmental sounds classification in visual domain based on log-Gabor filters

Authors: Souli Sameh, Zied Lachiri

Abstract

This paper presents an approach aimed at recognizing environmental sounds for surveillance and security applications.
We propose a robust environmental sound classification approach based on spectrogram features derived from log-Gabor filters. The approach comprises three methods. In the first two methods, the spectrograms are passed through appropriate log-Gabor filter banks, and the outputs are averaged and then undergo an optimal feature selection procedure based on a mutual information criterion. The third method follows the same steps but applies them to only three patches extracted from each spectrogram.
To investigate the accuracy of the proposed methods, we conduct experiments using a large database containing 10 environmental sound classes. The classification results based on multiclass support vector machines show that the second method is the most efficient, with an average classification accuracy of 89.62%.
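
The abstract names two building blocks without giving their formulas: a log-Gabor filter and a mutual information criterion for feature selection. The following is a minimal sketch of both using their standard textbook definitions. It is not the authors' implementation: the function names, the `sigma_ratio` bandwidth value, and the assumption that features have already been discretized are illustrative choices that do not come from the paper.

```python
import math
from collections import Counter


def log_gabor_gain(freq, center_freq, sigma_ratio=0.55):
    """Gain of a 1-D log-Gabor filter at frequency `freq`.

    Standard definition: G(f) = exp(-(ln(f / f0))^2 / (2 * (ln(sigma_ratio))^2)),
    with G(0) = 0. The default `sigma_ratio` is an illustrative value only.
    """
    if freq <= 0.0:
        return 0.0  # a log-Gabor filter has no DC component
    return math.exp(-(math.log(freq / center_freq) ** 2)
                    / (2.0 * math.log(sigma_ratio) ** 2))


def mutual_information(feature_values, labels):
    """Mutual information I(X; Y) in bits between a discrete feature and labels.

    Assumes `feature_values` are already discretized (e.g. binned filter outputs).
    """
    n = len(labels)
    joint = Counter(zip(feature_values, labels))
    px = Counter(feature_values)
    py = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi
```

In a pipeline of the kind the abstract outlines, a gain function like this would be evaluated over a grid of frequencies to build each filter of the bank, and a score like `mutual_information` would rank the averaged filter outputs before they are passed to the classifier. A feature that fully determines the class scores the entropy of the labels; a feature independent of the class scores zero.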

Metadata
Title
Multiclass support vector machines for environmental sounds classification in visual domain based on log-Gabor filters
Authors
Souli Sameh
Zied Lachiri
Publication date
01-06-2013
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 2/2013
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-012-9174-0