
2015 | Original Paper | Book Chapter

Automatic Sound Recognition of Urban Environment Events

Authors: Theodoros Theodorou, Iosif Mporas, Nikos Fakotakis

Published in: Speech and Computer

Publisher: Springer International Publishing


Abstract

Audio analysis of a speaker's surroundings is a first step in several processing systems that support the speaker's mobility through daily life. These algorithms usually perform short-time analysis, decomposing incoming events in the time and frequency domains. This paper studies an automatic sound recognizer that recognizes audio events of interest in urban environments. Our experiments used a closed set of audio events, from which well-known and commonly used audio descriptors were extracted, and models were trained using powerful machine learning algorithms. The best urban sound recognition performance was achieved by SVMs, with an accuracy of approximately 93%.
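
To illustrate the kind of short-time analysis the abstract mentions, the sketch below splits a signal into overlapping frames and computes one time-domain descriptor (zero-crossing rate) and one frequency-domain descriptor (spectral centroid) per frame. The frame length, hop size, sampling rate, test signal, and the choice of these two descriptors are assumptions made for illustration only; the abstract does not state which descriptors or parameters the authors used.

```python
import cmath
import math


def frames(signal, frame_len, hop):
    """Yield successive overlapping frames of `frame_len` samples."""
    for start in range(0, len(signal) - frame_len + 1, hop):
        yield signal[start:start + frame_len]


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (time domain)."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency in Hz (frequency domain).

    Uses a naive O(n^2) DFT to stay dependency-free; a practical
    system would use an FFT library instead.
    """
    n = len(frame)
    weighted = total = 0.0
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(frame)
        )
        mag = abs(coeff)
        weighted += (k * sample_rate / n) * mag
        total += mag
    return weighted / total if total > 0 else 0.0


def describe(signal, sample_rate, frame_len=64, hop=32):
    """Return one (zcr, centroid) descriptor pair per frame."""
    return [
        (zero_crossing_rate(f), spectral_centroid(f, sample_rate))
        for f in frames(signal, frame_len, hop)
    ]


# Hypothetical test signal: a 1 kHz sine sampled at 8 kHz.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(256)]
features = describe(tone, SAMPLE_RATE)
```

In a recognizer of the kind described, per-frame descriptor vectors like these would then be passed to a trained classifier such as an SVM; that step is omitted here because the abstract gives no details on its configuration.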


Metadata
Title
Automatic Sound Recognition of Urban Environment Events
Authors
Theodoros Theodorou
Iosif Mporas
Nikos Fakotakis
Copyright year
2015
DOI
https://doi.org/10.1007/978-3-319-23132-7_16