
2021 | Original Paper | Book Chapter

Parameter Tuning for Wavelet-Based Sound Event Detection Using Neural Networks

Written by: Pallav Raval, Jabez Christopher

Published in: Artificial Intelligence in Music, Sound, Art and Design

Publisher: Springer International Publishing


Abstract

Wavelet-based audio processing is applied to sound event detection. Low-level audio features (timbral and temporal) are effective at differentiating between sound events, which is why frequency-domain processing algorithms have become popular in recent years. Wavelet-based sound event detection is particularly effective at detecting sudden onsets in audio signals, offering unique advantages over traditional frequency-based detection using machine learning approaches. In this work, the wavelet transform is applied to audio to extract features from which a classical feedforward neural network predicts the occurrence of a sound event. Additionally, this work attempts to identify the optimal wavelet parameters for enhancing classification performance: three window sizes, six wavelet families, four wavelet levels, three decomposition levels, and two classifier models are compared experimentally. On the UrbanSound8K dataset, a classification accuracy of up to 97% is obtained. The major observations regarding parameter estimation are as follows: the wavelet level and wavelet decomposition level should be low, and a large window is desirable; however, the window size is limited by the duration of the sound event, and a window longer than the event decreases classification performance. Most wavelet families can classify the sound events, but the Symlet, Daubechies, reverse biorthogonal, and biorthogonal families save computational resources (fewer epochs) because they yield better accuracy than Fejér-Korovkin and Coiflet wavelets. This work shows that wavelet-based sound event detection is promising and can be extended to detect most common sounds and sudden events occurring in various environments.
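The pipeline the abstract describes — windowing the audio, applying a discrete wavelet decomposition, and feeding per-band features to a feedforward network — can be sketched roughly as follows. This is an illustrative pure-Python Haar decomposition (the simplest member of the Daubechies family), not the authors' implementation; the paper sweeps several wavelet families and levels, for which a library such as PyWavelets would normally be used. The function and feature names below are assumptions for illustration.

```python
import math

def haar_decompose(signal, levels):
    """Discrete Haar wavelet decomposition of one audio window.

    Returns [cA_n, cD_n, ..., cD_1]: the final approximation band
    followed by detail bands from coarsest to finest (wavedec-style).
    """
    coeffs = []
    approx = list(signal)
    for _ in range(levels):
        if len(approx) < 2:
            break  # window too short for further decomposition
        # Pairwise averages (approximation) and differences (detail),
        # scaled by 1/sqrt(2) so total energy is preserved.
        # An odd trailing sample is dropped for simplicity.
        a = [(approx[i] + approx[i + 1]) / math.sqrt(2)
             for i in range(0, len(approx) - 1, 2)]
        d = [(approx[i] - approx[i + 1]) / math.sqrt(2)
             for i in range(0, len(approx) - 1, 2)]
        coeffs.insert(0, d)
        approx = a
    coeffs.insert(0, approx)
    return coeffs

def band_energies(coeffs):
    """Log-energy per sub-band: a compact feature vector that could be
    fed to a feedforward classifier (one value per decomposition band)."""
    return [math.log(sum(c * c for c in band) + 1e-12) for band in coeffs]
```

In a full system, each fixed-size window of the audio clip would be decomposed this way and the resulting band-energy vectors passed to the neural network; the parameter sweep in the paper then varies the window size, wavelet family, and decomposition depth around exactly this step.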


Metadata
Title
Parameter Tuning for Wavelet-Based Sound Event Detection Using Neural Networks
Written by
Pallav Raval
Jabez Christopher
Copyright year
2021
DOI
https://doi.org/10.1007/978-3-030-72914-1_16
