
2021 | Original Paper | Book Chapter

Parameter Tuning for Wavelet-Based Sound Event Detection Using Neural Networks

Written by: Pallav Raval, Jabez Christopher

Published in: Artificial Intelligence in Music, Sound, Art and Design

Publisher: Springer International Publishing


Abstract

Wavelet-based audio processing is applied to sound event detection. Low-level audio features (timbral and temporal) are effective at differentiating between sound events, which is why frequency-domain processing algorithms have become popular in recent years. Wavelet-based sound event detection is particularly effective at detecting sudden onsets in audio signals, offering unique advantages over traditional frequency-based detection using machine learning approaches. In this work, the wavelet transform is applied to audio to extract features from which a classical feedforward neural network predicts the occurrence of a sound event. Additionally, this work attempts to identify the optimal wavelet parameters for enhancing classification performance: three window sizes, six wavelet families, four wavelet levels, three decomposition levels, and two classifier models are compared experimentally. On the UrbanSound8K dataset, a classification accuracy of up to 97% is obtained. The major observations regarding parameter estimation are as follows: the wavelet level and wavelet decomposition level should be low, and a large window is desirable; however, the window size is limited by the duration of the sound event, and a window longer than the event decreases classification performance. Most wavelet families can classify the sound events, but the Symlet, Daubechies, reverse biorthogonal, and biorthogonal families save computational resources (fewer epochs) because they yield better accuracy than Fejér-Korovkin and Coiflet wavelets. This work shows that wavelet-based sound event detection is promising and can be extended to detect most common sounds and sudden events occurring in various environments.
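The pipeline the abstract describes — windowing the audio, applying a discrete wavelet decomposition, and feeding per-band features to a feedforward network — can be sketched roughly as follows. This is an illustrative pure-Python Haar decomposition (the simplest member of the Daubechies family), not the authors' implementation; the paper sweeps several wavelet families and levels, for which a library such as PyWavelets would normally be used. The function and feature names below are assumptions for illustration.

```python
import math

def haar_decompose(signal, levels):
    """Discrete Haar wavelet decomposition of one audio window.

    Returns [cA_n, cD_n, ..., cD_1]: the final approximation band
    followed by detail bands from coarsest to finest (wavedec-style).
    """
    coeffs = []
    approx = list(signal)
    for _ in range(levels):
        if len(approx) < 2:
            break  # window too short for further decomposition
        # Pairwise averages (approximation) and differences (detail),
        # scaled by 1/sqrt(2) so total energy is preserved.
        # An odd trailing sample is dropped for simplicity.
        a = [(approx[i] + approx[i + 1]) / math.sqrt(2)
             for i in range(0, len(approx) - 1, 2)]
        d = [(approx[i] - approx[i + 1]) / math.sqrt(2)
             for i in range(0, len(approx) - 1, 2)]
        coeffs.insert(0, d)
        approx = a
    coeffs.insert(0, approx)
    return coeffs

def band_energies(coeffs):
    """Log-energy per sub-band: a compact feature vector that could be
    fed to a feedforward classifier (one value per decomposition band)."""
    return [math.log(sum(c * c for c in band) + 1e-12) for band in coeffs]
```

In a full system, each fixed-size window of the audio clip would be decomposed this way and the resulting band-energy vectors passed to the neural network; the parameter sweep in the paper then varies the window size, wavelet family, and decomposition depth around exactly this step.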


Metadata
Title
Parameter Tuning for Wavelet-Based Sound Event Detection Using Neural Networks
Written by
Pallav Raval
Jabez Christopher
Copyright year
2021
DOI
https://doi.org/10.1007/978-3-030-72914-1_16
