2017 | Original Paper | Book Chapter

Novel Phase Encoded Mel Filterbank Energies for Environmental Sound Classification

Authors: Rishabh N. Tak, Dharmesh M. Agrawal, Hemant A. Patil

Published in: Pattern Recognition and Machine Intelligence

Publisher: Springer International Publishing


Abstract

In the Environmental Sound Classification (ESC) task, typically only the magnitude spectrum is processed and the phase spectrum is ignored, which degrades performance. In this paper, we propose phase encoded filterbank energies (PEFBEs) for the ESC task. The proposed feature set uses a Mel filterbank, since it reflects characteristics of human auditory processing. A Convolutional Neural Network (CNN) is used as the pattern classifier. Experiments were performed on the ESC-50 database. We found that the proposed PEFBEs feature set gives better results than the state-of-the-art Filterbank Energies (FBEs). In addition, score-level fusion of FBEs and the proposed PEFBEs was carried out, which yields a further improvement over either feature set alone. Hence, the proposed PEFBEs capture information complementary to FBEs.
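The chapter's exact PEFBE computation is not reproduced on this page. As background, the baseline log Mel filterbank energies (FBEs) the abstract compares against, and the kind of score-level fusion it mentions, can be sketched roughly as follows. All function names, parameter defaults, and the fusion weight here are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def hz_to_mel(f):
    # Standard Mel-scale warping of frequency in Hz.
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters with centres spaced evenly on the Mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        lo, c, hi = bins[i - 1], bins[i], bins[i + 1]
        for k in range(lo, c):
            fb[i - 1, k] = (k - lo) / max(c - lo, 1)
        for k in range(c, hi):
            fb[i - 1, k] = (hi - k) / max(hi - c, 1)
    return fb

def log_mel_fbe(signal, sr=16000, n_fft=512, hop=256, n_filters=40):
    # Frame the signal, window, take the power spectrum (the phase is
    # discarded at this step -- exactly the limitation PEFBEs address),
    # apply the Mel filterbank, and log-compress.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    fb = mel_filterbank(n_filters, n_fft, sr)
    return np.log(np.maximum(power @ fb.T, 1e-10))

def score_fusion(scores_a, scores_b, alpha=0.5):
    # Score-level fusion: a convex combination of two classifiers'
    # per-class scores; alpha would be tuned on held-out data.
    return alpha * scores_a + (1.0 - alpha) * scores_b
```

In the fused system described by the abstract, `scores_a` and `scores_b` would be the CNN outputs obtained from FBE and PEFBE features respectively.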


Metadata
Title
Novel Phase Encoded Mel Filterbank Energies for Environmental Sound Classification
Authors
Rishabh N. Tak
Dharmesh M. Agrawal
Hemant A. Patil
Copyright year
2017
DOI
https://doi.org/10.1007/978-3-319-69900-4_40