2017 | Original Paper | Book Chapter

Novel Phase Encoded Mel Filterbank Energies for Environmental Sound Classification

Authors: Rishabh N. Tak, Dharmesh M. Agrawal, Hemant A. Patil

Published in: Pattern Recognition and Machine Intelligence

Publisher: Springer International Publishing


Abstract

In the Environmental Sound Classification (ESC) task, typically only the magnitude spectrum is processed and the phase spectrum is ignored, which degrades performance. In this paper, we propose phase encoded filterbank energies (PEFBEs) for the ESC task. The proposed feature set uses a Mel filterbank, since it reflects characteristics of human auditory processing. A Convolutional Neural Network (CNN) is used as the pattern classifier. Experiments were performed on the ESC-50 database. We found that the proposed PEFBEs feature set gives better results than the state-of-the-art Filterbank Energies (FBEs). In addition, score-level fusion of FBEs and the proposed PEFBEs was carried out, which yields a further improvement over either feature set alone. Hence, the proposed PEFBEs capture information complementary to FBEs.
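The chapter's exact PEFBE computation is not reproduced on this page. As background, the baseline log Mel filterbank energies (FBEs) the abstract compares against, and the kind of score-level fusion it mentions, can be sketched roughly as follows. All function names, parameter defaults, and the fusion weight here are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def hz_to_mel(f):
    # Standard Mel-scale warping of frequency in Hz.
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters with centres spaced evenly on the Mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        lo, c, hi = bins[i - 1], bins[i], bins[i + 1]
        for k in range(lo, c):
            fb[i - 1, k] = (k - lo) / max(c - lo, 1)
        for k in range(c, hi):
            fb[i - 1, k] = (hi - k) / max(hi - c, 1)
    return fb

def log_mel_fbe(signal, sr=16000, n_fft=512, hop=256, n_filters=40):
    # Frame the signal, window, take the power spectrum (the phase is
    # discarded at this step -- exactly the limitation PEFBEs address),
    # apply the Mel filterbank, and log-compress.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    fb = mel_filterbank(n_filters, n_fft, sr)
    return np.log(np.maximum(power @ fb.T, 1e-10))

def score_fusion(scores_a, scores_b, alpha=0.5):
    # Score-level fusion: a convex combination of two classifiers'
    # per-class scores; alpha would be tuned on held-out data.
    return alpha * scores_a + (1.0 - alpha) * scores_b
```

In the fused system described by the abstract, `scores_a` and `scores_b` would be the CNN outputs obtained from FBE and PEFBE features respectively.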


Metadata
Title
Novel Phase Encoded Mel Filterbank Energies for Environmental Sound Classification
Authors
Rishabh N. Tak
Dharmesh M. Agrawal
Hemant A. Patil
Copyright year
2017
DOI
https://doi.org/10.1007/978-3-319-69900-4_40