2017 | OriginalPaper | Book chapter

Effectiveness of Mel Scale-Based ESA-IFCC Features for Classification of Natural vs. Spoofed Speech

Written by: Madhu R. Kamble, Hemant A. Patil

Published in: Pattern Recognition and Machine Intelligence

Publisher: Springer International Publishing

Abstract

The performance of biometric systems based on Automatic Speaker Verification (ASV) degrades under spoofing attacks generated with speech synthesis (SS) and voice conversion (VC) techniques. Results of the recent ASVspoof 2015 challenge indicate that spoof-aware features, rather than a more powerful classifier, are a possible solution. In this paper, we investigate the effect of various frequency scales (ERB, Mel, and linear) applied to a Gabor filterbank. The filterbank output is used to exploit the contribution of the instantaneous frequency (IF) in each subband energy via the Teager Energy Operator-based Energy Separation Algorithm (TEO-ESA), in order to capture possible changes in the spectral envelope of spoofed speech. The IF is computed from narrowband components of the speech signal, and the Discrete Cosine Transform (DCT) is applied to the deviations in IF; the resulting features are referred to as Instantaneous Frequency Cosine Coefficients (IFCC). Classification with static features shows an equal error rate (EER) of 1.32% with the Mel frequency scale and 1.87% with the linear scale. With delta features, the EER for the linear frequency scale falls further to 1.39%, whereas with the Mel scale it increases by 0.64% on the development set of the ASVspoof 2015 challenge database.
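To make the TEO-ESA step concrete, the following is a minimal sketch of instantaneous frequency estimation from a single narrowband signal. The abstract does not say which energy separation variant the authors use, so the choice of DESA-2 here is an assumption, as are the function names `teager_energy` and `desa2_frequency_hz`. The Gabor filterbank and DCT stages are left out.

```python
import math


def teager_energy(x):
    """Discrete Teager Energy Operator: psi[x(n)] = x(n)^2 - x(n-1) * x(n+1).

    Returns one value per interior sample, so the output has len(x) - 2 entries
    and output index i corresponds to input sample n = i + 1.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def desa2_frequency_hz(x, fs):
    """Estimate instantaneous frequency (Hz) of a narrowband signal with DESA-2.

    Assumption: DESA-2 is used purely for illustration. It is valid for
    frequencies below fs / 4.
    """
    # Symmetric difference y(n) = x(n+1) - x(n-1), defined for n = 1 .. len(x) - 2.
    y = [x[n + 1] - x[n - 1] for n in range(1, len(x) - 1)]
    psi_x = teager_energy(x)  # psi_x[i] belongs to sample n = i + 1
    psi_y = teager_energy(y)  # psi_y[k] belongs to sample n = k + 2

    freqs = []
    for k, e_y in enumerate(psi_y):
        e_x = psi_x[k + 1]  # align both energies on sample n = k + 2
        if e_x <= 0.0:
            freqs.append(math.nan)  # estimate undefined where the energy vanishes
            continue
        # omega = 0.5 * arccos(1 - psi[y] / (2 * psi[x])), in radians per sample.
        arg = max(-1.0, min(1.0, 1.0 - e_y / (2.0 * e_x)))
        omega = 0.5 * math.acos(arg)
        freqs.append(omega * fs / (2.0 * math.pi))
    return freqs


# Usage: a pure 1 kHz cosine sampled at 16 kHz stands in for one subband signal.
fs = 16000.0
tone = [math.cos(2.0 * math.pi * 1000.0 * n / fs) for n in range(64)]
estimates = desa2_frequency_hz(tone, fs)
```

For a pure tone the estimate is constant and matches the tone frequency up to rounding error. On real speech each subband output would give a time-varying IF track, which is the quantity the abstract describes as the input to the DCT.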

Metadata
Title
Effectiveness of Mel Scale-Based ESA-IFCC Features for Classification of Natural vs. Spoofed Speech
Written by
Madhu R. Kamble
Hemant A. Patil
Copyright year
2017
DOI
https://doi.org/10.1007/978-3-319-69900-4_39