Published in: International Journal of Speech Technology 1/2019

20 February 2019

Replay spoofing countermeasures using high spectro-temporal resolution features

Authors: K. N. R. K. Raju Alluri, Anil Kumar Vuppala

Abstract

Because replay attacks are easy for a fraudster to mount, they pose a more severe threat to automatic speaker verification (ASV) technology than other spoofing attacks such as speech synthesis and voice conversion. In a replay attack, a fraudster tries to gain illegitimate access to an ASV system by playing back a speech sample collected from the genuine target speaker. The significant cues that differentiate genuine from replayed recordings are channel characteristics. To capture these characteristics, one needs to extract features from a spectrum that has high spectral and temporal resolution. Zero time windowing (ZTW) analysis of speech is one such time-frequency analysis technique; it yields a spectrum with high spectral and temporal resolution at each sampling instant. In this study, new features are proposed by applying cepstral analysis to the ZTW spectrum. Experiments are performed on two publicly available replay attack databases, BTAS 2016 and ASVspoof 2017. The first set of experiments is conducted using Gaussian mixture models to evaluate the potential of the proposed features. The proposed system achieves a half total error rate of 0.75% on the BTAS 2016 evaluation set and an equal error rate of 14.75% on the ASVspoof 2017 evaluation set. Score-level fusion is then performed by combining the proposed features with the previously proposed single frequency filtering cepstral coefficients. The fused system outperforms the previously reported best results on these two datasets.
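
The abstract reports a half total error rate (HTER), an equal error rate (EER), and a score-level fusion of two feature streams. As a rough illustration of how such quantities are commonly computed from detection scores, here is a minimal Python sketch. It is not the authors' implementation: the function names, the convention that higher scores mean "genuine", and the weighted-sum fusion rule with its `weight` parameter are all assumptions made for this example, since the abstract does not specify them.

```python
import numpy as np

def far_frr(genuine, spoof, threshold):
    """Return (FAR, FRR), accepting a trial when score >= threshold."""
    genuine = np.asarray(genuine, dtype=float)
    spoof = np.asarray(spoof, dtype=float)
    far = np.mean(spoof >= threshold)   # replayed trials wrongly accepted
    frr = np.mean(genuine < threshold)  # genuine trials wrongly rejected
    return float(far), float(frr)

def eer(genuine, spoof):
    """Approximate the EER by scanning every observed score as a threshold.

    Returns (eer_estimate, threshold) at the point where FAR and FRR
    are closest to each other.
    """
    thresholds = np.sort(np.concatenate([genuine, spoof]).astype(float))
    rates = np.array([far_frr(genuine, spoof, t) for t in thresholds])
    idx = int(np.argmin(np.abs(rates[:, 0] - rates[:, 1])))
    return float(rates[idx].mean()), float(thresholds[idx])

def hter(genuine, spoof, threshold):
    """Half total error rate: the mean of FAR and FRR at a fixed threshold."""
    far, frr = far_frr(genuine, spoof, threshold)
    return 0.5 * (far + frr)

def fuse(scores_a, scores_b, weight=0.5):
    """Hypothetical score-level fusion: weighted sum of two systems' scores."""
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    return weight * a + (1.0 - weight) * b

if __name__ == "__main__":
    # Toy scores, invented purely for illustration.
    genuine_scores = [2.0, 3.0, 4.0, 5.0]
    spoof_scores = [0.0, 1.0, 2.5, 1.5]
    rate, thr = eer(genuine_scores, spoof_scores)
    print(f"EER ~ {rate:.2%} at threshold {thr}")
    print(f"HTER at threshold 2.75: {hter(genuine_scores, spoof_scores, 2.75):.2%}")
```

In practice, the HTER threshold is typically fixed on a development set and then applied unchanged to the evaluation set, whereas the EER is computed directly on the set being evaluated. In the sketch, the two score arrays passed to `fuse` would come from two separately trained detectors and would usually be normalised to a comparable range first.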

Metadata
Title
Replay spoofing countermeasures using high spectro-temporal resolution features
Authors
K. N. R. K. Raju Alluri
Anil Kumar Vuppala
Publication date
20 February 2019
Publisher
Springer US
Published in
International Journal of Speech Technology, Issue 1/2019
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-019-09602-z