
2019 | OriginalPaper | Chapter

15. Introduction to Voice Presentation Attack Detection and Recent Advances

Authors: Md Sahidullah, Héctor Delgado, Massimiliano Todisco, Tomi Kinnunen, Nicholas Evans, Junichi Yamagishi, Kong-Aik Lee

Published in: Handbook of Biometric Anti-Spoofing

Publisher: Springer International Publishing


Abstract

Over the past few years, significant progress has been made in presentation attack detection (PAD) for automatic speaker verification (ASV). This includes the development of new speech corpora and standard evaluation protocols, as well as advances in front-end feature extraction and back-end classifiers. The use of standard databases and evaluation protocols has enabled, for the first time, the meaningful benchmarking of different PAD solutions. This chapter summarises the progress, with a focus on studies completed in the last three years. It presents a summary of findings and lessons learned from two ASVspoof challenges, the first community-led benchmarking efforts. These show that ASV PAD remains an unsolved problem and that further attention is required to develop generalised PAD solutions with the potential to detect diverse and previously unseen spoofing attacks.

91.
go back to reference Song P, Bao Y, Zhao L, Zou C (2011) Voice conversion using support vector regression. Electron Lett 47(18):1045–1046CrossRef Song P, Bao Y, Zhao L, Zou C (2011) Voice conversion using support vector regression. Electron Lett 47(18):1045–1046CrossRef
92.
go back to reference Helander E, Silén H, Virtanen T, Gabbouj M (2012) Voice conversion using dynamic kernel partial least squares regression. IEEE Trans Audio Speech Lang Process 20(3):806–817CrossRef Helander E, Silén H, Virtanen T, Gabbouj M (2012) Voice conversion using dynamic kernel partial least squares regression. IEEE Trans Audio Speech Lang Process 20(3):806–817CrossRef
93.
go back to reference Wu Z, Chng E, Li H (2013) Conditional restricted boltzmann machine for voice conversion. In: The first IEEE China summit and international conference on signal and information processing (ChinaSIP). IEEE Wu Z, Chng E, Li H (2013) Conditional restricted boltzmann machine for voice conversion. In: The first IEEE China summit and international conference on signal and information processing (ChinaSIP). IEEE
94.
go back to reference Narendranath M, Murthy H, Rajendran S, Yegnanarayana B (1995) Transformation of formants for voice conversion using artificial neural networks. Speech Commun 16(2):207–216CrossRef Narendranath M, Murthy H, Rajendran S, Yegnanarayana B (1995) Transformation of formants for voice conversion using artificial neural networks. Speech Commun 16(2):207–216CrossRef
95.
go back to reference Desai S, Raghavendra E, Yegnanarayana B, Black A, Prahallad K (2009) Voice conversion using artificial neural networks. In: Proceedings of ICASSP. IEEE, pp 3893–3896 Desai S, Raghavendra E, Yegnanarayana B, Black A, Prahallad K (2009) Voice conversion using artificial neural networks. In: Proceedings of ICASSP. IEEE, pp 3893–3896
96.
go back to reference Saito Y, Takamichi S, Saruwatari H (2017) Voice conversion using input-to-output highway networks. IEICE Transactions on Inf Syst E100.D(8):1925–1928CrossRef Saito Y, Takamichi S, Saruwatari H (2017) Voice conversion using input-to-output highway networks. IEICE Transactions on Inf Syst E100.D(8):1925–1928CrossRef
97.
go back to reference Nakashika T, Takiguchi T, Ariki Y (2015) Voice conversion using RNN pre-trained by recurrent temporal restricted boltzmann machines. IEEE/ACM Trans Audio Speech Lang Process (TASLP) 23(3):580–587CrossRef Nakashika T, Takiguchi T, Ariki Y (2015) Voice conversion using RNN pre-trained by recurrent temporal restricted boltzmann machines. IEEE/ACM Trans Audio Speech Lang Process (TASLP) 23(3):580–587CrossRef
98.
go back to reference Sun L, Kang S, Li K, Meng H (2015) Voice conversion using deep bidirectional long short-term memory based recurrent neural networks. In: Proceedings of ICASSP, pp 4869–4873 Sun L, Kang S, Li K, Meng H (2015) Voice conversion using deep bidirectional long short-term memory based recurrent neural networks. In: Proceedings of ICASSP, pp 4869–4873
99.
go back to reference Sundermann D, Ney H (2003) VTLN-based voice conversion. In: Proceedings of the 3rd IEEE international symposium on signal processing and information technology, 2003. ISSPIT 2003. IEEE Sundermann D, Ney H (2003) VTLN-based voice conversion. In: Proceedings of the 3rd IEEE international symposium on signal processing and information technology, 2003. ISSPIT 2003. IEEE
100.
go back to reference Erro D, Moreno A, Bonafonte A (2010) Voice conversion based on weighted frequency warping. IEEE Trans Audio Speech Lang Process 18(5):922–931CrossRef Erro D, Moreno A, Bonafonte A (2010) Voice conversion based on weighted frequency warping. IEEE Trans Audio Speech Lang Process 18(5):922–931CrossRef
101.
go back to reference Erro D, Navas E, Hernaez I (2013) Parametric voice conversion based on bilinear frequency warping plus amplitude scaling. IEEE Trans Audio Speech Lang Process 21(3):556–566CrossRef Erro D, Navas E, Hernaez I (2013) Parametric voice conversion based on bilinear frequency warping plus amplitude scaling. IEEE Trans Audio Speech Lang Process 21(3):556–566CrossRef
102.
go back to reference Hsu CC, Hwang HT, Wu YC, Tsao Y, Wang HM (2017) Voice conversion from unaligned corpora using variational autoencoding wasserstein generative adversarial networks. In: Proceedings of Interspeech, vol 2017, pp 3364–3368 Hsu CC, Hwang HT, Wu YC, Tsao Y, Wang HM (2017) Voice conversion from unaligned corpora using variational autoencoding wasserstein generative adversarial networks. In: Proceedings of Interspeech, vol 2017, pp 3364–3368
103.
go back to reference Miyoshi H, Saito Y, Takamichi S, Saruwatari H (2017) Voice conversion using sequence-to-sequence learning of context posterior probabilities. Proceedings of Interspeech, vol 2017, pp 1268–1272 Miyoshi H, Saito Y, Takamichi S, Saruwatari H (2017) Voice conversion using sequence-to-sequence learning of context posterior probabilities. Proceedings of Interspeech, vol 2017, pp 1268–1272
104.
go back to reference Fang F, Yamagishi J, Echizen I, Lorenzo-Trueba J (2018) High-quality nonparallel voice conversion based on cycle-consistent adversarial network. In: Proceedings of ICASSP 2018 Fang F, Yamagishi J, Echizen I, Lorenzo-Trueba J (2018) High-quality nonparallel voice conversion based on cycle-consistent adversarial network. In: Proceedings of ICASSP 2018
105.
go back to reference Kobayashi K, Hayashi T, Tamamori A, Toda T (2017) Statistical voice conversion with wavenet-based waveform generation. In: Proceedings of Interspeech, pp 1138–1142 Kobayashi K, Hayashi T, Tamamori A, Toda T (2017) Statistical voice conversion with wavenet-based waveform generation. In: Proceedings of Interspeech, pp 1138–1142
106.
go back to reference Gillet B, King S (2003) Transforming F0 contours. In: Proceedings of EUROSPEECH, pp 101–104 (2003) Gillet B, King S (2003) Transforming F0 contours. In: Proceedings of EUROSPEECH, pp 101–104 (2003)
107.
go back to reference Wu CH, Hsia CC, Liu TH, Wang JF (2006) Voice conversion using duration-embedded bi-HMMs for expressive speech synthesis. IEEE Trans Audio Speech Lang Process 14(4):1109–1116CrossRef Wu CH, Hsia CC, Liu TH, Wang JF (2006) Voice conversion using duration-embedded bi-HMMs for expressive speech synthesis. IEEE Trans Audio Speech Lang Process 14(4):1109–1116CrossRef
108.
go back to reference Helander E, Nurminen J (2007) A novel method for prosody prediction in voice conversion. In: Proceedings of ICASSP, vol 4. IEEE, pp IV–509 Helander E, Nurminen J (2007) A novel method for prosody prediction in voice conversion. In: Proceedings of ICASSP, vol 4. IEEE, pp IV–509
109.
go back to reference Wu Z, Kinnunen T, Chng E, Li H (2010) Text-independent F0 transformation with non-parallel data for voice conversion. In: Proceedings of Interspeech Wu Z, Kinnunen T, Chng E, Li H (2010) Text-independent F0 transformation with non-parallel data for voice conversion. In: Proceedings of Interspeech
110.
go back to reference Lolive D, Barbot N, Boeffard O (2008) Pitch and duration transformation with non-parallel data. Speech Prosody 2008:111–114 Lolive D, Barbot N, Boeffard O (2008) Pitch and duration transformation with non-parallel data. Speech Prosody 2008:111–114
111.
go back to reference Toda T, Chen LH, Saito D, Villavicencio F, Wester M, Wu Z, Yamagishi J (2016) The voice conversion challenge 2016. In: Proceedings of Interspeech, pp 1632–1636 Toda T, Chen LH, Saito D, Villavicencio F, Wester M, Wu Z, Yamagishi J (2016) The voice conversion challenge 2016. In: Proceedings of Interspeech, pp 1632–1636
112.
go back to reference Wester M, Wu Z, Yamagishi J (2016) Analysis of the voice conversion challenge 2016 evaluation results. In: Proceedings of Interspeech, pp 1637–1641 Wester M, Wu Z, Yamagishi J (2016) Analysis of the voice conversion challenge 2016 evaluation results. In: Proceedings of Interspeech, pp 1637–1641
113.
go back to reference Perrot P, Aversano G, Blouet R, Charbit M, Chollet G (2005) Voice forgery using ALISP: indexation in a client memory. In: Proceedings of ICASSP, vol 1. IEEE, pp 17–20 Perrot P, Aversano G, Blouet R, Charbit M, Chollet G (2005) Voice forgery using ALISP: indexation in a client memory. In: Proceedings of ICASSP, vol 1. IEEE, pp 17–20
114.
go back to reference Matrouf D, Bonastre JF, Fredouille C (2006) Effect of speech transformation on impostor acceptance. In: Proceedings of ICASSP, vol 1. IEEE, pp I–I Matrouf D, Bonastre JF, Fredouille C (2006) Effect of speech transformation on impostor acceptance. In: Proceedings of ICASSP, vol 1. IEEE, pp I–I
115.
go back to reference Kinnunen T, Wu Z, Lee K, Sedlak F, Chng E, Li H (2012) Vulnerability of speaker verification systems against voice conversion spoofing attacks: the case of telephone speech. In: Proceedings of ICASSP. IEEE, pp 4401–4404 Kinnunen T, Wu Z, Lee K, Sedlak F, Chng E, Li H (2012) Vulnerability of speaker verification systems against voice conversion spoofing attacks: the case of telephone speech. In: Proceedings of ICASSP. IEEE, pp 4401–4404
116.
go back to reference Sundermann D, Hoge H, Bonafonte A, Ney H, Black A, Narayanan S (2006) Text-independent voice conversion based on unit selection. In: Proceedings of ICASSP, vol 1, pp I–I Sundermann D, Hoge H, Bonafonte A, Ney H, Black A, Narayanan S (2006) Text-independent voice conversion based on unit selection. In: Proceedings of ICASSP, vol 1, pp I–I
117.
go back to reference Wu Z, Larcher A, Lee K, Chng E, Kinnunen T, Li H (2013) Vulnerability evaluation of speaker verification under voice conversion spoofing: the effect of text constraints. In: Proceedings of Interspeech, Lyon, France (2013) Wu Z, Larcher A, Lee K, Chng E, Kinnunen T, Li H (2013) Vulnerability evaluation of speaker verification under voice conversion spoofing: the effect of text constraints. In: Proceedings of Interspeech, Lyon, France (2013)
118.
go back to reference Alegre F, Vipperla R, Evans N, Fauve B (2012) On the vulnerability of automatic speaker recognition to spoofing attacks with artificial signals. In: 2012 EURASIP conference on european conference on signal processing (EUSIPCO) Alegre F, Vipperla R, Evans N, Fauve B (2012) On the vulnerability of automatic speaker recognition to spoofing attacks with artificial signals. In: 2012 EURASIP conference on european conference on signal processing (EUSIPCO)
119.
go back to reference De Leon PL, Hernaez I, Saratxaga I, Pucher M, Yamagishi J (2011) Detection of synthetic speech for the problem of imposture. In: Proceedings of ICASSP, Dallas, USA, pp 4844–4847 De Leon PL, Hernaez I, Saratxaga I, Pucher M, Yamagishi J (2011) Detection of synthetic speech for the problem of imposture. In: Proceedings of ICASSP, Dallas, USA, pp 4844–4847
120.
go back to reference Wu Z, Kinnunen T, Chng E, Li H, Ambikairajah E (2012) A study on spoofing attack in state-of-the-art speaker verification: the telephone speech case. In: Proceedings of Asia-Pacific signal information processing association annual summit and conference (APSIPA ASC). IEEE, pp 1–5 Wu Z, Kinnunen T, Chng E, Li H, Ambikairajah E (2012) A study on spoofing attack in state-of-the-art speaker verification: the telephone speech case. In: Proceedings of Asia-Pacific signal information processing association annual summit and conference (APSIPA ASC). IEEE, pp 1–5
121.
go back to reference Alegre F, Vipperla R, Evans,N (2012) Spoofing countermeasures for the protection of automatic speaker recognition systems against attacks with artificial signals. In: Proceedings of Interspeech Alegre F, Vipperla R, Evans,N (2012) Spoofing countermeasures for the protection of automatic speaker recognition systems against attacks with artificial signals. In: Proceedings of Interspeech
122.
go back to reference Alegre F, Amehraye A, Evans N (2013) Spoofing countermeasures to protect automatic speaker verification from voice conversion. In: Proceedings of ICASSP Alegre F, Amehraye A, Evans N (2013) Spoofing countermeasures to protect automatic speaker verification from voice conversion. In: Proceedings of ICASSP
123.
go back to reference Wu Z, Xiao X, Chng E, Li H (2013) Synthetic speech detection using temporal modulation feature. In: Proceedings of ICASSP Wu Z, Xiao X, Chng E, Li H (2013) Synthetic speech detection using temporal modulation feature. In: Proceedings of ICASSP
124.
go back to reference Alegre F, Vipperla R, Amehraye A, Evans N (2013) A new speaker verification spoofing countermeasure based on local binary patterns. In: Proceedings of Interspeech, Lyon, France Alegre F, Vipperla R, Amehraye A, Evans N (2013) A new speaker verification spoofing countermeasure based on local binary patterns. In: Proceedings of Interspeech, Lyon, France
125.
go back to reference Wu Z, Kinnunen T, Evans N, Yamagishi J, Hanilçi C, Sahidullah M, Sizov A (2015) ASVspoof 2015: the first automatic speaker verification spoofing and countermeasures challenge. In: Proceedings of Interspeech Wu Z, Kinnunen T, Evans N, Yamagishi J, Hanilçi C, Sahidullah M, Sizov A (2015) ASVspoof 2015: the first automatic speaker verification spoofing and countermeasures challenge. In: Proceedings of Interspeech
126.
go back to reference Kinnunen T, Sahidullah M, Delgado H, Todisco M, Evans N, Yamagishi J, Lee K (2017) The ASVspoof 2017 challenge: assessing the limits of replay spoofing attack detection. In: INTERSPEECH Kinnunen T, Sahidullah M, Delgado H, Todisco M, Evans N, Yamagishi J, Lee K (2017) The ASVspoof 2017 challenge: assessing the limits of replay spoofing attack detection. In: INTERSPEECH
127.
go back to reference Wu Z, Khodabakhsh A, Demiroglu C, Yamagishi J, Saito D, Toda T, King S (2015) SAS: a speaker verification spoofing database containing diverse attacks. In: Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) Wu Z, Khodabakhsh A, Demiroglu C, Yamagishi J, Saito D, Toda T, King S (2015) SAS: a speaker verification spoofing database containing diverse attacks. In: Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
129.
go back to reference Patel T, Patil H (2015) Combining evidences from mel cepstral, cochlear filter cepstral and instantaneous frequency features for detection of natural vs. spoofed speech. In: Proceedings of Interspeech Patel T, Patil H (2015) Combining evidences from mel cepstral, cochlear filter cepstral and instantaneous frequency features for detection of natural vs. spoofed speech. In: Proceedings of Interspeech
130.
go back to reference Novoselov S, Kozlov A, Lavrentyeva G, Simonchik K, Shchemelinin V (2016) STC anti-spoofing systems for the ASVspoof 2015 challenge. In: Proceedings of IEEE international conference on acoustics, speech, and signal processing (ICASSP), pp 5475–5479 Novoselov S, Kozlov A, Lavrentyeva G, Simonchik K, Shchemelinin V (2016) STC anti-spoofing systems for the ASVspoof 2015 challenge. In: Proceedings of IEEE international conference on acoustics, speech, and signal processing (ICASSP), pp 5475–5479
131.
go back to reference Chen N, Qian Y, Dinkel H, Chen B, Yu K (2015) Robust deep feature for spoofing detection-the SJTU system for ASVspoof 2015 challenge. In: Proceedings of Interspeech Chen N, Qian Y, Dinkel H, Chen B, Yu K (2015) Robust deep feature for spoofing detection-the SJTU system for ASVspoof 2015 challenge. In: Proceedings of Interspeech
132.
go back to reference Xiao X, Tian X, Du S, Xu H, Chng E, Li H (2015) Spoofing speech detection using high dimensional magnitude and phase features: the NTU approach for ASVspoof 2015 challenge. In: Proceedings of Interspeech Xiao X, Tian X, Du S, Xu H, Chng E, Li H (2015) Spoofing speech detection using high dimensional magnitude and phase features: the NTU approach for ASVspoof 2015 challenge. In: Proceedings of Interspeech
133.
go back to reference Alam M, Kenny P, Bhattacharya G, Stafylakis T (2015) Development of CRIM system for the automatic speaker verification spoofing and countermeasures challenge 2015. In: Proceedings of Interspeech Alam M, Kenny P, Bhattacharya G, Stafylakis T (2015) Development of CRIM system for the automatic speaker verification spoofing and countermeasures challenge 2015. In: Proceedings of Interspeech
134.
go back to reference Wu Z, Yamagishi J, Kinnunen T, Hanilçi C, Sahidullah M, Sizov A, Evans N, Todisco M, Delgado H (2017) Asvspoof: the automatic speaker verification spoofing and countermeasures challenge. IEEE J Sel Top Signal Process 11(4):588–604CrossRef Wu Z, Yamagishi J, Kinnunen T, Hanilçi C, Sahidullah M, Sizov A, Evans N, Todisco M, Delgado H (2017) Asvspoof: the automatic speaker verification spoofing and countermeasures challenge. IEEE J Sel Top Signal Process 11(4):588–604CrossRef
135.
go back to reference Delgado H, Todisco M, Sahidullah M, Evans N, Kinnunen T, Lee K, Yamagishi J (2018) ASVspoof 2017 version 2.0: meta-data analysis and baseline enhancements. In: Proceedings of Odyssey 2018 the speaker and language recognition workshop, pp 296–303 Delgado H, Todisco M, Sahidullah M, Evans N, Kinnunen T, Lee K, Yamagishi J (2018) ASVspoof 2017 version 2.0: meta-data analysis and baseline enhancements. In: Proceedings of Odyssey 2018 the speaker and language recognition workshop, pp 296–303
136.
go back to reference Todisco M, Delgado H, Evans N (2016) A new feature for automatic speaker verification anti-spoofing: constant Q cepstral coefficients. In: Proceedings of Odyssey: the speaker and language recognition workshop, Bilbao, Spain, pp 283–290 Todisco M, Delgado H, Evans N (2016) A new feature for automatic speaker verification anti-spoofing: constant Q cepstral coefficients. In: Proceedings of Odyssey: the speaker and language recognition workshop, Bilbao, Spain, pp 283–290
137.
go back to reference Todisco M, Delgado H, Evans N (2017) Constant Q cepstral coefficients: a spoofing countermeasure for automatic speaker verification. Comput Speech Lang 45:516–535CrossRef Todisco M, Delgado H, Evans N (2017) Constant Q cepstral coefficients: a spoofing countermeasure for automatic speaker verification. Comput Speech Lang 45:516–535CrossRef
138.
go back to reference Lavrentyeva G, Novoselov S, Malykh E, Kozlov A, Kudashev O, Shchemelinin V (2017) Audio replay attack detection with deep learning frameworks. In: Proceedings of Interspeech, pp 82–86 Lavrentyeva G, Novoselov S, Malykh E, Kozlov A, Kudashev O, Shchemelinin V (2017) Audio replay attack detection with deep learning frameworks. In: Proceedings of Interspeech, pp 82–86
139.
go back to reference Ji Z, Li Z, Li P, An M, Gao S, Wu D, Zhao F (2017) Ensemble learning for countermeasure of audio replay spoofing attack in ASVspoof2017. In: Proceedings of Interspeech, pp 87–91 Ji Z, Li Z, Li P, An M, Gao S, Wu D, Zhao F (2017) Ensemble learning for countermeasure of audio replay spoofing attack in ASVspoof2017. In: Proceedings of Interspeech, pp 87–91
140.
go back to reference Li L, Chen Y, Wang D, Zheng T (2017) A study on replay attack and anti-spoofing for automatic speaker verification. In: Proceedings of Interspeech, pp 92–96 Li L, Chen Y, Wang D, Zheng T (2017) A study on replay attack and anti-spoofing for automatic speaker verification. In: Proceedings of Interspeech, pp 92–96
141.
go back to reference Patil H, Kamble M, Patel T, Soni M (2017) Novel variable length teager energy separation based instantaneous frequency features for replay detection. In: Proceedings of Interspeech, pp 12–16 Patil H, Kamble M, Patel T, Soni M (2017) Novel variable length teager energy separation based instantaneous frequency features for replay detection. In: Proceedings of Interspeech, pp 12–16
142.
go back to reference Chen Z, Xie Z, Zhang W, Xu X (2017) ResNet and model fusion for automatic spoofing detection. In: Proceedings of Interspeech, pp 102–106 Chen Z, Xie Z, Zhang W, Xu X (2017) ResNet and model fusion for automatic spoofing detection. In: Proceedings of Interspeech, pp 102–106
143.
go back to reference Wu Z, Gao S, Cling E, Li H (2014) A study on replay attack and anti-spoofing for text-dependent speaker verification. In: Proceedings of Asia-Pacific signal information processing association annual summit and conference (APSIPA ASC). IEEE, pp 1–5 Wu Z, Gao S, Cling E, Li H (2014) A study on replay attack and anti-spoofing for text-dependent speaker verification. In: Proceedings of Asia-Pacific signal information processing association annual summit and conference (APSIPA ASC). IEEE, pp 1–5
144.
go back to reference Li Q (2009) An auditory-based transform for audio signal processing. In: 2009 IEEE workshop on applications of signal processing to audio and acoustics. IEEE, pp 181–184 Li Q (2009) An auditory-based transform for audio signal processing. In: 2009 IEEE workshop on applications of signal processing to audio and acoustics. IEEE, pp 181–184
145.
go back to reference Davis S, Mermelstein P (1980) Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Trans Acoust Speech Signal Process 28(4):357–366CrossRef Davis S, Mermelstein P (1980) Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Trans Acoust Speech Signal Process 28(4):357–366CrossRef
146.
go back to reference Sahidullah M, Kinnunen T, Hanilçi C (2015) A comparison of features for synthetic speech detection. In: Proceedings of Interspeech. ISCA, pp 2087–2091 Sahidullah M, Kinnunen T, Hanilçi C (2015) A comparison of features for synthetic speech detection. In: Proceedings of Interspeech. ISCA, pp 2087–2091
147.
go back to reference Brown J (1991) Calculation of a constant Q spectral transform. J Acoust Soc Am 89(1):425–434CrossRef Brown J (1991) Calculation of a constant Q spectral transform. J Acoust Soc Am 89(1):425–434CrossRef
148.
go back to reference Alam M, Kenny P (2017) Spoofing detection employing infinite impulse response—constant Q transform-based feature representations. In: Proceedings of European signal processing conference (EUSIPCO) Alam M, Kenny P (2017) Spoofing detection employing infinite impulse response—constant Q transform-based feature representations. In: Proceedings of European signal processing conference (EUSIPCO)
149.
go back to reference Cancela P, Rocamora M, López E (2009) An efficient multi-resolution spectral transform for music analysis. In: Proceedings of international society for music information retrieval conference, pp 309–314 Cancela P, Rocamora M, López E (2009) An efficient multi-resolution spectral transform for music analysis. In: Proceedings of international society for music information retrieval conference, pp 309–314
151.
go back to reference Goodfellow I, Bengio Y, Courville A, Bengio Y (2016) Deep learning. MIT Press, CambridgeMATH Goodfellow I, Bengio Y, Courville A, Bengio Y (2016) Deep learning. MIT Press, CambridgeMATH
152.
go back to reference Tian Y, Cai M, He L, Liu J (2015) Investigation of bottleneck features and multilingual deep neural networks for speaker verification. In: Proceedings of Interspeech, pp 1151–1155 Tian Y, Cai M, He L, Liu J (2015) Investigation of bottleneck features and multilingual deep neural networks for speaker verification. In: Proceedings of Interspeech, pp 1151–1155
153.
go back to reference Richardson F, Reynolds D, Dehak N (2015) Deep neural network approaches to speaker and language recognition. IEEE Signal Process Lett 22(10):1671–1675CrossRef Richardson F, Reynolds D, Dehak N (2015) Deep neural network approaches to speaker and language recognition. IEEE Signal Process Lett 22(10):1671–1675CrossRef
154.
go back to reference Hinton G, Deng L, Yu D, Dahl GE, Mohamed RA, Jaitly N, Senior A, Vanhoucke V, Nguyen P, Sainath TN, Kingsbury B (2012) Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Process Mag 29(6):82–97CrossRef Hinton G, Deng L, Yu D, Dahl GE, Mohamed RA, Jaitly N, Senior A, Vanhoucke V, Nguyen P, Sainath TN, Kingsbury B (2012) Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Process Mag 29(6):82–97CrossRef
155.
go back to reference Alam M, Kenny P, Gupta V, Stafylakis T (2016) Spoofing detection on the ASVspoof2015 challenge corpus employing deep neural networks. In: Proceedings of Odyssey: the Speaker and Language Recognition Workshop, Bilbao, Spain, pp 270–276 Alam M, Kenny P, Gupta V, Stafylakis T (2016) Spoofing detection on the ASVspoof2015 challenge corpus employing deep neural networks. In: Proceedings of Odyssey: the Speaker and Language Recognition Workshop, Bilbao, Spain, pp 270–276
156.
go back to reference Qian Y, Chen N, Yu K (2016) Deep features for automatic spoofing detection. Speech Commun 85:43–52CrossRef Qian Y, Chen N, Yu K (2016) Deep features for automatic spoofing detection. Speech Commun 85:43–52CrossRef
157.
go back to reference Yu H, Tan ZH, Zhang Y, Ma Z, Guo J (2017) DNN filter bank cepstral coefficients for spoofing detection. IEEE Access 5:4779–4787CrossRef Yu H, Tan ZH, Zhang Y, Ma Z, Guo J (2017) DNN filter bank cepstral coefficients for spoofing detection. IEEE Access 5:4779–4787CrossRef
161.
go back to reference Pal M, Paul D, Saha G (2018) Synthetic speech detection using fundamental frequency variation and spectral features. Comput Speech Lang 48:31–50CrossRef Pal M, Paul D, Saha G (2018) Synthetic speech detection using fundamental frequency variation and spectral features. Comput Speech Lang 48:31–50CrossRef
162.
go back to reference Laskowski K, Heldner M, Edlund J (2008) The fundamental frequency variation spectrum. Proc FONETIK 2008:29–32 Laskowski K, Heldner M, Edlund J (2008) The fundamental frequency variation spectrum. Proc FONETIK 2008:29–32
163.
go back to reference Saratxaga I, Sanchez J, Wu Z, Hernaez I, Navas E (2016) Synthetic speech detection using phase information. Speech Commun 81:30–41CrossRef Saratxaga I, Sanchez J, Wu Z, Hernaez I, Navas E (2016) Synthetic speech detection using phase information. Speech Commun 81:30–41CrossRef
164.
go back to reference Wang L, Nakagawa S, Zhang Z, Yoshida Y, Kawakami Y (2017) Spoofing speech detection using modified relative phase information. IEEE J Sel Top Signal Process 11(4):660–670CrossRef Wang L, Nakagawa S, Zhang Z, Yoshida Y, Kawakami Y (2017) Spoofing speech detection using modified relative phase information. IEEE J Sel Top Signal Process 11(4):660–670CrossRef
165.
go back to reference Chakroborty S, Saha G (2009) Improved text-independent speaker identification using fused MFCC & IMFCC feature sets based on Gaussian filter. Int J Signal Process 5(1):11–19 Chakroborty S, Saha G (2009) Improved text-independent speaker identification using fused MFCC & IMFCC feature sets based on Gaussian filter. Int J Signal Process 5(1):11–19
166.
go back to reference Wu X, He R, Sun Z, Tan T (2018) A light CNN for deep face representation with noisy labels. IEEE Trans Inf Forensics Secur 13(11):2884–2896CrossRef Wu X, He R, Sun Z, Tan T (2018) A light CNN for deep face representation with noisy labels. IEEE Trans Inf Forensics Secur 13(11):2884–2896CrossRef
168.
go back to reference Paul D, Pal M, Saha G (2016) Novel speech features for improved detection of spoofing attacks. In: Proceedings of annual IEEE India conference (INDICON) Paul D, Pal M, Saha G (2016) Novel speech features for improved detection of spoofing attacks. In: Proceedings of annual IEEE India conference (INDICON)
169.
go back to reference Dehak N, Kenny P, Dehak R, Dumouchel P, Ouellet P (2011) Front-end factor analysis for speaker verification. IEEE Trans Audio Speech Lang Process 19(4):788–798CrossRef Dehak N, Kenny P, Dehak R, Dumouchel P, Ouellet P (2011) Front-end factor analysis for speaker verification. IEEE Trans Audio Speech Lang Process 19(4):788–798CrossRef
170.
go back to reference Khoury E, Kinnunen T, Sizov A, Wu Z, Marcel S (2014) Introducing i-vectors for joint anti-spoofing and speaker verification. In: Proceedings of Interspeech Khoury E, Kinnunen T, Sizov A, Wu Z, Marcel S (2014) Introducing i-vectors for joint anti-spoofing and speaker verification. In: Proceedings of Interspeech
171.
Metadata
Title
Introduction to Voice Presentation Attack Detection and Recent Advances
Authors
Md Sahidullah
Héctor Delgado
Massimiliano Todisco
Tomi Kinnunen
Nicholas Evans
Junichi Yamagishi
Kong-Aik Lee
Copyright Year
2019
DOI
https://doi.org/10.1007/978-3-319-92627-8_15