2014 | OriginalPaper | Book chapter

7. Speaker Recognition Anti-spoofing

Written by: Nicholas Evans, Tomi Kinnunen, Junichi Yamagishi, Zhizheng Wu, Federico Alegre, Phillip De Leon

Published in: Handbook of Biometric Anti-Spoofing

Publisher: Springer London

Abstract

Progress in the development of spoofing countermeasures for automatic speaker recognition is less advanced than equivalent work related to other biometric modalities. This chapter outlines the potential for even state-of-the-art automatic speaker recognition systems to be spoofed. While the use of a multitude of different datasets, protocols and metrics complicates the meaningful comparison of different vulnerabilities, we review previous work related to impersonation, replay, speech synthesis and voice conversion spoofing attacks. The article also presents an analysis of the early work to develop spoofing countermeasures. The literature shows that there is significant potential for automatic speaker verification systems to be spoofed, that significant further work is required to develop generalised countermeasures, that there is a need for standard datasets, evaluation protocols and metrics and that greater emphasis should be placed on text-dependent scenarios.

Footnotes
4
In practice, samples labelled as spoofing attacks cannot be fully discarded, since doing so would unduly influence the false rejection and false acceptance rates, which are calculated as a percentage of all accesses.
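The effect described in this footnote can be illustrated with a small sketch. It is not taken from the chapter: the `Trial` structure, the `error_rates` helper and the toy counts are all hypothetical. It assumes that a flagged sample which stays in the trial list is counted as a rejection, and that each rate is taken over all accesses of the relevant class.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    is_target: bool   # True for a genuine (target) access attempt
    asv_accept: bool  # decision of the speaker verification system alone
    cm_flagged: bool  # True if the countermeasure labelled the sample as spoofed


def error_rates(trials, discard_flagged):
    """Return (FAR, FRR) in percent for a list of trials.

    discard_flagged=True:  flagged trials are removed before counting.
    discard_flagged=False: flagged trials are kept and count as rejections.
    """
    if discard_flagged:
        trials = [t for t in trials if not t.cm_flagged]

    def accepted(t):
        # A trial is accepted only if the verifier accepts it and
        # the countermeasure did not flag it.
        return t.asv_accept and not t.cm_flagged

    targets = [t for t in trials if t.is_target]
    nontargets = [t for t in trials if not t.is_target]
    frr = 100.0 * sum(not accepted(t) for t in targets) / len(targets)
    far = 100.0 * sum(accepted(t) for t in nontargets) / len(nontargets)
    return far, frr


# Hypothetical toy data: 10 genuine accesses and 10 spoofed accesses.
trials = (
    [Trial(True, True, False)] * 8     # genuine, accepted
    + [Trial(True, False, False)]      # genuine, rejected by the verifier
    + [Trial(True, True, True)]        # genuine, wrongly flagged as spoofed
    + [Trial(False, True, True)] * 6   # spoofed, caught by the countermeasure
    + [Trial(False, True, False)] * 3  # spoofed, missed and accepted
    + [Trial(False, False, False)]     # spoofed, missed but rejected
)

far_all, frr_all = error_rates(trials, discard_flagged=False)
far_cut, frr_cut = error_rates(trials, discard_flagged=True)
print(f"flagged kept:      FAR={far_all:.1f}%  FRR={frr_all:.1f}%")
print(f"flagged discarded: FAR={far_cut:.1f}%  FRR={frr_cut:.1f}%")
```

With these invented counts, discarding flagged samples shrinks both denominators. The false acceptance rate is then measured only on the spoofed samples the countermeasure missed, and the genuine access that was wrongly flagged no longer appears as a false rejection.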
 
Literatur
1.
Zurück zum Zitat Evans N, Kinnunen T, Yamagishi J (2013) Spoofing and countermeasures for automatic speaker verification. In: Proceedings of interspeech, annual conference of the international speech communication association, Lyon, France Evans N, Kinnunen T, Yamagishi J (2013) Spoofing and countermeasures for automatic speaker verification. In: Proceedings of interspeech, annual conference of the international speech communication association, Lyon, France
2.
Zurück zum Zitat Pelecanos J, Sridharan S (2001) Feature warping for robust speaker verification. In: Proceedings of Odyssey 2001: the speaker and language recognition workshop, Crete, Greece, pp 213–218 Pelecanos J, Sridharan S (2001) Feature warping for robust speaker verification. In: Proceedings of Odyssey 2001: the speaker and language recognition workshop, Crete, Greece, pp 213–218
3.
Zurück zum Zitat Shriberg E, Ferrer L, Kajarekar S, Venkataraman A, Stolcke A (2005) Modeling prosodic feature sequences for speaker recognition. Speech Commun 46(3–4):455–472CrossRef Shriberg E, Ferrer L, Kajarekar S, Venkataraman A, Stolcke A (2005) Modeling prosodic feature sequences for speaker recognition. Speech Commun 46(3–4):455–472CrossRef
4.
Zurück zum Zitat Dehak N, Kenny P, Dumouchel P (2007) Modeling prosodic features with joint factor analysis for speaker verification. IEEE Trans Audio Speech Lang Process 15(7):2095–2103CrossRef Dehak N, Kenny P, Dumouchel P (2007) Modeling prosodic features with joint factor analysis for speaker verification. IEEE Trans Audio Speech Lang Process 15(7):2095–2103CrossRef
5.
Zurück zum Zitat Siddiq S, Kinnunen T, Vainio M, Werner S (2012) Intonational speaker verification: a study on parameters and performance under noisy conditions. In: Proceedings of IEEE international conference on acoustics, speech and signal process (ICASSP), Kyoto, Japan, pp 4777–4780 Siddiq S, Kinnunen T, Vainio M, Werner S (2012) Intonational speaker verification: a study on parameters and performance under noisy conditions. In: Proceedings of IEEE international conference on acoustics, speech and signal process (ICASSP), Kyoto, Japan, pp 4777–4780
6.
Zurück zum Zitat Kockmann M, Ferrer L, Burget L, Cěrnocký J (2011) i-vector fusion of prosodic and cepstral features for speaker verification. In: Proceedings of interspeech, annual conference of the international speech communication association, Florence, Italy, pp 265–268 Kockmann M, Ferrer L, Burget L, Cěrnocký J (2011) i-vector fusion of prosodic and cepstral features for speaker verification. In: Proceedings of interspeech, annual conference of the international speech communication association, Florence, Italy, pp 265–268
7.
Zurück zum Zitat Kinnunen T, Li H (2010) An overview of text-independent speaker recognition: from features to supervectors. Speech Commun 52(1):12–40CrossRef Kinnunen T, Li H (2010) An overview of text-independent speaker recognition: from features to supervectors. Speech Commun 52(1):12–40CrossRef
8.
Zurück zum Zitat Reynolds D, Rose R (1995) Robust text-independent speaker identification using Gaussian mixture speaker models. IEEE Trans Speech Audio Process 3:72–83CrossRef Reynolds D, Rose R (1995) Robust text-independent speaker identification using Gaussian mixture speaker models. IEEE Trans Speech Audio Process 3:72–83CrossRef
9.
Zurück zum Zitat Reynolds DA, Quatieri TF, Dunn RB (2000) Speaker verification using adapted Gaussian mixture models. Digital Signal Process 10(1):19–41CrossRef Reynolds DA, Quatieri TF, Dunn RB (2000) Speaker verification using adapted Gaussian mixture models. Digital Signal Process 10(1):19–41CrossRef
10.
Zurück zum Zitat Campbell WM, Sturim DE, Reynolds DA (2006) Support vector machines using GMM supervectors for speaker verification. IEEE Signal Process Lett 13(5):308–311CrossRef Campbell WM, Sturim DE, Reynolds DA (2006) Support vector machines using GMM supervectors for speaker verification. IEEE Signal Process Lett 13(5):308–311CrossRef
11.
Zurück zum Zitat Solomonoff A, Campbell W, Boardman I (2005) Advances in channel compensation for SVM speaker recognition. In: Proceedings of IEEE international conference on acoustics, speech and signal process (ICASSP), pp 629–632, Philadelphia, USA Solomonoff A, Campbell W, Boardman I (2005) Advances in channel compensation for SVM speaker recognition. In: Proceedings of IEEE international conference on acoustics, speech and signal process (ICASSP), pp 629–632, Philadelphia, USA
12.
Zurück zum Zitat Burget L, Matějka P, Schwarz P, Glembek O, Černocký J (2007) Analysis of feature extraction and channel compensation in a GMM speaker recognition system. IEEE Trans Audio Speech Lang Process 15(7):1979–1986CrossRef Burget L, Matějka P, Schwarz P, Glembek O, Černocký J (2007) Analysis of feature extraction and channel compensation in a GMM speaker recognition system. IEEE Trans Audio Speech Lang Process 15(7):1979–1986CrossRef
13.
Zurück zum Zitat Hatch AO, Kajarekar S, Stolcke A (2006) Within-class covariance normalization for svm-based speaker recognition. In: Proceedings of IEEE international conference on spoken language process (ICSLP), pp 1471–1474 Hatch AO, Kajarekar S, Stolcke A (2006) Within-class covariance normalization for svm-based speaker recognition. In: Proceedings of IEEE international conference on spoken language process (ICSLP), pp 1471–1474
14.
Zurück zum Zitat Kenny, P (2006) Joint factor analysis of speaker and session variability: theory and algorithms. technical report CRIM-06/08-14 Kenny, P (2006) Joint factor analysis of speaker and session variability: theory and algorithms. technical report CRIM-06/08-14
15.
Zurück zum Zitat Kenny P, Boulianne G, Ouellet P, Dumouchel P (2007) Speaker and session variability in GMM-based speaker verification. IEEE Trans Audio Speech Lang Process 15(4):1448–1460CrossRef Kenny P, Boulianne G, Ouellet P, Dumouchel P (2007) Speaker and session variability in GMM-based speaker verification. IEEE Trans Audio Speech Lang Process 15(4):1448–1460CrossRef
16.
Zurück zum Zitat Kenny P, Ouellet P, Dehak N, Gupta V, Dumouchel P (2008) A study of inter-speaker variability in speaker verification. IEEE Trans Audio Speech Lang Process 16(5):980–988CrossRef Kenny P, Ouellet P, Dehak N, Gupta V, Dumouchel P (2008) A study of inter-speaker variability in speaker verification. IEEE Trans Audio Speech Lang Process 16(5):980–988CrossRef
17.
Zurück zum Zitat Dehak N, Kenny P, Dehak R, Dumouchel P, Ouellet P (2011) Front-end factor analysis for speaker verification. IEEE Trans Audio Speech Lang Process 19(4):788–798CrossRef Dehak N, Kenny P, Dehak R, Dumouchel P, Ouellet P (2011) Front-end factor analysis for speaker verification. IEEE Trans Audio Speech Lang Process 19(4):788–798CrossRef
18.
Zurück zum Zitat Li P, Fu Y, Mohammed U, Elder JH, Prince SJ (2012) Probabilistic models for inference about identity. IEEE Trans Pattern Anal Mach Intell 34(1):144–157CrossRef Li P, Fu Y, Mohammed U, Elder JH, Prince SJ (2012) Probabilistic models for inference about identity. IEEE Trans Pattern Anal Mach Intell 34(1):144–157CrossRef
19.
Zurück zum Zitat Garcia-Romero D, Espy-Wilson CY (2011) Analysis of i-vector length normalization in speaker recognition systems. In: Proceedings of interspeech, annual conference of the international speech communication association, Florence, Italy, pp 249–252 Garcia-Romero D, Espy-Wilson CY (2011) Analysis of i-vector length normalization in speaker recognition systems. In: Proceedings of interspeech, annual conference of the international speech communication association, Florence, Italy, pp 249–252
20.
Zurück zum Zitat Kinnunen T, Wu ZZ, Lee KA, Sedlak F, Chng ES, Li H (2012) Vulnerability of speaker verification systems against voice conversion spoofing attacks: the case of telephone speech. In: Proceedings of IEEE international conference on acoustics speech and signal process (ICASSP), pp 4401–4404 Kinnunen T, Wu ZZ, Lee KA, Sedlak F, Chng ES, Li H (2012) Vulnerability of speaker verification systems against voice conversion spoofing attacks: the case of telephone speech. In: Proceedings of IEEE international conference on acoustics speech and signal process (ICASSP), pp 4401–4404
21.
Zurück zum Zitat Saeidi R et al (2013) I4U submission to NIST SRE 2012: a large-scale collaborative effort for noise-robust speaker verification. In: Proceedings of interspeech, annual conference of the international speech communication association, Lyon, France Saeidi R et al (2013) I4U submission to NIST SRE 2012: a large-scale collaborative effort for noise-robust speaker verification. In: Proceedings of interspeech, annual conference of the international speech communication association, Lyon, France
22.
Zurück zum Zitat Brümmer N, Burget L, Černocký J, Glembek O, Grézl F, Karafiát M, Leeuwen D, Matějka P, Schwartz P, Strasheim A (2007) Fusion of heterogeneous speaker recognition systems in the STBU submission for the NIST speaker recognition evaluation 2006. IEEE Trans Audio Speech Lang Process 15(7):2072–2084CrossRef Brümmer N, Burget L, Černocký J, Glembek O, Grézl F, Karafiát M, Leeuwen D, Matějka P, Schwartz P, Strasheim A (2007) Fusion of heterogeneous speaker recognition systems in the STBU submission for the NIST speaker recognition evaluation 2006. IEEE Trans Audio Speech Lang Process 15(7):2072–2084CrossRef
23.
Zurück zum Zitat Hautamäki V, Kinnunen T, Sedlák F, Lee KA, Ma B, Li H (2013) Sparse classifier fusion for speaker verification. IEEE Trans Audio Speech Lang Process 21(8):1622–1631CrossRef Hautamäki V, Kinnunen T, Sedlák F, Lee KA, Ma B, Li H (2013) Sparse classifier fusion for speaker verification. IEEE Trans Audio Speech Lang Process 21(8):1622–1631CrossRef
24.
Zurück zum Zitat Akhtar Z, Fumera G, Marcialis GL, Roli F (2012) Evaluation of serial and parallel multibiometric systems under spoong attacks. In: Proceedings of 5th Int. Conference on biometrics (ICB 2012), pp 283–288, New Delhi, India Akhtar Z, Fumera G, Marcialis GL, Roli F (2012) Evaluation of serial and parallel multibiometric systems under spoong attacks. In: Proceedings of 5th Int. Conference on biometrics (ICB 2012), pp 283–288, New Delhi, India
25.
Zurück zum Zitat Lau YW, Wagner M, Tran D (2004) Vulnerability of speaker verification to voice mimicking. In: Proceedings of 2004 international symposium on Intelligent multimedia, video and speech processing, 2004. IEEE, pp 145–148 Lau YW, Wagner M, Tran D (2004) Vulnerability of speaker verification to voice mimicking. In: Proceedings of 2004 international symposium on Intelligent multimedia, video and speech processing, 2004. IEEE, pp 145–148
26.
Zurück zum Zitat Lau Y, Tran D, Wagner M (2005) Testing voice mimicry with the yoho speaker verification corpus. Knowledge-based intelligent information and engineering systems. Springer, Berlin, p 907 Lau Y, Tran D, Wagner M (2005) Testing voice mimicry with the yoho speaker verification corpus. Knowledge-based intelligent information and engineering systems. Springer, Berlin, p 907
27.
Zurück zum Zitat Mariéthoz J, Bengio S (2005) Can a professional imitator fool a GMM-based speaker verification system? IDIAP Research Report 05–61 Mariéthoz J, Bengio S (2005) Can a professional imitator fool a GMM-based speaker verification system? IDIAP Research Report 05–61
29.
Zurück zum Zitat Zetterholm E, Blomberg M, Elenius D (2004) A comparison between human perception and a speaker verification system score of a voice imitation. In: Proceedings of tenth australian international conference on speech science and technology, Macquarie University, Sydney, Australia, pp 393–397 Zetterholm E, Blomberg M, Elenius D (2004) A comparison between human perception and a speaker verification system score of a voice imitation. In: Proceedings of tenth australian international conference on speech science and technology, Macquarie University, Sydney, Australia, pp 393–397
30.
Zurück zum Zitat Farrús M, Wagner M, Anguita J, Hernando J (2008) How vulnerable are prosodic features to professional imitators? In: The speaker and language recognition workshop (Odyssey 2008), Stellenbosch, South Africa Farrús M, Wagner M, Anguita J, Hernando J (2008) How vulnerable are prosodic features to professional imitators? In: The speaker and language recognition workshop (Odyssey 2008), Stellenbosch, South Africa
31.
Zurück zum Zitat Kitamura T (2008) Acoustic analysis of imitated voice produced by a professional impersonator. In: Proceedings of interspeech, annual conference of the international speech communication association, Brisbane, Australia, pp 813–816 Kitamura T (2008) Acoustic analysis of imitated voice produced by a professional impersonator. In: Proceedings of interspeech, annual conference of the international speech communication association, Brisbane, Australia, pp 813–816
32.
Zurück zum Zitat Perrot P, Aversano G, Blouet R, Charbit M, Chollet G (2005) Voice forgery using ALISP: indexation in a client memory. In: Proceedings of IEEE international conference on acoustics, speech and signal process (ICASSP), vol 1, pp 17–20 Perrot P, Aversano G, Blouet R, Charbit M, Chollet G (2005) Voice forgery using ALISP: indexation in a client memory. In: Proceedings of IEEE international conference on acoustics, speech and signal process (ICASSP), vol 1, pp 17–20
33.
Zurück zum Zitat Lindberg J, Blomberg M et al (1999) Vulnerability in speaker verification-a study of technical impostor techniques. Proc Eur Conf speech Commun Technol 3:1211–1214 Lindberg J, Blomberg M et al (1999) Vulnerability in speaker verification-a study of technical impostor techniques. Proc Eur Conf speech Commun Technol 3:1211–1214
34.
Zurück zum Zitat Villalba J, Lleida E (2010) Speaker verification performance degradation against spoofing and tampering attacks. In: FALA 10 workshop, pp 131–134 Villalba J, Lleida E (2010) Speaker verification performance degradation against spoofing and tampering attacks. In: FALA 10 workshop, pp 131–134
35.
Zurück zum Zitat Wang ZF, Wei G, He QH (2011) Channel pattern noise based playback attack detection algorithm for speaker recognition. Int Conf Mach Learn Cybern (ICMLC) 4:1708–1713 Wang ZF, Wei G, He QH (2011) Channel pattern noise based playback attack detection algorithm for speaker recognition. Int Conf Mach Learn Cybern (ICMLC) 4:1708–1713
36.
Zurück zum Zitat Shang W, Stevenson M (2010) Score normalization in playback attack detection. In: Proceedings of IEEE international conference on acoustics, speech and signal process (ICASSP), pp 1678–1681 Shang W, Stevenson M (2010) Score normalization in playback attack detection. In: Proceedings of IEEE international conference on acoustics, speech and signal process (ICASSP), pp 1678–1681
37.
Zurück zum Zitat Villalba J, Lleida E (2011) Preventing replay attacks on speaker verification systems. In: Proceedings of the IEEE international carnahan conference on security technology, (ICCST) 2011, pp 1–8 Villalba J, Lleida E (2011) Preventing replay attacks on speaker verification systems. In: Proceedings of the IEEE international carnahan conference on security technology, (ICCST) 2011, pp 1–8
38.
Zurück zum Zitat Klatt DH (1980) Software for a cascade/parallel formant synthesizer. J Acoust Soc Am 67:971–995CrossRef Klatt DH (1980) Software for a cascade/parallel formant synthesizer. J Acoust Soc Am 67:971–995CrossRef
39.
Zurück zum Zitat Moulines E, Charpentier F (1990) Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones. Speech Commun 9:453–467CrossRef Moulines E, Charpentier F (1990) Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones. Speech Commun 9:453–467CrossRef
40.
Zurück zum Zitat Hunt A, Black AW (1996) Unit selection in a concatenative speech synthesis system using a large speech database. In: Proceedings of IEEE international conference on acoustics, speech and signal process (ICASSP), pp 373–376 Hunt A, Black AW (1996) Unit selection in a concatenative speech synthesis system using a large speech database. In: Proceedings of IEEE international conference on acoustics, speech and signal process (ICASSP), pp 373–376
41.
Zurück zum Zitat Breen A, Jackson P (1998) A phonologically motivated method of selecting nonuniform units. In: Proceedings of IEEE international conference on spoken language process (ICSLP), pp 2735–2738 Breen A, Jackson P (1998) A phonologically motivated method of selecting nonuniform units. In: Proceedings of IEEE international conference on spoken language process (ICSLP), pp 2735–2738
42.
Zurück zum Zitat Donovan RE, Eide EM (1998) The IBM trainable speech synthesis system. In: Proceedings of IEEE international conference on spoken language process (ICSLP), pp 1703–1706 Donovan RE, Eide EM (1998) The IBM trainable speech synthesis system. In: Proceedings of IEEE international conference on spoken language process (ICSLP), pp 1703–1706
43.
Zurück zum Zitat Beutnagel B, Conkie A, Schroeter J, Stylianou Y, Syrdal A (1999) The AT&T next-gen TTS system. In: Proceedings of joint ASA, EAA and DAEA meeting, pp 15–19 Beutnagel B, Conkie A, Schroeter J, Stylianou Y, Syrdal A (1999) The AT&T next-gen TTS system. In: Proceedings of joint ASA, EAA and DAEA meeting, pp 15–19
44.
Zurück zum Zitat Coorman G, Fackrell J, Rutten P, Coile B (2000) Segment selection in the L & H realspeak laboratory TTS system. In: Proceedings of international conference on speech and language processing, pp 395–398 Coorman G, Fackrell J, Rutten P, Coile B (2000) Segment selection in the L & H realspeak laboratory TTS system. In: Proceedings of international conference on speech and language processing, pp 395–398
45.
Zurück zum Zitat Yoshimura T, Tokuda K, Masuko T, Kobayashi T, Kitamura T (1999) Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis. In: Proceedings of Eurospeech, ESCA European conference on speech communication and technology, pp 2347–2350 Yoshimura T, Tokuda K, Masuko T, Kobayashi T, Kitamura T (1999) Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis. In: Proceedings of Eurospeech, ESCA European conference on speech communication and technology, pp 2347–2350
46.
Zurück zum Zitat Ling ZH, Wu YJ, Wang YP, Qin L, Wang RH (2006) USTC system for blizzard challenge 2006 an improved HMM-based speech synthesis method. In: Proceedings of the blizzard challenge workshop Ling ZH, Wu YJ, Wang YP, Qin L, Wang RH (2006) USTC system for blizzard challenge 2006 an improved HMM-based speech synthesis method. In: Proceedings of the blizzard challenge workshop
47.
Zurück zum Zitat Black AW (2006) CLUSTERGEN: a statistical parametric synthesizer using trajectory modeling. In: Proceedings of interspeech, annual conference of the international speech communication association, pp 1762–1765 Black AW (2006) CLUSTERGEN: a statistical parametric synthesizer using trajectory modeling. In: Proceedings of interspeech, annual conference of the international speech communication association, pp 1762–1765
48.
Zurück zum Zitat Zen H, Toda T, Nakamura M, Tokuda K (2007) Details of the Nitech HMM-based speech synthesis system for the Blizzard Challenge 2005. IEICE Trans Inf Syst E90–D(1):325–333CrossRef Zen H, Toda T, Nakamura M, Tokuda K (2007) Details of the Nitech HMM-based speech synthesis system for the Blizzard Challenge 2005. IEICE Trans Inf Syst E90–D(1):325–333CrossRef
50.
Zurück zum Zitat Yamagishi J, Kobayashi T, Nakano Y, Ogata K, Isogai J (2009) Analysis of speaker adaptation algorithms for HMM-based speech synthesis and a constrained SMAPLR adaptation algorithm. IEEE Trans Speech Audio Lang Process 17(1):66–83CrossRef Yamagishi J, Kobayashi T, Nakano Y, Ogata K, Isogai J (2009) Analysis of speaker adaptation algorithms for HMM-based speech synthesis and a constrained SMAPLR adaptation algorithm. IEEE Trans Speech Audio Lang Process 17(1):66–83CrossRef
51.
Zurück zum Zitat Leggetter CJ, Woodland PC (1995) Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models. Comput Speech Lang 9:171–185CrossRef Leggetter CJ, Woodland PC (1995) Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models. Comput Speech Lang 9:171–185CrossRef
52.
Zurück zum Zitat Woodland PC (2001) Speaker adaptation for continuous density HMMs: A review. In: Proceedings of ISCA workshop on adaptation methods for speech recognition, p 119 Woodland PC (2001) Speaker adaptation for continuous density HMMs: A review. In: Proceedings of ISCA workshop on adaptation methods for speech recognition, p 119
53.
Zurück zum Zitat Foomany F, Hirschfield A, Ingleby M (2009) Toward a dynamic framework for security evaluation of voice verification systems. In: IEEE toronto international conference on science and technology for humanity (TIC-STH), pp 22–27. doi:10.1109/TIC-STH.2009.5444499 Foomany F, Hirschfield A, Ingleby M (2009) Toward a dynamic framework for security evaluation of voice verification systems. In: IEEE toronto international conference on science and technology for humanity (TIC-STH), pp 22–27. doi:10.​1109/​TIC-STH.​2009.​5444499
54.
Zurück zum Zitat Masuko T, Hitotsumatsu T, Tokuda K, Kobayashi T (1999) On the security of HMM-based speaker verification systems against imposture using synthetic speech. In: Proceedings of Eurospeech, ESCA European conference on speech communication and technology Masuko T, Hitotsumatsu T, Tokuda K, Kobayashi T (1999) On the security of HMM-based speaker verification systems against imposture using synthetic speech. In: Proceedings of Eurospeech, ESCA European conference on speech communication and technology
55.
Zurück zum Zitat Matsui T, Furui S (1995) Likelihood normalization for speaker verification using a phoneme- and speaker-independent model. Speech Commun 17(1–2):109–116CrossRef Matsui T, Furui S (1995) Likelihood normalization for speaker verification using a phoneme- and speaker-independent model. Speech Commun 17(1–2):109–116CrossRef
56.
Zurück zum Zitat Masuko T, Tokuda K, Kobayashi T, Imai S (1996) Speech synthesis using HMMs with dynamic features. In: Proceedings of IEEE international conference on acoustics, speech and signal process (ICASSP) Masuko T, Tokuda K, Kobayashi T, Imai S (1996) Speech synthesis using HMMs with dynamic features. In: Proceedings of IEEE international conference on acoustics, speech and signal process (ICASSP)
57.
Zurück zum Zitat Masuko T, Tokuda K, Kobayashi T, Imai S (1997) Voice characteristics conversion for HMM-based speech synthesis system. In: Proceedings of IEEE international conference on acoustics, speech and signal process (ICASSP) Masuko T, Tokuda K, Kobayashi T, Imai S (1997) Voice characteristics conversion for HMM-based speech synthesis system. In: Proceedings of IEEE international conference on acoustics, speech and signal process (ICASSP)
58.
Zurück zum Zitat De Leon PL, Pucher M, Yamagishi J, Hernaez I, Saratxaga I (2012) Evaluation of speaker verification security and detection of HMM-based synthetic speech. IEEE Trans Audio Speech Lang Process 20(8):2280–2290. doi:10.1109/TASL.2012.2201472 CrossRef De Leon PL, Pucher M, Yamagishi J, Hernaez I, Saratxaga I (2012) Evaluation of speaker verification security and detection of HMM-based synthetic speech. IEEE Trans Audio Speech Lang Process 20(8):2280–2290. doi:10.​1109/​TASL.​2012.​2201472 CrossRef
59.
Zurück zum Zitat Galou, G (2011) Synthetic voice forgery in the forensic context: a short tutorial. In: Forensic speech and audio analysis working group (ENFSI-FSAAWG), pp 1–3 Galou, G (2011) Synthetic voice forgery in the forensic context: a short tutorial. In: Forensic speech and audio analysis working group (ENFSI-FSAAWG), pp 1–3
60.
Zurück zum Zitat Satoh T, Masuko T, Kobayashi T, Tokuda K (2001) A robust speaker verification system against imposture using an HMM-based speech synthesis system. In: Proceedings of Eurospeech, ESCA European conference on speech technology Satoh T, Masuko T, Kobayashi T, Tokuda K (2001) A robust speaker verification system against imposture using an HMM-based speech synthesis system. In: Proceedings of Eurospeech, ESCA European conference on speech technology
61.
Zurück zum Zitat Chen LW, Guo W, Dai LR (2010) Speaker verification against synthetic speech. In: Proceedings of 7th international symposium on chinese spoken language processing (ISCSLP), pp 309–312 (29 Nov–3 Dec 2010). doi:10.1109/ISCSLP.2010.5684887 Chen LW, Guo W, Dai LR (2010) Speaker verification against synthetic speech. In: Proceedings of 7th international symposium on chinese spoken language processing (ISCSLP), pp 309–312 (29 Nov–3 Dec 2010). doi:10.​1109/​ISCSLP.​2010.​5684887
62.
Zurück zum Zitat Quatieri TF (2002) Discrete-time speech signal processing principles and practice. Prentice-hall, Inc Quatieri TF (2002) Discrete-time speech signal processing principles and practice. Prentice-hall, Inc
63.
Zurück zum Zitat Wu Z, Chng ES, Li H (2012) Detecting converted speech and natural speech for anti-spoofing attack in speaker recognition. In: Proceedings of interspeech, annual conference of the international speech communication association Wu Z, Chng ES, Li H (2012) Detecting converted speech and natural speech for anti-spoofing attack in speaker recognition. In: Proceedings of interspeech, annual conference of the international speech communication association
64.
Zurück zum Zitat Ogihara A, Unno H, Shiozakai A (2005) Discrimination method of synthetic speech using pitch frequency against synthetic speech falsification. IEICE Trans Fundam Electron Commun Comput Sci 88(1):280–286CrossRef Ogihara A, Unno H, Shiozakai A (2005) Discrimination method of synthetic speech using pitch frequency against synthetic speech falsification. IEICE Trans Fundam Electron Commun Comput Sci 88(1):280–286CrossRef
65.
Zurück zum Zitat De Leon PL, Stewart B, Yamagishi J (2012) Synthetic speech discrimination using pitch pattern statistics derived from image analysis. In: Proceedings of interspeech, annual conference of the international speech communication association, Portland, Oregon, USA De Leon PL, Stewart B, Yamagishi J (2012) Synthetic speech discrimination using pitch pattern statistics derived from image analysis. In: Proceedings of interspeech, annual conference of the international speech communication association, Portland, Oregon, USA
66.
Zurück zum Zitat Stylianou Y (2009) Voice transformation: a survey. In: Proceedings of IEEE international conference on acoustics speech and signal process (ICASSP), pp 3585–3588 Stylianou Y (2009) Voice transformation: a survey. In: Proceedings of IEEE international conference on acoustics speech and signal process (ICASSP), pp 3585–3588
67.
Zurück zum Zitat Pellom BL, Hansen JH (1999) An experimental study of speaker verification sensitivity to computer voice-altered imposters. In: Proceedings of IEEE international conference on acoustics, speech and signal process (ICASSP), vol 2, pp 837–840 Pellom BL, Hansen JH (1999) An experimental study of speaker verification sensitivity to computer voice-altered imposters. In: Proceedings of IEEE international conference on acoustics, speech and signal process (ICASSP), vol 2, pp 837–840
68.
Zurück zum Zitat Abe M, Nakamura S, Shikano K, Kuwabara H (1988) Voice conversion through vector quantization. In: Proceedings of IEEE international conference on acoustics, speech and signal process (ICASSP), pp 655–658 Abe M, Nakamura S, Shikano K, Kuwabara H (1988) Voice conversion through vector quantization. In: Proceedings of IEEE international conference on acoustics, speech and signal process (ICASSP), pp 655–658
69.
Zurück zum Zitat Arslan LM (1999) Speaker transformation algorithm using segmental codebooks (STASC). Speech Commun 28(3):211–226CrossRef Arslan LM (1999) Speaker transformation algorithm using segmental codebooks (STASC). Speech Commun 28(3):211–226CrossRef
70.
Zurück zum Zitat Kain A, Macon MW (1998) Spectral voice conversion for text-to-speech synthesis. In: Proceedings of IEEE international conference on acoustics, speech and signal process (ICASSP), vol 1, pp 285–288 Kain A, Macon MW (1998) Spectral voice conversion for text-to-speech synthesis. In: Proceedings of IEEE international conference on acoustics, speech and signal process (ICASSP), vol 1, pp 285–288
71.
Zurück zum Zitat Stylianou Y, Cappé O, Moulines E (1998) Continuous probabilistic transform for voice conversion. IEEE Trans Speech Audio Process 6(2):131–142CrossRef Stylianou Y, Cappé O, Moulines E (1998) Continuous probabilistic transform for voice conversion. IEEE Trans Speech Audio Process 6(2):131–142CrossRef
72.
Zurück zum Zitat Toda T, Black AW, Tokuda K (2007) Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory. IEEE Trans Audio Speech Lang Process 15(8):2222–2235CrossRef Toda T, Black AW, Tokuda K (2007) Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory. IEEE Trans Audio Speech Lang Process 15(8):2222–2235CrossRef
73.
Popa V, Silen H, Nurminen J, Gabbouj M (2012) Local linear transformation for voice conversion. In: Proceedings of IEEE international conference on acoustics, speech and signal process (ICASSP), pp 4517–4520
74.
Chen Y, Chu M, Chang E, Liu J, Liu R (2003) Voice conversion with smoothed GMM and MAP adaptation. In: Proceedings of Eurospeech, ESCA European conference on speech communication and technology, pp 2413–2416
75.
Hwang HT, Tsao Y, Wang HM, Wang YR, Chen SH (2012) A study of mutual information for GMM-based spectral conversion. In: Proceedings of Interspeech, annual conference of the international speech communication association
76.
Helander E, Virtanen T, Nurminen J, Gabbouj M (2010) Voice conversion using partial least squares regression. IEEE Trans Audio Speech Lang Process 18(5):912–921
77.
Pilkington NC, Zen H, Gales MJ (2011) Gaussian process experts for voice conversion. In: Twelfth annual conference of the international speech communication association
78.
Saito D, Yamamoto K, Minematsu N, Hirose K (2011) One-to-many voice conversion based on tensor representation of speaker space. In: Proceedings of Interspeech, annual conference of the international speech communication association, pp 653–656
79.
Zen H, Nankaku Y, Tokuda K (2011) Continuous stochastic feature mapping based on trajectory HMMs. IEEE Trans Audio Speech Lang Process 19(2):417–430
80.
Wu Z, Kinnunen T, Chng ES, Li H (2012) Mixture of factor analyzers using priors from non-parallel speech for voice conversion. IEEE Signal Process Lett 19(12):914–917
81.
Saito D, Watanabe S, Nakamura A, Minematsu N (2012) Statistical voice conversion based on noisy channel model. IEEE Trans Audio Speech Lang Process 20(6):1784–1794
82.
Narendranath M, Murthy HA, Rajendran S, Yegnanarayana B (1995) Transformation of formants for voice conversion using artificial neural networks. Speech Commun 16(2):207–216
83.
Desai S, Raghavendra EV, Yegnanarayana B, Black AW, Prahallad K (2009) Voice conversion using artificial neural networks. In: Proceedings of IEEE international conference on acoustics, speech and signal process (ICASSP), pp 3893–3896
84.
Song P, Bao Y, Zhao L, Zou C (2011) Voice conversion using support vector regression. Electron Lett 47(18):1045–1046
85.
Helander E, Silén H, Virtanen T, Gabbouj M (2012) Voice conversion using dynamic kernel partial least squares regression. IEEE Trans Audio Speech Lang Process 20(3):806–817
86.
Wu Z, Chng ES, Li H (2013) Conditional restricted Boltzmann machine for voice conversion. In: The first IEEE China summit and international conference on signal and information processing (ChinaSIP)
87.
Sundermann D, Ney H (2003) VTLN-based voice conversion. In: Proceedings of the 3rd IEEE international symposium on signal processing and information technology, 2003. ISSPIT 2003, pp 556–559
88.
Erro D, Moreno A, Bonafonte A (2010) Voice conversion based on weighted frequency warping. IEEE Trans Audio Speech Lang Process 18(5):922–931
89.
Erro D, Navas E, Hernaez I (2013) Parametric voice conversion based on bilinear frequency warping plus amplitude scaling. IEEE Trans Audio Speech Lang Process 21(3):556–566
90.
Gillet B, King S (2003) Transforming F0 contours. In: Proceedings of Eurospeech, ESCA European conference on speech communication and technology, pp 101–104
91.
Wu CH, Hsia CC, Liu TH, Wang JF (2006) Voice conversion using duration-embedded bi-HMMs for expressive speech synthesis. IEEE Trans Audio Speech Lang Process 14(4):1109–1116
92.
Helander EE, Nurminen J (2007) A novel method for prosody prediction in voice conversion. In: Proceedings of IEEE international conference on acoustics, speech and signal process (ICASSP), pp IV-509
93.
Wu ZZ, Kinnunen T, Chng ES, Li H (2010) Text-independent F0 transformation with non-parallel data for voice conversion. In: Eleventh annual conference of the international speech communication association
94.
Lolive D, Barbot N, Boeffard O (2008) Pitch and duration transformation with non-parallel data. Speech prosody 2008:111–114
95.
Sundermann D, Hoge H, Bonafonte A, Ney H, Black A, Narayanan S (2006) Text-independent voice conversion based on unit selection. In: Proceedings of IEEE international conference on acoustics, speech and signal process (ICASSP), vol 1, pp I-I
96.
Wu Z, Larcher A, Lee KA, Chng ES, Kinnunen T, Li H (2013) Vulnerability evaluation of speaker verification under voice conversion spoofing: the effect of text constraints. In: Proceedings of Interspeech, annual conference of the international speech communication association, Lyon, France
97.
Matrouf D, Bonastre JF, Fredouille C (2006) Effect of speech transformation on impostor acceptance. In: Proceedings of IEEE international conference on acoustics, speech and signal process (ICASSP), vol 1, pp I-I
98.
Alegre F, Vipperla R, Evans N, Fauve B (2012) On the vulnerability of automatic speaker recognition to spoofing attacks with artificial signals. In: Proceedings of EURASIP Euro signal processing conference (EUSIPCO)
99.
Wu Z, Kinnunen T, Chng ES, Li H, Ambikairajah E (2012) A study on spoofing attack in state-of-the-art speaker verification: the telephone speech case. In: Signal and information processing association annual summit and conference (APSIPA ASC), 2012 Asia-Pacific, pp 1–5
100.
De Leon PL, Hernaez I, Saratxaga I, Pucher M, Yamagishi J (2011) Detection of synthetic speech for the problem of imposture. In: Proceedings of IEEE international conference on acoustic, speech and signal process (ICASSP), pp 4844–4847, Dallas, USA
101.
Alegre F, Vipperla R, Evans N, et al (2012) Spoofing countermeasures for the protection of automatic speaker recognition systems against attacks with artificial signals. In: Proceedings of Interspeech, annual conference of the international speech communication association
102.
Alegre F, Amehraye A, Evans N (2013) Spoofing countermeasures to protect automatic speaker verification from voice conversion. In: Proceedings of IEEE international conference on acoustic, speech and signal process (ICASSP)
103.
Wu Z, Xiao X, Chng ES, Li H (2013) Synthetic speech detection using temporal modulation feature. In: Proceedings of IEEE international conference on acoustic, speech and signal process (ICASSP)
104.
Alegre F, Vipperla R, Amehraye A, Evans N (2013) A new speaker verification spoofing countermeasure based on local binary patterns. In: Proceedings of Interspeech, annual conference of the international speech communication association, Lyon, France
105.
Hautamäki RG, Kinnunen T, Hautamäki V, Leino T, Laukkanen AM (2013) I-vectors meet imitators: on vulnerability of speaker verification systems against voice mimicry. In: Proceedings of Interspeech, annual conference of the international speech communication association
106.
Martin A, Doddington G, Kamm T, Ordowski M, Przybocki M (1997) The DET curve in assessment of detection task performance. In: Proceedings of Eurospeech, ESCA European conference on speech communication and technology, pp 1895–1898
107.
Alegre F, Amehraye A, Evans N (2013) A one-class classification approach to generalised speaker verification spoofing countermeasures using local binary patterns. In: Proceedings of international conference on biometrics: theory, applications and systems (BTAS), Washington DC, USA
Metadata
Title
Speaker Recognition Anti-spoofing
Authors
Nicholas Evans
Tomi Kinnunen
Junichi Yamagishi
Zhizheng Wu
Federico Alegre
Phillip De Leon
Copyright year
2014
Publisher
Springer London
DOI
https://doi.org/10.1007/978-1-4471-6524-8_7