2016 | OriginalPaper | Chapter

5. Automatic Speaker Verification Performance Under Channel Distortions

Author: Laura Fernández Gallardo

Published in: Human and Automatic Speaker Recognition over Telecommunication Channels

Publisher: Springer Singapore

Abstract

Voice biometrics are frequently exposed to channel degradations of transmitted speech, which cause speaker recognition systems to perform poorly. In particular, a severe mismatch between enrolment and test utterances may arise when the two transmissions have different characteristics, causing an undesired increase in within-speaker variability. In this chapter, three typical automatic speaker verification systems are built and evaluated with speech data presenting different channel degradations in narrowband (NB) and in wideband (WB), in matched and in mismatched conditions. The three systems are GMM-UBM [218], JFA [151], and i-vectors [53]. The last of these has been shown to offer state-of-the-art performance under the most recent NIST SRE conditions and is commonly employed in today’s speaker recognition applications [194, 229]. These systems have generally been tested with extensive data released by the Linguistic Data Consortium (LDC). That data, however, is limited to NB [149, 218, 229], and most speakers were recorded over only one phone channel, which makes it difficult to choose appropriate training data in some NIST evaluations [148]. Unlike the NIST challenges, this chapter does not aim to outperform existing speaker recognisers. Its objective is instead to compare text-independent speaker verification results under different telephone channel distortions. The bandwidth, codec, and packet loss rate of the speech transmissions were varied among experiments. Commonly used speaker verification systems are employed, and the benefits of extending the bandwidth are evaluated. The second part of the chapter also examines which distortions should be included in the development data to improve system performance.

Footnotes
1
Included in ITU-T G.191, Software tools for speech and audio coding standardization (2010). Source code available at http://www.itu.int/rec/T-REC-G.191/_page.print, last accessed 7th July 2014.
 
2
All bitrates at which the codecs operate are given in kbit/s. Hereafter, the unit kbit/s is omitted from the bitrate accompanying the codec name.
 
3
The G.723.1 codec can operate at a bitrate as low as 5.3 kbit/s, at the expense of requiring more processing power than the G.711 and producing lower-quality speech.
 
4
The Speex codec was applied in NB and in WB mode. The indicated bitrates correspond to quality 10 on a scale of 0 to 10, described as “Completely transparent for voice, good quality music” for each bandwidth. More details can be found at http://www.speex.org/docs/manual/speex-manual.pdf, last accessed 23rd October 2014.
 
5
David Vandyke (David.Vandyke@canberra.edu.au) was a Ph.D. candidate at the time the experiments of this chapter were conducted. His doctoral studies were supervised by Professor Michael Wagner at the University of Canberra, Australia.
 
Metadata

Title: Automatic Speaker Verification Performance Under Channel Distortions
Author: Laura Fernández Gallardo
Copyright Year: 2016
Publisher: Springer Singapore
DOI: https://doi.org/10.1007/978-981-287-727-7_5