2012 | OriginalPaper | Chapter

6. Speaker Identification over Narrowband VoIP Networks

Authors: Hemant A. Patil, Ph.D., Aaron E. Cohen, Ph.D., Keshab K. Parhi, Ph.D.

Published in: Forensic Speaker Recognition

Publisher: Springer New York

Abstract

Automatic Speaker Recognition (ASR) has been an active area of research for the past four decades, with speech collected mostly in research laboratory environments. However, given the growing applications and possible misuse of Voice over Internet Protocol (VoIP) networks, there is a need for robust ASR systems that operate over VoIP networks, especially in the context of internet security and law enforcement. Yet there has been little systematic study of how the various artifacts of VoIP (such as the speech codec, packet loss, packet reordering, network jitter, and foreign cross-talk or echo) affect the performance of an ASR system. This chapter investigates each of these VoIP issues individually and examines its effect on the performance of the ASR system. A narrowband 2.4 kbps mixed-excitation linear prediction (MELP) codec is used over a VoIP network.

Literature
1.
go back to reference Aggarwal C, Olshefski D, Saha D, Shae Z-Y, Yu P (2005) CSR: speaker recognition from compressed VoIP packet stream. IEEE Int. Conf. on Multimedia and Expo, ICME Amsterdam, The Netherlands, pp 970–973 Aggarwal C, Olshefski D, Saha D, Shae Z-Y, Yu P (2005) CSR: speaker recognition from compressed VoIP packet stream. IEEE Int. Conf. on Multimedia and Expo, ICME Amsterdam, The Netherlands, pp 970–973
2.
go back to reference Amino K, Arai T (2009) Speaker-dependent characteristics of the nasals. Forensic Sci Int 185(1–3):21–28CrossRef Amino K, Arai T (2009) Speaker-dependent characteristics of the nasals. Forensic Sci Int 185(1–3):21–28CrossRef
3.
go back to reference Analog-to-Digital Conversion of Voice by 2,400 BIT/Second Mixed Excitation Linear Prediction (MELP) MIL-STD-3005 Analog-to-Digital Conversion of Voice by 2,400 BIT/Second Mixed Excitation Linear Prediction (MELP) MIL-STD-3005
4.
go back to reference Atal BS (1974) Effectiveness of linear prediction of the speech wave for automatic speaker identification and verification. J Acoust Soc Am 55(6):1304–1312CrossRef Atal BS (1974) Effectiveness of linear prediction of the speech wave for automatic speaker identification and verification. J Acoust Soc Am 55(6):1304–1312CrossRef
5.
go back to reference Besacier L (2008) Speech coding and packet loss effects on speech and speaker recognition. In: Tan ZH, Lindberg B (eds) Automatic speech recognition on mobile devices and over communication networks. Springer, London, pp 27–39CrossRef Besacier L (2008) Speech coding and packet loss effects on speech and speaker recognition. In: Tan ZH, Lindberg B (eds) Automatic speech recognition on mobile devices and over communication networks. Springer, London, pp 27–39CrossRef
6.
go back to reference Besacier L, Grassi S, Dufaux A, Ansorge M, Pellandini F (2000) GSM speech coding and speaker recognition. ICASSP’00 2:1085–1088 Besacier L, Grassi S, Dufaux A, Ansorge M, Pellandini F (2000) GSM speech coding and speaker recognition. ICASSP’00 2:1085–1088
7.
go back to reference Besacier L, Mayorga P, Bonastre J-F, Fredoulile C, Meignier S (2003) Overview of compression and packet loss effects in speech biometrics. IEE Proc Vision Image Signal Process 150(6):372–376CrossRef Besacier L, Mayorga P, Bonastre J-F, Fredoulile C, Meignier S (2003) Overview of compression and packet loss effects in speech biometrics. IEE Proc Vision Image Signal Process 150(6):372–376CrossRef
8.
go back to reference Bimbot F, Bonastre J-F, Fredouille C, Gravier G, Magrin-Chagnolleau I, Meignier S, Merlin T, Ortega-Garcia J, Petrovska-Delacretaz D, Reynolds DA (2004) A tutorial on text-independent speaker verification. EURASIP J Appl Signal Process JASP 4:430–451CrossRef Bimbot F, Bonastre J-F, Fredouille C, Gravier G, Magrin-Chagnolleau I, Meignier S, Merlin T, Ortega-Garcia J, Petrovska-Delacretaz D, Reynolds DA (2004) A tutorial on text-independent speaker verification. EURASIP J Appl Signal Process JASP 4:430–451CrossRef
9.
go back to reference Blomberg M, Elenius D, Zetterholm E (2004) Speaker verification scores and acoustic analysis of a professional impersonator. Proc. FONETIK, Stockholm University Blomberg M, Elenius D, Zetterholm E (2004) Speaker verification scores and acoustic analysis of a professional impersonator. Proc. FONETIK, Stockholm University
10.
go back to reference Bocchieri E (2008) Fixed-point arithmetic. In: Tan ZH, Lindberg B (eds) Automatic speech recognition on mobile devices and over communication networks. Springer, London, pp 255–275CrossRef Bocchieri E (2008) Fixed-point arithmetic. In: Tan ZH, Lindberg B (eds) Automatic speech recognition on mobile devices and over communication networks. Springer, London, pp 255–275CrossRef
11.
go back to reference Boe LJ (2000) Forensic voice identification in France. Speech Commun 31(2–3):205–224CrossRef Boe LJ (2000) Forensic voice identification in France. Speech Commun 31(2–3):205–224CrossRef
12.
go back to reference Bogert BP, Healy MJR, Tukey JW (1963) The quefrency alanysis of time series for echoes: cepstrum, pseudo-autocovariance, cross-cepstrum, and saphe tracking. In: Rosenblatt M (ed) Time series analysis. Wiley, New York, pp 209–243 (Ch 15) Bogert BP, Healy MJR, Tukey JW (1963) The quefrency alanysis of time series for echoes: cepstrum, pseudo-autocovariance, cross-cepstrum, and saphe tracking. In: Rosenblatt M (ed) Time series analysis. Wiley, New York, pp 209–243 (Ch 15)
13.
go back to reference Bonastre J-F, Matrouf D, Fredouille C (2007) Artificial impostor voice transformation effects on false acceptance rate. Proc Interspeech, pp 2053–2056 Bonastre J-F, Matrouf D, Fredouille C (2007) Artificial impostor voice transformation effects on false acceptance rate. Proc Interspeech, pp 2053–2056
14.
go back to reference Borah DK, DeLeon P (2004) Speaker identification in the presence of packet loss. IEEE 11th Digital Signal Processing Workshop and IEEE Signal Processing Education Workshop, pp 302–306 Borah DK, DeLeon P (2004) Speaker identification in the presence of packet loss. IEEE 11th Digital Signal Processing Workshop and IEEE Signal Processing Education Workshop, pp 302–306
15.
go back to reference Campbell JP Jr (1997) Speaker recognition: a tutorial. Proc IEEE 85(9):1437–1462CrossRef Campbell JP Jr (1997) Speaker recognition: a tutorial. Proc IEEE 85(9):1437–1462CrossRef
16.
go back to reference Campbell WM, Assaleh KT, Broun CC (2002) Speaker recognition with polynomial classifiers. IEEE Trans Speech Audio Process 10(4):pp 205–212CrossRef Campbell WM, Assaleh KT, Broun CC (2002) Speaker recognition with polynomial classifiers. IEEE Trans Speech Audio Process 10(4):pp 205–212CrossRef
17.
go back to reference Campbell JP, Nakasone H, Cieri C, Miller D, Walker K, Martin AF, Przybocki MA (2004) The MMSR bilingual and cross channel corpora for speaker recognition research and evaluation. Proc. of the Speaker and Language Recognition Workshop, Odyssey’04, Toledo, Spain, pp 29–32 Campbell JP, Nakasone H, Cieri C, Miller D, Walker K, Martin AF, Przybocki MA (2004) The MMSR bilingual and cross channel corpora for speaker recognition research and evaluation. Proc. of the Speaker and Language Recognition Workshop, Odyssey’04, Toledo, Spain, pp 29–32
18.
go back to reference Campbell JP, Shen W, Campbell WM, Schwartz R, Bonastre J-F, Mastrouf D (2009) Forensic speaker recognition: a need for caution. IEEE Signal Process Mag 26(2):95–103CrossRef Campbell JP, Shen W, Campbell WM, Schwartz R, Bonastre J-F, Mastrouf D (2009) Forensic speaker recognition: a need for caution. IEEE Signal Process Mag 26(2):95–103CrossRef
19.
go back to reference Carmen P-M, Ascension G-A, Fernando D-M (2001) Recognizing voice over IP: a robust front-end for speech recognition on the world wide web. IEEE Trans Multimedia 3(2):209–218CrossRef Carmen P-M, Ascension G-A, Fernando D-M (2001) Recognizing voice over IP: a robust front-end for speech recognition on the world wide web. IEEE Trans Multimedia 3(2):209–218CrossRef
20.
go back to reference Carmona JL, Peinado AM, Pe’rez-Cordoba JL, Gomez AM, Sanchez V (2007) iLBC-based tansparametrization: a real alternative to DSR for speech recognition over packet networks. ICASSP, pp 961–964 Carmona JL, Peinado AM, Pe’rez-Cordoba JL, Gomez AM, Sanchez V (2007) iLBC-based tansparametrization: a real alternative to DSR for speech recognition over packet networks. ICASSP, pp 961–964
21.
go back to reference Cerf VG, Kahn RE (1974) A protocol for packet network interconnection. IEEE Trans Commun 22(5):637–648CrossRef Cerf VG, Kahn RE (1974) A protocol for packet network interconnection. IEEE Trans Commun 22(5):637–648CrossRef
22.
go back to reference Chen SH, Wang HC (2004) Improvement of speaker recognition by combining residual and prosodic features with acoustic features. Proc IEEE Int Conf Acoustics, Speech and Signal Processing, ICASSP’04, Montreal, Canada Chen SH, Wang HC (2004) Improvement of speaker recognition by combining residual and prosodic features with acoustic features. Proc IEEE Int Conf Acoustics, Speech and Signal Processing, ICASSP’04, Montreal, Canada
23.
go back to reference Chua TK, Pheanics DC (2006) QoS evaluation of sender-based loss-recovery techniques for VoIP, Nov--Dec 2006. IEEE Network, pp 14–21, Chua TK, Pheanics DC (2006) QoS evaluation of sender-based loss-recovery techniques for VoIP, Nov--Dec 2006. IEEE Network, pp 14–21,
24.
go back to reference Davidson J, Peters J (2000) Voice over IP fundamentals. Cisco Press Davidson J, Peters J (2000) Voice over IP fundamentals. Cisco Press
25.
go back to reference Davis SB, Mermelstein P (1980) Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Trans Acoust Speech Signal Process ASSP 28(4):357–366CrossRef Davis SB, Mermelstein P (1980) Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Trans Acoust Speech Signal Process ASSP 28(4):357–366CrossRef
26.
go back to reference Deriche M, Ning D (2006) A novel audio coding scheme using warped linear prediction model and the discrete wavelet transform. IEEE Trans Audio Speech Lang Proc 14(6):2039–2048CrossRef Deriche M, Ning D (2006) A novel audio coding scheme using warped linear prediction model and the discrete wavelet transform. IEEE Trans Audio Speech Lang Proc 14(6):2039–2048CrossRef
27.
go back to reference DETware: DET curve-plotting software for use with MATLAB, http://www.itl.nist.gov/iad/mig//tools/ DETware: DET curve-plotting software for use with MATLAB, http://​www.​itl.​nist.​gov/​iad/​mig/​/​tools/​
28.
go back to reference Dunn RB, Quatieri TF, Reynolds DA (2001) Speaker recognition from coded speech in matched and mismatched conditions. Proc. Speaker Recognition Workshop, 1, Grete, Greece, pp 115–120 Dunn RB, Quatieri TF, Reynolds DA (2001) Speaker recognition from coded speech in matched and mismatched conditions. Proc. Speaker Recognition Workshop, 1, Grete, Greece, pp 115–120
29.
go back to reference Doddington GR (1985) Speaker recognition-identifying people by their voices. Proc IEEE 73:1651–1664CrossRef Doddington GR (1985) Speaker recognition-identifying people by their voices. Proc IEEE 73:1651–1664CrossRef
30.
go back to reference Doddington GR, Przybocki MA, Martin AF, Reynolds DA (2000) The NIST speaker recognition evaluation—overview, methodology systems, results, perspective. Speech Commun 31:225–254CrossRef Doddington GR, Przybocki MA, Martin AF, Reynolds DA (2000) The NIST speaker recognition evaluation—overview, methodology systems, results, perspective. Speech Commun 31:225–254CrossRef
31.
go back to reference Duda RO, Hart PE, Stork DG (2001) Pattern classification and scene analysis, 2nd edition. Wiley, New York Duda RO, Hart PE, Stork DG (2001) Pattern classification and scene analysis, 2nd edition. Wiley, New York
32.
go back to reference Eriksson A (2010) The disguised voice: imitating accents or speech styles and impersonating individuals. In: Llamas C, Watt D (eds) Language and identities. Edinburgh University Press, Edinburgh, pp 86–96 Eriksson A (2010) The disguised voice: imitating accents or speech styles and impersonating individuals. In: Llamas C, Watt D (eds) Language and identities. Edinburgh University Press, Edinburgh, pp 86–96
33.
go back to reference Endres W, Bambach W, Flösser G (1971) Voice spectrograms as a function of age, voice disguise, and voice imitation. J Acoust Soc Am 49:1842–1848CrossRef Endres W, Bambach W, Flösser G (1971) Voice spectrograms as a function of age, voice disguise, and voice imitation. J Acoust Soc Am 49:1842–1848CrossRef
34.
go back to reference Fant G (1970) Acoustic theory of speech production. Mouton, The Hague Fant G (1970) Acoustic theory of speech production. Mouton, The Hague
35.
go back to reference Flanagan JL (1972) Speech analysis, synthesis and perception. Springer, BerlinCrossRef Flanagan JL (1972) Speech analysis, synthesis and perception. Springer, BerlinCrossRef
36.
go back to reference FEXT: open source program sox, http://sox.sourceforge.net/ FEXT: open source program sox, http://​sox.​sourceforge.​net/​
37.
go back to reference Gallardo-Antolin A, Pelaez-Moreno C, Diaz-De-Maria F (2005) Recognizing GSM digital speech. IEEE Trans Speech Audio Proc 13:1186–1205CrossRef Gallardo-Antolin A, Pelaez-Moreno C, Diaz-De-Maria F (2005) Recognizing GSM digital speech. IEEE Trans Speech Audio Proc 13:1186–1205CrossRef
38.
go back to reference Gish H, Schmidt M (1994) Text-independent speaker identification. IEEE Signal Process Mag 11:18–32CrossRef Gish H, Schmidt M (1994) Text-independent speaker identification. IEEE Signal Process Mag 11:18–32CrossRef
39.
go back to reference G´omez AM, Peinado AM, S´anchez V, Rubio AJ (2006) Recognition of coded speech transmitted over wireless channels. IEEE Trans Wireless Commun 5(9):2555–2562CrossRef G´omez AM, Peinado AM, S´anchez V, Rubio AJ (2006) Recognition of coded speech transmitted over wireless channels. IEEE Trans Wireless Commun 5(9):2555–2562CrossRef
40.
go back to reference Hair GD, Rekieta TW (1972) Mimic resistance of speaker verification using phoneme spectra. J Acoust Soc Am 51:131(A)CrossRef Hair GD, Rekieta TW (1972) Mimic resistance of speaker verification using phoneme spectra. J Acoust Soc Am 51:131(A)CrossRef
41.
go back to reference Harma A, Laine U (2001) A comparison of warped and conventional linear prediction coding. IEEE Trans Speech Audio Process 9(4):579–588CrossRef Harma A, Laine U (2001) A comparison of warped and conventional linear prediction coding. IEEE Trans Speech Audio Process 9(4):579–588CrossRef
42.
go back to reference Harwell K, Scheets G, Weber J, Teague K (2009) A multilanguage study of the quality of interleaved MELP voice traffic over a lossy network. IEEE Signal Process Lett 16(7):565–568CrossRef Harwell K, Scheets G, Weber J, Teague K (2009) A multilanguage study of the quality of interleaved MELP voice traffic over a lossy network. IEEE Signal Process Lett 16(7):565–568CrossRef
43.
go back to reference Hassan M, Nayandoro A (2000) Internet telephony: services, technical challenges, and products. IEEE Commun Mag 38:96–103CrossRef Hassan M, Nayandoro A (2000) Internet telephony: services, technical challenges, and products. IEEE Commun Mag 38:96–103CrossRef
44.
go back to reference Huerta JM, Stern RM (1998) Speech compression from GSM coder parameters. Proc Int Conf Spoken Lang Proc, ICSLP-98, vol 4, pp 1463–1466 Huerta JM, Stern RM (1998) Speech compression from GSM coder parameters. Proc Int Conf Spoken Lang Proc, ICSLP-98, vol 4, pp 1463–1466
45.
go back to reference Hyarinen A, Karhunen J, Oja E (2001) Independent component analysis. Wiley, New YorkCrossRef Hyarinen A, Karhunen J, Oja E (2001) Independent component analysis. Wiley, New YorkCrossRef
46.
go back to reference Ion V, Reinhold H-U (2008) A novel uncertainty decoding rule with applications to transmission error robust speech recognition. IEEE Trans Audio Speech Lang Proc 16(5):1047–1060CrossRef Ion V, Reinhold H-U (2008) A novel uncertainty decoding rule with applications to transmission error robust speech recognition. IEEE Trans Audio Speech Lang Proc 16(5):1047–1060CrossRef
47.
48.
go back to reference Koenig BE (1986) Spectrographic voice identification: a forensic survey. J Acoust Soc Am 79:2088–2090CrossRef Koenig BE (1986) Spectrographic voice identification: a forensic survey. J Acoust Soc Am 79:2088–2090CrossRef
49.
go back to reference Kostas TJ, Borella MS, Sidhu I, Schuster GM, Grabiec J, Mahler J (1998) Real-time voice over packet-switched networks. IEEE Network 12:18–27CrossRef Kostas TJ, Borella MS, Sidhu I, Schuster GM, Grabiec J, Mahler J (1998) Real-time voice over packet-switched networks. IEEE Network 12:18–27CrossRef
50.
go back to reference Kuhlmann M, Sapatnekar S, Parhi KK (1999) Efficient crosstalk estimation. Proc of 1999 IEEE Int Conf on Computer Design, Austin Kuhlmann M, Sapatnekar S, Parhi KK (1999) Efficient crosstalk estimation. Proc of 1999 IEEE Int Conf on Computer Design, Austin
51.
go back to reference Lansky P, Steiglitz K (1981) Synthesis of timbral families by warped linear prediction. Comput Music J 5(3):45–49CrossRef Lansky P, Steiglitz K (1981) Synthesis of timbral families by warped linear prediction. Comput Music J 5(3):45–49CrossRef
52.
go back to reference Lee J, Lee YW, O’Clock G, Zhu X, Parhi K, Warwick W (2009) Induced respiratory system modeling by high frequency chest compression using lumped system identification method. Proc of 2009 IEEE Engineering in Medicine and Biology Society Conference, Minneapolis, MN, Sept 2009 Lee J, Lee YW, O’Clock G, Zhu X, Parhi K, Warwick W (2009) Induced respiratory system modeling by high frequency chest compression using lumped system identification method. Proc of 2009 IEEE Engineering in Medicine and Biology Society Conference, Minneapolis, MN, Sept 2009
53.
go back to reference Linguistic Data Consortium. http://www.ldc.upenn.edu/ Linguistic Data Consortium. http://​www.​ldc.​upenn.​edu/​
54.
go back to reference Maheswari K, Punithavalli M (2010) Enhanced packet loss recovery in voice multiplex-multicast based VoIP networks. A2CWiC ’10 Proceedings of the 1st Amrita ACM-W Celebration on Women in Computing in India Maheswari K, Punithavalli M (2010) Enhanced packet loss recovery in voice multiplex-multicast based VoIP networks. A2CWiC ’10 Proceedings of the 1st Amrita ACM-W Celebration on Women in Computing in India
55.
go back to reference Martin A, Przybocki M (2000) The NIST 1999 speaker recognition evaluation—an overview. Digital Signal Process 10(1–3):1–18CrossRef Martin A, Przybocki M (2000) The NIST 1999 speaker recognition evaluation—an overview. Digital Signal Process 10(1–3):1–18CrossRef
56.
go back to reference Martin AF, Przybocki MA (2001) The NIST speaker recognition evaluations: 1996–2001. A Speaker Odyssey, A Speaker Recognition Workshop, Dec 2001 Martin AF, Przybocki MA (2001) The NIST speaker recognition evaluations: 1996–2001. A Speaker Odyssey, A Speaker Recognition Workshop, Dec 2001
57.
go back to reference Martin AF, Doddington G, Kamm T, Ordowski M, Przybocki M (1997) The DET curve in assessment of detection task performance, vol 4. Proc Eurospeech’97, Rhodes, Greece, pp 1899–1903, Sept 1997 Martin AF, Doddington G, Kamm T, Ordowski M, Przybocki M (1997) The DET curve in assessment of detection task performance, vol 4. Proc Eurospeech’97, Rhodes, Greece, pp 1899–1903, Sept 1997
58.
go back to reference McCree AV, Barnwell TP III (1995) A mixed excitation LPC vocoder model for low bit rate speech coding. IEEE Trans Speech Audio Proc 3:242–250CrossRef McCree AV, Barnwell TP III (1995) A mixed excitation LPC vocoder model for low bit rate speech coding. IEEE Trans Speech Audio Proc 3:242–250CrossRef
59.
go back to reference McCree A, Truong K, George EB, Barnwell TP, Vzswanathanl V (1996) A 2.4 kbit/s MELP coder candidate for the new US. Federal standard, Proc Int Conf Acoust Speech Signal, ICASSP, pp 200–203 McCree A, Truong K, George EB, Barnwell TP, Vzswanathanl V (1996) A 2.4 kbit/s MELP coder candidate for the new US. Federal standard, Proc Int Conf Acoust Speech Signal, ICASSP, pp 200–203
60.
61.
go back to reference Merazka F (2008) Improved packet loss recovery using interleaving for CELP-type speech coders in packet networks. IAENG Int J Comput Sci 36:1, IJCS_36_1_08 Merazka F (2008) Improved packet loss recovery using interleaving for CELP-type speech coders in packet networks. IAENG Int J Comput Sci 36:1, IJCS_36_1_08
62.
go back to reference Nakasone H (2003) Automated speaker recognition in real world conditions: controlling the uncontrollable. Proc Eurospeech Nakasone H (2003) Automated speaker recognition in real world conditions: controlling the uncontrollable. Proc Eurospeech
63.
go back to reference Nakasone H, Beck SD (2001) Forensic automatic speaker recognition. A speaker Odyssey-the speaker recognition workshop, Crete, Greece, 18–22 June, 2001 Nakasone H, Beck SD (2001) Forensic automatic speaker recognition. A speaker Odyssey-the speaker recognition workshop, Crete, Greece, 18–22 June, 2001
64.
go back to reference Nolan F, Oh T (1996) Identical twins, different voices. Forensic Linguist 3(1):39–49 Nolan F, Oh T (1996) Identical twins, different voices. Forensic Linguist 3(1):39–49
65.
go back to reference Nolan JF (1983) The phonetic bases of speaker recognition. Cambridge University Press, Cambridge Nolan JF (1983) The phonetic bases of speaker recognition. Cambridge University Press, Cambridge
66.
go back to reference Open source SIP stack and media stack for presence, instant messaging, and multimedia communication, http://www.pjsip.org Open source SIP stack and media stack for presence, instant messaging, and multimedia communication, http://​www.​pjsip.​org
67.
go back to reference Oppenheim AV (1964) Superposition in a class of nonlinear systems. Ph.D. Dissertation, MIT, USA Oppenheim AV (1964) Superposition in a class of nonlinear systems. Ph.D. Dissertation, MIT, USA
68.
go back to reference Oppenheim AV, Schafer RW (1989) Discrete-time signal processing. Prentice-Hall, Englewood CliffsMATH Oppenheim AV, Schafer RW (1989) Discrete-time signal processing. Prentice-Hall, Englewood CliffsMATH
69.
go back to reference Ortega-Garcia J, Bigun J, Reynolds DA, Gonzalez-Rodriguez J (2004) Authentication gets personal with biometrics. IEEE Signal Process Mag 21(2):50–62CrossRef Ortega-Garcia J, Bigun J, Reynolds DA, Gonzalez-Rodriguez J (2004) Authentication gets personal with biometrics. IEEE Signal Process Mag 21(2):50–62CrossRef
70.
go back to reference O’Shaughnessy D (2001) Speech communications: human and machine, 2nd edition. Universities Press O’Shaughnessy D (2001) Speech communications: human and machine, 2nd edition. Universities Press
71.
go back to reference Parhi KK (2004) VLSI digital signal processing systems design and implementation. Wiley, New York Parhi KK (2004) VLSI digital signal processing systems design and implementation. Wiley, New York
72.
go back to reference Patil HA (2005) Speaker recognition in Indian languages: a feature based approach. Ph.D. Thesis, Department of Electrical Engineering, IIT Kharagpur, India Patil HA (2005) Speaker recognition in Indian languages: a feature based approach. Ph.D. Thesis, Department of Electrical Engineering, IIT Kharagpur, India
73.
go back to reference Patil HA (2009) Infant identification from their cry. 7th Int Conf Advances in Pattern Recognition, ICAPR, ISI Kolkata, IEEE Comput Soc, 4–6 Feb 2009, pp 107–109 Patil HA (2009) Infant identification from their cry. 7th Int Conf Advances in Pattern Recognition, ICAPR, ISI Kolkata, IEEE Comput Soc, 4–6 Feb 2009, pp 107–109
74.
go back to reference Patil HA, Basu TK (2007) Advances in speaker recognition: a feature based approach. Proc Int Conf Artificial Intelligence and Pattern Recognition, AIPR, Orlando, 9–12 July 2007, pp 528–537 Patil HA, Basu TK (2007) Advances in speaker recognition: a feature based approach. Proc Int Conf Artificial Intelligence and Pattern Recognition, AIPR, Orlando, 9–12 July 2007, pp 528–537
75.
go back to reference Patil HA, Basu TK (2008) LP spectra vs. Mel spectra for identification of professional mimics in Indian languages. Int J Speech Technol 11(1):1–16 Patil HA, Basu TK (2008) LP spectra vs. Mel spectra for identification of professional mimics in Indian languages. Int J Speech Technol 11(1):1–16
76.
go back to reference Patil HA, Basu TK (2008) A novel approach to language identification using modified polynomial networks. In: Prasad B, Prasanna SRM (Eds) Speech, audio, image and biomedical signal processing using neural networks, studies in computational intelligence, vol 83. Springer, pp 117–144 Patil HA, Basu TK (2008) A novel approach to language identification using modified polynomial networks. In: Prasad B, Prasanna SRM (Eds) Speech, audio, image and biomedical signal processing using neural networks, studies in computational intelligence, vol 83. Springer, pp 117–144
77.
go back to reference Patil HA, Basu TK (2009) A novel modified polynomial networks design for dialect recognition. 7th Int Conf Advances in Pattern Recognition, ICAPR, ISI Kolkata, IEEE Computer Society 4–6 Feb 2009, pp 175–178 Patil HA, Basu TK (2009) A novel modified polynomial networks design for dialect recognition. 7th Int Conf Advances in Pattern Recognition, ICAPR, ISI Kolkata, IEEE Computer Society 4–6 Feb 2009, pp 175–178
78.
go back to reference Patil HA, Parhi KK (2009) Variable length Teager energy based Mel cepstral features for identification of twins. In: Chaudhury S et al. (eds) PReMI 2009, vol 5909, LNCS, Springer, pp 525–530 Patil HA, Parhi KK (2009) Variable length Teager energy based Mel cepstral features for identification of twins. In: Chaudhury S et al. (eds) PReMI 2009, vol 5909, LNCS, Springer, pp 525–530
79.
go back to reference Patil HA, Parhi KK (2010) Novel variable length Teager energy based features for person recognition from their hum. In: Proc Int Conf Acoust, Speech and Signal Proc, ICASSP 2010, Dallas, 14–19 March 2010 Patil HA, Parhi KK (2010) Novel variable length Teager energy based features for person recognition from their hum. In: Proc Int Conf Acoust, Speech and Signal Proc, ICASSP 2010, Dallas, 14–19 March 2010
80.
go back to reference Patil HA, Dutta PK, Basu TK (2006) Effectiveness of LP based features for identification of professional, mimics in Indian languages. Int Workshop on Multimodal User Authentication, MMUA06, Toulouse, France, 11–12 May 2006 Patil HA, Dutta PK, Basu TK (2006) Effectiveness of LP based features for identification of professional, mimics in Indian languages. Int Workshop on Multimodal User Authentication, MMUA06, Toulouse, France, 11–12 May 2006
81.
go back to reference Patil HA, Dutta PK, Basu TK (2006) On the investigation of spectral resolution problem for identification of female speakers in Bengali. Special session on person authentication: voice and other biometrics, IEEE Int Conf on Industrial Tech, IEEE ICIT’06, Mumbai, 15–17 Dec 2006 Patil HA, Dutta PK, Basu TK (2006) On the investigation of spectral resolution problem for identification of female speakers in Bengali. Special session on person authentication: voice and other biometrics, IEEE Int Conf on Industrial Tech, IEEE ICIT’06, Mumbai, 15–17 Dec 2006
82.
go back to reference Patil HA, Sitaram S, Sharma E (2009) DA-IICT cross-lingual and multilingual corpora for speaker recognition. 7th Int Conf Advances in Pattern Recognition, ICAPR, ISI Kolkata, IEEE Computer Society, 4–6 Feb 2009, pp 187–190 Patil HA, Sitaram S, Sharma E (2009) DA-IICT cross-lingual and multilingual corpora for speaker recognition. 7th Int Conf Advances in Pattern Recognition, ICAPR, ISI Kolkata, IEEE Computer Society, 4–6 Feb 2009, pp 187–190
83.
go back to reference Perceptual Evaluation of Speech Quality (PESQ) 2001, ITU-T, Recommendation.P.862. Perceptual Evaluation of Speech Quality (PESQ) 2001, ITU-T, Recommendation.P.862.
84.
go back to reference Perrot P, Aversano G, Blouet R, Charbit M, Chollet G (2005) Voice forgery using ALISP: indexation in a client memory. In: Proc Int Conf Acoust Speech and Signal Process, ICASSP 2005 Perrot P, Aversano G, Blouet R, Charbit M, Chollet G (2005) Voice forgery using ALISP: indexation in a client memory. In: Proc Int Conf Acoust Speech and Signal Process, ICASSP 2005
85.
go back to reference Perrot P, Aversano G, Chollet G (2007) Voice disguise and automatic detection: review and perspectives. In: Stylianou Y, Faundez-Zanuy M, Esposito A (eds) Progress in nonlinear speech processing. Springer, Berlin, pp 101–117CrossRef Perrot P, Aversano G, Chollet G (2007) Voice disguise and automatic detection: review and perspectives. In: Stylianou Y, Faundez-Zanuy M, Esposito A (eds) Progress in nonlinear speech processing. Springer, Berlin, pp 101–117CrossRef
86.
go back to reference Perrot P, Razik J, Chollet G (2009) Vocal forgery in forensic sciences. Proc E Forensics—Adelaïde, Australia Perrot P, Razik J, Chollet G (2009) Vocal forgery in forensic sciences. Proc E Forensics—Adelaïde, Australia
87.
go back to reference Plumpe MD, Quatieri TF, Reynolds DA (1999) Modeling of the glottal flow derivative waveform with application to speaker identification. IEEE Trans Speech Audio Process 7(5):569–585CrossRef Plumpe MD, Quatieri TF, Reynolds DA (1999) Modeling of the glottal flow derivative waveform with application to speaker identification. IEEE Trans Speech Audio Process 7(5):569–585CrossRef
88.
go back to reference Pols LCW (1977) Spectral analysis and identification of Dutch vowels in monosyllabic words. Ph.D. thesis, Free University of Amsterdam Pols LCW (1977) Spectral analysis and identification of Dutch vowels in monosyllabic words. Ph.D. thesis, Free University of Amsterdam
89.
go back to reference Porwal G, Patil HA, Basu TK (2004) Effect of GSM-FR coding standard on performance of text-independent speaker identification. Int Conf on Advanced Computing and Communications, ADCOM04, 13–15 Dec 2004 Porwal G, Patil HA, Basu TK (2004) Effect of GSM-FR coding standard on performance of text-independent speaker identification. Int Conf on Advanced Computing and Communications, ADCOM04, 13–15 Dec 2004
90.
Porwal G, Patil HA, Basu TK (2005) Effect of speech coding on text-independent speaker identification. Int Conf on Intelligent Sensing and Information Processing, ICISIP0, 4–7 Jan 2005, pp 415–420
91.
Prybocki MA, Martin AF, Le AN (2007) NIST speaker recognition evaluations utilizing the mixer corpora: 2004, 2005, 2006. IEEE Trans Audio Speech Lang Process 15(7):1951–1959
92.
Quatieri TF (2002) Discrete-time speech signal processing: principles and practices. Pearson Education
93.
Quatieri TF, Singer E, Dunn RB, Reynolds DA, Campbell JP (1999) Speaker and language recognition using speech codec parameters, vol 2. Proc Eurospeech99, pp 787–790
94.
Quatieri TF, Dunn RB, Reynolds DA, Campbell JP, Singer E (2000) Speaker recognition using G.729 speech codec parameters. Proc Int Conf Acoust Speech and Signal Process, vol 2, ICASSP’00, pp 1089–1092
95.
Rabiner LR, Schafer RW (1978) Digital processing of speech signals. Prentice-Hall, Englewood Cliffs
96.
Ranganathan MK, Kilmartin L (2005) Neural and fuzzy computation techniques for playout delay adaptation in VoIP networks. IEEE Trans Neural Networks 16(5):1174–1194
97.
Reynolds DA (1992) A Gaussian mixture modeling approach to text-independent speaker identification. Ph.D. Dissertation, Georgia Institute of Technology
98.
Reynolds DA (1994) Experimental evaluation of features for robust speaker identification. IEEE Trans Speech Audio Process 2:639–643
99.
Reynolds DA (1995) Large population speaker identification using clean and telephone speech. IEEE Signal Process Lett 2(3):46–48
100.
Reynolds DA, Rose RC (1995) Robust text-independent speaker identification using Gaussian mixture models. IEEE Trans Speech Audio Process 3(1):72–83
101.
Reynolds DA, Andrews W, Campbell J, Navratil J, Peskin B, Adami A, Jin Q, Klusacek D, Abramson J, Mihaescu R, Godfrey J, Jones D, Xiang B (2003) The SuperSID project: exploiting high-level information for high-accuracy speaker recognition. Proc Int Conf Acoustics, Speech, and Signal Processing, 06–10 Apr 2003, ICASSP’03, Hong Kong, pp IV:784–787
102.
103.
Sat B, Wah BW (2006) Analysis and evaluation of the Skype and Google-talk VoIP systems. ICME, pp 2153–2156
104.
Sambur MR (1975) Selection of acoustic features for speaker identification. IEEE Trans Acoust Speech Signal Process ASSP-23:176–182
105.
Schafer RW (1968) Echo removal by discrete generalized filtering. Ph.D. Dissertation, MIT, USA
106.
Schwartz R (2006) Voiceprint in the United States: why they won’t go away. Proc Int Association for Forensic Phonetics and Acoustics, Sweden
107.
Soong FK, Rosenberg AE, Juang B-H (1987) A vector quantization approach to speaker recognition. AT&T Tech J 66(2):14–26
108.
Special Section on Speaker and Language Recognition (2007) IEEE Trans Audio Speech Lang Proc 15(7):2104–2115
109.
Stone JV (2004) Independent component analysis: a tutorial introduction. MIT Press, Boston
110.
Strube HW (1980) Linear prediction on a warped frequency scale. J Acoust Soc Am 68(4):1071–1076
111.
The NS-3 network simulator, http://www.nsnam.org
112.
Tosi O (1979) Voice identification: theory and legal applications. University Park Press, Baltimore
113.
Wang X, Lin J (2007) Applying speaker recognition over VoIP auditing. Proc of the 6th Int Conf on Machine Learning and Cybernetics, 19–22 Aug 2007, Hong Kong, pp 3577–3581
114.
Wasem O, Goodman D, Dvorak C, Page H (1988) The effect of waveform substitution on the quality of PCM packet communications. IEEE Trans Acoust Speech Signal Process 36(3):342–348
115.
Weerackody V, Reichl W, Potamianos A (2002) An error-protected speech recognition system for wireless communications. IEEE Trans Wireless Commun 1(2):282–291
116.
Wolf JJ (1972) Efficient acoustic parameters for speaker recognition. J Acoust Soc Am 51:2030–2043
117.
Yegnanarayana B, Prasanna SRM, Zachariah JM, Gupta ChS (2005) Combining evidence from source, suprasegmental and spectral features for a fixed-text speaker verification system. IEEE Trans Speech Audio Process 13(4):575–582
118.
Yoma NB, Busso C, Soto I (2005) Packet-loss modeling in IP networks with state-duration constraints. IEE Proc Commun 152(1):1–5
119.
Yoma NB, Molina C, Silva J, Busso C (2006) Modeling, estimating, and compensating low-bit rate coding distortion in speech recognition. IEEE Trans Audio Speech Lang Process 14(1):246–255
120.
Yu AT, Wang H-C (1998) A study on the recognition of low-bit-rate encoded speech. Proc Int Conf Spoken Lang Proc ICSLP, pp 38–41
121.
Zetterholm E (2005) Voice imitation: a phonetic study of perceptual illusions and acoustic success, PhD Abstract. Int J Speech Lang Law, vol 12, no 1
122.
Zetterholm E (2007) Detection of speaker characteristics using voice imitation. In: Speaker classification II. Lecture notes in computer science, vol 4441. Springer, pp 192–205
123.
Zhang C, Hansen JHL (2011) Whisper-island detection based on unsupervised segmentation with entropy based speech feature processing. IEEE Trans Audio Speech Lang Process 19:883–894
124.
Zheng N, Lee T, Ching PC (2007) Integration of complementary acoustic features for speaker recognition. IEEE Signal Proc Lett 14(3):181–184
125.
Zhu X, Parhi KK (2010) Underdetermined blind source separation based on continuous density Hidden Markov Model. Proc 2010 IEEE Int Conf Acoustics, Speech, and Signal Process, March 2010, Dallas, TX
Metadata
Title
Speaker Identification over Narrowband VoIP Networks
Authors
Hemant A. Patil, Ph.D.
Aaron E. Cohen, Ph.D.
Keshab K. Parhi, Ph.D.
Copyright Year
2012
Publisher
Springer New York
DOI
https://doi.org/10.1007/978-1-4614-0263-3_6