2012 | OriginalPaper | Book chapter

7. Noise Robust Speaker Identification: Using Nonlinear Modeling Techniques

Authors: Raghunath S. Holambe, Ph.D., Mangesh S. Deshpande, M.E.

Published in: Forensic Speaker Recognition

Publisher: Springer New York

Abstract

Session variability is one of the main challenges in forensic speaker identification. Variability caused by mismatched environments seriously degrades identification performance. To address environment mismatch due to noise, this chapter discusses different types of robust features. State-of-the-art features model the speech production system as a linear source-filter model. However, this modeling technique neglects some nonlinear aspects of speech production that carry speaker-specific information. Furthermore, state-of-the-art features are based on either the speech production mechanism or the speech perception mechanism. To overcome these limitations of existing features, the chapter proposes features derived using nonlinear modeling techniques. The proposed features, Teager energy operator based cepstral coefficients (TEOCC) and amplitude-frequency modulation (AM-FM) based ‘Q’ features, show a significant improvement in speaker identification rate in mismatched environments. Their performance is evaluated for different types of noise from the NOISEX-92 database, with clean training and noisy testing conditions. When the signal is corrupted by car engine noise at 0 dB SNR, the speaker identification rate is 57% with TEOCC features and 97% with AM-FM based ‘Q’ features, compared with 25.5% with MFCC features. The results show that the proposed features increase speaker identification accuracy in the presence of noise without any additional pre-processing of the signal to remove the noise.
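The abstract relies on two technical ingredients: the Teager energy operator (TEO), a nonlinear energy measure defined in discrete time as Ψ[x(n)] = x(n)^2 - x(n-1)x(n+1), and a clean-training / noisy-testing protocol in which noise is added at a fixed SNR such as 0 dB. The abstract does not spell out the exact TEOCC recipe, so the following is only a minimal NumPy sketch under stated assumptions: equal-width FFT subbands stand in for whatever filterbank the authors actually use, the log of the mean |TEO| per band is taken, and a DCT produces cepstrum-like coefficients. All function names (`teager_energy`, `add_noise_at_snr`, `dct_ii`, `teo_cepstrum`) and parameter defaults are hypothetical.

```python
import numpy as np


def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]**2 - x[n-1] * x[n+1].

    The first and last samples lack a neighbour on one side, so the
    output is two samples shorter than the input.
    """
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]


def add_noise_at_snr(clean, noise, snr_db):
    """Scale `noise` so that 10*log10(P_clean / P_noise) == snr_db, then add it.

    Mimics a clean-training / noisy-testing setup; `noise` must be at least
    as long as `clean` (it is truncated to the same length).
    """
    clean = np.asarray(clean, dtype=float)
    noise = np.asarray(noise, dtype=float)[: len(clean)]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    return clean + scale * noise


def dct_ii(v):
    """Orthonormal DCT-II (same convention as scipy.fft.dct(v, norm='ortho'))."""
    v = np.asarray(v, dtype=float)
    n = len(v)
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    out = np.cos(np.pi * k * (2 * m + 1) / (2 * n)) @ v
    out[0] *= np.sqrt(1.0 / n)
    out[1:] *= np.sqrt(2.0 / n)
    return out


def teo_cepstrum(frame, fs, n_bands=20, n_ceps=12):
    """Hypothetical TEO-based cepstral coefficients for one analysis frame.

    Assumed pipeline (illustrative, not the chapter's exact TEOCC recipe):
    1. split the frame into `n_bands` equal-width subbands with ideal FFT masks,
    2. apply the Teager energy operator to each time-domain subband signal,
    3. take the log of the mean |TEO| in each band,
    4. decorrelate the log band energies with a DCT and keep `n_ceps` values.
    """
    frame = np.asarray(frame, dtype=float)
    spectrum = np.fft.rfft(frame)
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    edges = np.linspace(0.0, fs / 2.0, n_bands + 1)
    log_energies = np.empty(n_bands)
    for b in range(n_bands):
        mask = (freqs >= edges[b]) & (freqs < edges[b + 1])
        subband = np.fft.irfft(spectrum * mask, n=len(frame))
        # |TEO| because the operator can go negative on broadband signals;
        # the small constant guards against log(0) for an empty band.
        log_energies[b] = np.log(np.mean(np.abs(teager_energy(subband))) + 1e-12)
    return dct_ii(log_energies)[:n_ceps]


if __name__ == "__main__":
    fs = 8000
    t = np.arange(fs // 4) / fs  # 250 ms test tone
    clean = np.sin(2 * np.pi * 440.0 * t)
    rng = np.random.default_rng(0)
    noisy = add_noise_at_snr(clean, rng.standard_normal(len(clean)), snr_db=0.0)
    print(teo_cepstrum(noisy[:256], fs).shape)
```

For a pure tone A·cos(Ωn + φ), the TEO output is the constant A²·sin²(Ω), which is why the operator is said to track amplitude and frequency jointly. A mel-spaced or wavelet filterbank and frame-level averaging would be natural refinements; this sketch does not attempt to reproduce the identification rates reported in the abstract.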

Metadata
Copyright year
2012
DOI
https://doi.org/10.1007/978-1-4614-0263-3_7