Published in: International Journal of Speech Technology, Issue 2/2012

1 June 2012

Speaker verification using excitation source information

Authors: Debadatta Pati, S. R. Mahadeva Prasanna

Abstract

In this work we develop a speaker recognition system based on excitation source information and demonstrate its significance by comparing it with a system based on vocal tract information. The speaker-specific excitation information is extracted by subsegmental, segmental and suprasegmental processing of the linear prediction (LP) residual. The speaker-specific information from each level is modeled independently using Gaussian mixture model-universal background model (GMM-UBM) modeling, and the three levels are then combined at the score level. The significance of the proposed system is demonstrated by conducting speaker verification experiments on the NIST-03 database. Two tests are conducted, namely a Clean test and a Noisy test. In the Clean test, the test speech signal is used as it is for verification. In the Noisy test, the test speech is first corrupted by factory noise at 9 dB and then used for verification. In the Clean test, the proposed source-based system still performs worse than the vocal tract based system, but in the Noisy test it performs better. Finally, in both the clean and noisy cases, the proposed system provides different and robust speaker-specific evidence that helps the vocal tract system further improve the overall performance.
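The processing chain summarized in the abstract can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the LP residual is computed by frame-wise autocorrelation LPC (Levinson-Durbin recursion) followed by inverse filtering, with an assumed LP order of 10 and assumed 20 ms non-overlapping frames; the Noisy test condition is mimicked by scaling a noise signal to a target SNR such as 9 dB; and score-level fusion is shown as a plain weighted sum with equal default weights, since the abstract does not state the fusion rule. The subsegmental, segmental and suprasegmental feature extraction and the GMM-UBM models themselves are not reproduced. All function names (`lp_coefficients`, `lp_residual`, `add_noise_at_snr`, `fuse_scores`) are hypothetical.

```python
import numpy as np


def lp_coefficients(frame, order):
    """Autocorrelation-method LPC via the Levinson-Durbin recursion.

    Returns a = [1, a_1, ..., a_p] for the inverse (prediction-error) filter
    A(z) = 1 + sum_k a_k z^-k.
    """
    n = len(frame)
    # Biased autocorrelation r[0..p]; lags beyond the frame length contribute 0
    r = np.array([np.dot(frame[: n - k], frame[k:]) if k < n else 0.0
                  for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    if err <= 0.0:  # silent frame: identity filter, residual equals input
        return a
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err  # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
        if err <= 0.0:  # numerically degenerate frame: stop the recursion
            break
    return a


def lp_residual(x, fs, order=10, frame_ms=20.0):
    """Frame-wise LP residual e[t] = sum_j a_j x[t-j] (inverse filtering).

    Assumed defaults (order 10, 20 ms frames) are illustrative only.
    Coefficients are estimated on a Hamming-windowed frame and applied to the
    unwindowed samples, using earlier samples as filter history.
    """
    x = np.asarray(x, dtype=float)
    n = max(int(fs * frame_ms / 1000), order + 1)
    xp = np.concatenate([np.zeros(order), x])  # zero history before t = 0
    lags = np.arange(order + 1)
    res = np.empty_like(x)
    for start in range(0, len(x), n):
        seg = x[start:start + n]
        a = lp_coefficients(seg * np.hamming(len(seg)), order)
        t = np.arange(start, start + len(seg))
        # Row i holds x[t_i], x[t_i - 1], ..., x[t_i - p] (offset into xp)
        res[start:start + len(seg)] = xp[t[:, None] + order - lags[None, :]] @ a
    return res


def add_noise_at_snr(clean, noise, snr_db):
    """Scale `noise` so that 10*log10(P_clean / P_noise) == snr_db, then add it."""
    clean = np.asarray(clean, dtype=float)
    noise = np.resize(np.asarray(noise, dtype=float), clean.shape)  # tile or truncate
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    return clean + scale * noise


def fuse_scores(scores, weights=None):
    """Score-level fusion as a weighted sum over the last axis.

    Equal weights are an assumption; the abstract does not give the rule.
    """
    scores = np.asarray(scores, dtype=float)
    if weights is None:
        weights = np.full(scores.shape[-1], 1.0 / scores.shape[-1])
    return scores @ np.asarray(weights, dtype=float)
```

For example, `lp_residual(add_noise_at_snr(speech, factory_noise, 9.0), fs=8000)` would give the residual of a noise-corrupted test utterance, where `speech` and `factory_noise` are placeholder arrays and 8 kHz is an assumed sampling rate. In practice per-level scores are often normalized before a weighted sum, so the equal weights here are only a starting point.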

Metadata
Title: Speaker verification using excitation source information
Authors: Debadatta Pati, S. R. Mahadeva Prasanna
Publication date: 1 June 2012
Publisher: Springer US
Published in: International Journal of Speech Technology, Issue 2/2012
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI: https://doi.org/10.1007/s10772-012-9137-5
