International Journal of Speech Technology

1 June 2012

Speaker verification using excitation source information

Authors: Debadatta Pati, S. R. Mahadeva Prasanna

Published in: International Journal of Speech Technology | Issue 2/2012

Abstract

In this work, we develop a speaker recognition system based on excitation source information and demonstrate its significance by comparing it with a system based on vocal tract information. The speaker-specific excitation information is extracted by subsegmental, segmental and suprasegmental processing of the linear prediction (LP) residual. The speaker-specific information from each level is modeled independently using a Gaussian mixture model-universal background model (GMM-UBM) and then combined at the score level. The significance of the proposed system is demonstrated by conducting speaker verification experiments on the NIST-03 database. Two tests are conducted: a Clean test and a Noisy test. In the Clean test, the test speech signal is used as is for verification. In the Noisy test, the test speech is corrupted by factory noise (9 dB) and then used for verification. In the Clean test, the proposed source-based system still performs worse than the vocal tract system, but it performs better in the Noisy test. Finally, in both the clean and noisy cases, the proposed system provides different and robust speaker-specific evidence and thereby helps the vocal tract system further improve the overall performance.
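
To make the central signal concrete, the following is a minimal sketch of how an LP residual can be computed for a single frame: LP coefficients are estimated with the autocorrelation method (Levinson-Durbin recursion), and the frame is then inverse-filtered with them. The function names, the LP order of 10, and the single-frame, zero-initial-state processing are assumptions made for illustration only; the abstract does not state the authors' analysis settings, and the subsegmental, segmental and suprasegmental processing of the residual is not shown here.

```python
import numpy as np


def lp_coefficients(frame, order):
    """Autocorrelation-method LP analysis via the Levinson-Durbin recursion.

    Returns a (length order + 1, a[0] == 1) so that the inverse filter is
    A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order.
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation at lags 0..order.
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]  # prediction error energy, updated at every order
    for i in range(1, order + 1):
        if err <= 1e-12:  # silent or perfectly predictable frame
            break
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err  # reflection coefficient
        prev = a.copy()
        a[1:i] = prev[1:i] + k * prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a


def lp_residual(frame, order=10):
    """Inverse-filter one frame with its own LP coefficients (assumed order)."""
    frame = np.asarray(frame, dtype=float)
    a = lp_coefficients(frame, order)
    # e[n] = s[n] + sum_k a[k] * s[n - k]; samples before the frame are
    # treated as zero.
    return np.convolve(frame, a)[: len(frame)]


if __name__ == "__main__":
    # Synthetic stand-in for speech: white excitation through a 2-pole filter.
    rng = np.random.default_rng(0)
    excitation = rng.standard_normal(400)
    speech = np.zeros_like(excitation)
    for n in range(len(speech)):
        speech[n] = excitation[n]
        if n >= 1:
            speech[n] += 1.3 * speech[n - 1]
        if n >= 2:
            speech[n] -= 0.6 * speech[n - 2]
    residual = lp_residual(speech, order=10)
    print("signal energy:  ", float(np.sum(speech ** 2)))
    print("residual energy:", float(np.sum(residual ** 2)))
```

In this sketch the vocal-tract-like spectral envelope is captured by the LP coefficients, so the residual is what remains after that envelope is removed; this is the sense in which the residual is treated as a carrier of excitation source information.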

Title: Speaker verification using excitation source information
Authors: Debadatta Pati, S. R. Mahadeva Prasanna
Publication date: 1 June 2012
Publisher: Springer US
Published in: International Journal of Speech Technology / Issue 2/2012
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI: https://doi.org/10.1007/s10772-012-9137-5

Other articles in Issue 2/2012

Emotion recognition from speech using source, system, and prosodic features

Emotion recognition from speech: a review

Filterbank optimization for robust ASR using GA and PSO

Speaker-independent emotion recognition exploiting a psychologically-inspired binary cascade classification schema

Spectral histogram of oriented gradients (SHOGs) for Tamil language male/female speaker classification

Multilingual recognition of non-native speech using acoustic model transformation and pronunciation modeling