
International Journal of Speech Technology

1 March 2013

Improving the performance of speaker and language identification tasks using unique characteristics of a class

Authors: B. Bharathi, C. Arun Kumar, T. Nagarajan

Published in: International Journal of Speech Technology | Issue 1/2013


Abstract

In classification tasks, the error rate is proportional to the commonality among classes. In the conventional GMM-based modeling technique, the model parameters of a class are estimated without considering the other classes in the system, so features that are common across classes may be captured along with the unique features. This paper proposes using the unique characteristics of a class at the feature level and at the phoneme level, separately, to improve classification accuracy. At the feature level, the performance of a classifier is analyzed by capturing the unique features during modeling and removing common feature vectors during classification. Experiments were conducted on a speaker identification task, using speech data of 40 female speakers from the NTIMIT corpus, and on a language identification task, using speech data of two languages (English and French) from the OGI_MLTS corpus. At the phoneme level, the performance of a classifier is analyzed on a speaker identification task by identifying a subset of phonemes that are unique to a speaker with respect to his/her acoustically closest speaker. In both cases (feature level and phoneme level), a considerable improvement in classification accuracy is observed over conventional GMM-based classifiers in the above-mentioned tasks. Among the three experimental setups, the speaker identification task using unique phonemes shows a performance improvement of as much as 9.56% over the conventional GMM-based classifier.
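The idea of removing common feature vectors during classification can be illustrated with a small sketch. The abstract does not state the criterion the authors use to decide that a vector is common, so the rule below is an assumption: a frame is discarded when the log-likelihoods of the two best-scoring classes differ by less than a hypothetical `margin`. Each class is modeled by a single diagonal Gaussian as a stand-in for a GMM, and the data, class names and function names are invented for the illustration.

```python
import math
import random


def fit_diag_gaussian(frames):
    """Estimate per-dimension mean and variance (one-component stand-in for a GMM)."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    var = [max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, 1e-6)
           for d in range(dim)]
    return mean, var


def log_likelihood(frame, model):
    """Log-density of one frame under a diagonal Gaussian."""
    mean, var = model
    return sum(-0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
               for x, m, v in zip(frame, mean, var))


def classify(frames, models, margin=0.0):
    """Accumulate per-frame log-likelihoods, skipping frames that look common.

    Assumed rule (not taken from the paper): a frame is treated as common
    when the best and second-best class scores differ by less than `margin`.
    Requires at least two class models.
    """
    totals = {name: 0.0 for name in models}
    kept = 0
    for frame in frames:
        scores = {name: log_likelihood(frame, m) for name, m in models.items()}
        ranked = sorted(scores.values(), reverse=True)
        if ranked[0] - ranked[1] < margin:
            continue  # ambiguous frame: carries little class-specific evidence
        kept += 1
        for name, score in scores.items():
            totals[name] += score
    if kept == 0:
        # Every frame was discarded; fall back to using all of them.
        return classify(frames, models, margin=0.0)
    return max(totals, key=totals.get), kept


def sample(center, n, rng):
    """Draw n synthetic 2-D frames around `center` with unit variance."""
    return [[rng.gauss(c, 1.0) for c in center] for _ in range(n)]


# Synthetic two-class demo: the classes differ only in the first dimension.
rng = random.Random(0)
models = {
    "A": fit_diag_gaussian(sample([0.0, 0.0], 500, rng)),
    "B": fit_diag_gaussian(sample([3.0, 0.0], 500, rng)),
}
test_frames = sample([0.0, 0.0], 200, rng)  # drawn from class A
label, kept = classify(test_frames, models, margin=1.0)
print(label, kept, len(test_frames))
```

Setting `margin=0.0` keeps every frame and reduces the sketch to the conventional maximum-likelihood decision, which makes it easy to compare the two behaviours on the same data.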


In the present study, N_k is not normalized, as this does not affect its use in (14).

Since the number of examples for each of the phonemes used in this work is small, the product of likelihood-Gaussians used in the feature-level approach cannot be used.


Title: Improving the performance of speaker and language identification tasks using unique characteristics of a class
Authors: B. Bharathi, C. Arun Kumar, T. Nagarajan
Publication date: 1 March 2013
Publisher: Springer US
Published in: International Journal of Speech Technology / Issue 1/2013
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI: https://doi.org/10.1007/s10772-012-9167-z
