Published in: International Journal of Speech Technology 1/2013

1 March 2013

Improving the performance of speaker and language identification tasks using unique characteristics of a class

Authors: B. Bharathi, C. Arun Kumar, T. Nagarajan



Abstract

In classification tasks, the error rate is proportional to the commonality among classes. In the conventional GMM-based modeling technique, the model parameters of a class are estimated without considering the other classes in the system, so features that are common across classes may be captured along with the unique features. This paper proposes to use the unique characteristics of a class, at the feature level and at the phoneme level separately, to improve classification accuracy. At the feature level, the performance of a classifier is analyzed by capturing the unique features during modeling and removing common feature vectors during classification. Experiments were conducted on a speaker identification task, using speech data of 40 female speakers from the NTIMIT corpus, and on a language identification task, using speech data of two languages (English and French) from the OGI_MLTS corpus. At the phoneme level, the performance of a classifier is analyzed on a speaker identification task by identifying a subset of phonemes that are unique to a speaker with respect to his or her most closely resembling speaker, in the acoustic sense. In both cases (feature level and phoneme level), a considerable improvement in classification accuracy over conventional GMM-based classifiers is observed in the above-mentioned tasks. Among the three experimental setups, the speaker identification task using unique phonemes shows a performance improvement of as much as 9.56 % over the conventional GMM-based classifier.
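
The idea of removing common feature vectors during classification can be illustrated with a minimal sketch. The abstract does not state how common vectors are detected, so the rule used here is purely an assumption: a frame is treated as common when the log-likelihoods of the two best-scoring class models differ by less than a threshold. The names `diag_gmm_loglik`, `classify`, and `margin` are hypothetical and do not come from the paper; the models are ordinary diagonal-covariance GMMs given as (weights, means, variances) tuples.

```python
import numpy as np


def diag_gmm_loglik(X, weights, means, variances):
    """Per-frame log-likelihood of X (T, D) under a diagonal-covariance GMM.

    weights: (M,), means: (M, D), variances: (M, D).
    """
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - means[None, :, :]                        # (T, M, D)
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)  # (M,)
    log_comp = log_norm - 0.5 * np.sum(diff ** 2 / variances, axis=2)  # (T, M)
    log_weighted = np.log(weights) + log_comp
    # Numerically stable log-sum-exp over the mixture components.
    peak = log_weighted.max(axis=1, keepdims=True)
    return (peak + np.log(np.exp(log_weighted - peak).sum(axis=1, keepdims=True)))[:, 0]


def classify(X, models, margin=0.0):
    """Pick the class with the highest summed frame log-likelihood.

    models: list of at least two (weights, means, variances) tuples.
    margin: ASSUMED commonality rule, not taken from the paper. A frame is
            dropped as "common" when the gap between its best and
            second-best class log-likelihood is below `margin`.
            margin=0.0 keeps every frame (conventional GMM scoring).
    Returns (winning class index, number of frames kept).
    """
    frame_ll = np.stack([diag_gmm_loglik(X, *m) for m in models], axis=1)  # (T, C)
    top2 = np.sort(frame_ll, axis=1)[:, -2:]
    keep = (top2[:, 1] - top2[:, 0]) >= margin
    if not keep.any():
        # Fall back to all frames rather than deciding on no evidence.
        keep[:] = True
    scores = frame_ll[keep].sum(axis=0)
    return int(np.argmax(scores)), int(keep.sum())
```

Calling `classify(X, models)` reproduces conventional scoring, while a positive `margin` restricts the decision to frames on which the class models clearly disagree. How the threshold should be chosen, and whether this rule resembles the paper's criterion, is left open here.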


Footnotes
1. In the present study, N_k is not normalized, as this does not affect its use in (14).
2. Since the number of examples available for each of the phonemes used in this work is small, the product of likelihood-Gaussians used in the feature-level approach cannot be used.

References
Arslan, L. M., & Hansen, J. H. L. (1999). Selective training for hidden Markov models with applications to speech classification. IEEE Transactions on Speech and Audio Processing, 7(1), 46–54.
Arun Kumar, C., Bharathi, B., & Nagarajan, T. (2009). A discriminative GMM technique using product of likelihood Gaussians. In IEEE TENCON (pp. 1–6).
Bharathi, B., Vijayalakshmi, P., & Nagarajan, T. (2011). Speaker identification using utterances correspond to speaker-specific-text. In IEEE students technology symposium (Techsym) (pp. 171–174).
Garofolo, J. S., Lamel, L. F., Fisher, W. M., Fiscus, J. G., Pallett, D. S., Dahlgren, N. L., & Zue, V. (1993). TIMIT acoustic-phonetic continuous speech corpus. In Linguistic data consortium, Philadelphia, USA.
Jankowski, C., et al. (1990). NTIMIT: a phonetically balanced, continuous speech, telephone bandwidth speech database. In Proc. of ICASSP (pp. 109–112).
Liu, C.-S., Lee, C.-H., Juang, B.-H., & Rosenberg, A. E. (1994). Speaker recognition based on minimum error discriminative training. In Proceedings of IEEE international conference on acoustics, speech, and signal processing (Vol. 1, pp. 325–328).
Nagarajan, T., & O’Shaughnessy, D. (2006). Discriminative MLE training using a product of Gaussian likelihoods. In INTERSPEECH 2006, Pittsburgh, Pennsylvania, USA (pp. 601–604).
Nagarajan, T., & O’Shaughnessy, D. (2007). Bias estimation and correction in a classifier using product of likelihood-gaussians. In ICASSP, Hawaii, USA (pp. 1061–1064).
Reynolds, D. A., & Rose, R. (1995). Robust text-independent speaker identification using Gaussian mixture speaker models. IEEE Transactions on Speech and Audio Processing, 3, 72–83.
Zissman, M. A. (1996). Comparison of four approaches to automatic language identification of telephone speech. IEEE Transactions on Speech and Audio Processing, 4(1), 31–44.
Metadata
Title
Improving the performance of speaker and language identification tasks using unique characteristics of a class
Authors
B. Bharathi
C. Arun Kumar
T. Nagarajan
Publication date
1 March 2013
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 1/2013
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-012-9167-z
