
2016 | OriginalPaper | Chapter

Prosodic Features Based Text-dependent Speaker Recognition with Short Utterance

Authors: Jianwu Zhang, Jianchao He, Zhendong Wu, Ping Li

Published in: Computational Intelligence and Intelligent Systems

Publisher: Springer Singapore


Abstract

Over the past several years, Gaussian mixture models (GMMs) have been the dominant modeling approach in text-independent speaker recognition. However, the recognition accuracy of these models declines as utterances become shorter. Mel-frequency cepstral coefficients (MFCCs) are generally used to characterize the properties of the vocal tract and are widely applied in speech recognition, while prosodic features, such as pitch and formants, are generally considered to describe glottal characteristics. Even so, the effectiveness of these approaches remains unsatisfactory. In text-dependent speaker verification systems with short utterances, prosodic features can in theory help improve recognition results. To optimize the performance of speaker verification systems under the adapted GMM-UBM framework, we adopt a variant speaker verification system based on prosodic features, in which a dual-judgment mechanism integrates vocal tract features with prosodic features. Experimental results show that the new system achieves better recognition performance.
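
The abstract does not spell out how the dual-judgment mechanism combines the two feature streams. As a rough illustration only, the sketch below scores an utterance with a GMM-UBM log-likelihood ratio and then applies a hypothetical AND rule: the claim is accepted only if both the vocal tract score and the prosodic score clear their thresholds. The function names, the thresholds, the AND rule itself, and the toy one-dimensional models are all assumptions made for this example and are not taken from the chapter.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_avg_loglik(frames, gmm):
    """Average per-frame log-likelihood; gmm is a list of (weight, mean, var)."""
    total = 0.0
    for x in frames:
        comps = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
        peak = max(comps)  # log-sum-exp for numerical stability
        total += peak + math.log(sum(math.exp(c - peak) for c in comps))
    return total / len(frames)


def llr_score(frames, speaker_gmm, ubm):
    """GMM-UBM verification score: speaker model versus background model."""
    return gmm_avg_loglik(frames, speaker_gmm) - gmm_avg_loglik(frames, ubm)


def dual_judgment(vocal_score, prosodic_score, vocal_thr=0.0, prosodic_thr=0.0):
    """Hypothetical fusion rule (an assumption): both judgments must accept."""
    return vocal_score > vocal_thr and prosodic_score > prosodic_thr


# Toy 1-D models: the speaker model is the UBM with one mean shifted,
# loosely imitating mean-only adaptation.
ubm = [(0.5, [0.0], [1.0]), (0.5, [4.0], [1.0])]
speaker = [(0.5, [1.0], [1.0]), (0.5, [4.0], [1.0])]

target_frames = [[0.9], [1.1], [1.2]]
impostor_frames = [[-0.5], [0.0], [-0.2]]

target_llr = llr_score(target_frames, speaker, ubm)
impostor_llr = llr_score(impostor_frames, speaker, ubm)
```

In a real system the same `llr_score` routine would be run twice, once on vocal tract features such as MFCCs and once on prosodic features such as pitch and formants, and the two scores passed to `dual_judgment`. A weighted sum of the two scores would be another plausible fusion rule; which one the authors use is not stated in the abstract.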


DOI: https://doi.org/10.1007/978-981-10-0356-1_57
