2018 | Original Paper | Book Chapter

Modelling Speaker Variability Using Covariance Learning

Authors: Moses Ekpenyong, Imeh Umoren

Published in: Artificial Intelligence and Soft Computing

Publisher: Springer International Publishing

Abstract

In this contribution, we investigate the relationship between speakers and their speech utterances, and propose a speaker normalisation/adaptation model that incorporates the correlation amongst utterance classes produced by male and female speakers of varying age categories (children: 0–15; youths: 16–30; adults: 31–50; seniors: \({>}50\)). Using Principal Component Analysis (PCA), a speaker space was constructed from the speaker covariance matrix, which was obtained directly from the speech signals, and the first three principal components (PCs) were visualised. For effective covariance learning, each weight vector of the covariance matrix was normalised component-wise, and a machine learning algorithm, the self-organising map (SOM), was implemented to model the variability of selected speaker features (F0, intensity, pulse). The results reveal that, of the features selected, F0 accounted for the most variance, as both genders exhibited high variability. For male speakers, PC1 captured the most variance (87%), while PC2 and PC3 captured the least (7% and 3%, respectively). For female speakers, PC1 captured the most variance (97%), while PC2 and PC3 captured the least (2% and 1%, respectively). Further, the intensity and pulse features showed closely similar patterns and are not the most relevant for speaker variability modelling. Component-plane visualisation of the speech patterns learned from the feature covariance revealed consistent patterns, which are hence useful in speaker recognition systems.
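The variance percentages quoted for PC1 to PC3 are explained-variance ratios, that is, each eigenvalue of the feature covariance matrix divided by the sum of all eigenvalues. The sketch below illustrates that calculation together with one possible reading of "component-wise normalisation" (scaling each column to zero mean and unit variance). It is only an illustration under stated assumptions: the data are synthetic, the function names are hypothetical, and nothing here reproduces the authors' implementation or their results.

```python
import numpy as np


def explained_variance_ratios(features):
    """Fraction of total variance captured by each principal component.

    `features` is an (n_samples, n_features) array. The ratios are the
    eigenvalues of the sample covariance matrix, sorted in descending
    order and divided by their sum.
    """
    cov = np.cov(features, rowvar=False)          # (n_features, n_features)
    eigvals = np.linalg.eigvalsh(cov)[::-1]       # eigvalsh returns ascending order
    eigvals = np.clip(eigvals, 0.0, None)         # guard against tiny negative values
    return eigvals / eigvals.sum()


def normalise_components(matrix):
    """Scale every column to zero mean and unit variance.

    ASSUMPTION: this is one common form of component-wise normalisation
    applied before SOM training; the abstract does not specify which
    scheme the authors used.
    """
    mean = matrix.mean(axis=0)
    std = matrix.std(axis=0)
    std = np.where(std == 0, 1.0, std)            # avoid division by zero
    return (matrix - mean) / std


# Synthetic stand-in for (F0, intensity, pulse) measurements.
# The scales below are made up purely so that one feature dominates.
rng = np.random.default_rng(0)
synthetic = rng.normal(size=(200, 3)) * np.array([40.0, 5.0, 2.0])

ratios = explained_variance_ratios(synthetic)
normalised = normalise_components(synthetic)

print("explained variance ratios:", np.round(ratios, 3))
print("column std after normalisation:", np.round(normalised.std(axis=0), 3))
```

When one feature has a much larger spread than the others, as in the synthetic data here, the first component absorbs nearly all the variance. This is why normalising each component before training a SOM is usually worthwhile, since otherwise the largest-scale feature dominates the distance computations.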

Metadata
Title
Modelling Speaker Variability Using Covariance Learning
Authors
Moses Ekpenyong
Imeh Umoren
Copyright year
2018
DOI
https://doi.org/10.1007/978-3-319-91253-0_4