2018 | Original Paper | Book Chapter

Modelling Speaker Variability Using Covariance Learning

Authors: Moses Ekpenyong, Imeh Umoren

Published in: Artificial Intelligence and Soft Computing

Publisher: Springer International Publishing

Abstract

In this contribution, we investigate the relationship between speakers and speech utterances, and propose a speaker normalisation/adaptation model that incorporates the correlation amongst utterance classes produced by male and female speakers in four age categories (children: 0–15; youths: 16–30; adults: 31–50; seniors: >50). Using Principal Component Analysis (PCA), a speaker space was constructed and, based on the speaker covariance matrix obtained directly from the speech signals, the first three principal components (PCs) were visualised. For effective covariance learning, each weight vector of the covariance matrix was normalised component-wise, and a machine learning algorithm, the self-organising map (SOM), was implemented to model the variability of selected speaker features (F0, intensity, pulse). The results reveal that, of the selected features, F0 showed the most variance, with both genders exhibiting high variability. For male speakers, PC1 captured the most variance (87%), while PC2 and PC3 captured only 7% and 3%, respectively. For female speakers, PC1 captured 97%, while PC2 and PC3 captured only 2% and 1%, respectively. Further, the intensity and pulse features show closely similar patterns and are therefore less relevant for modelling speaker variability. Component-plane visualisation of the speech patterns learned from the feature covariance revealed consistent patterns, which are hence useful in speaker recognition systems.
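The pipeline outlined in the abstract (covariance matrix, PCA variance shares, component-wise normalisation, SOM training) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The synthetic data, the matrix layout (rows as speakers, columns as hypothetical utterance-class measurements of a single feature such as F0), the z-score reading of "component-wise normalisation", the 4x4 map size and all function names are assumptions made purely for illustration.

```python
import numpy as np


def pca_from_covariance(data):
    """Eigen-decompose the covariance of `data` (n_samples, n_variables).

    Returns the covariance matrix and each principal component's share
    of the total variance, sorted from largest to smallest.
    """
    cov = np.cov(data, rowvar=False)        # np.cov centres the columns itself
    eigvals, _ = np.linalg.eigh(cov)        # eigh returns ascending eigenvalues
    eigvals = eigvals[::-1]                 # reorder to descending variance
    return cov, eigvals / eigvals.sum()


def normalise_columns(matrix):
    """Z-score each column (an assumed reading of component-wise normalisation)."""
    std = matrix.std(axis=0)
    std[std == 0] = 1.0                     # guard against constant columns
    return (matrix - matrix.mean(axis=0)) / std


def train_som(vectors, rows=4, cols=4, epochs=50, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small rectangular SOM with a Gaussian neighbourhood.

    Learning rate and neighbourhood width decay linearly over training.
    Returns weights of shape (rows, cols, n_dims).
    """
    rng = np.random.default_rng(seed)
    weights = rng.normal(size=(rows, cols, vectors.shape[1]))
    grid = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )
    n_steps = epochs * len(vectors)
    step = 0
    for _ in range(epochs):
        for x in vectors[rng.permutation(len(vectors))]:
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)
            sigma = sigma0 * (1.0 - frac) + 1e-3
            # Best-matching unit: the node whose weight vector is closest to x.
            dists = np.linalg.norm(weights - x, axis=-1)
            bmu = np.unravel_index(np.argmin(dists), dists.shape)
            # Pull the BMU and its grid neighbours towards x.
            grid_d2 = ((grid - np.array(bmu)) ** 2).sum(axis=-1)
            h = np.exp(-grid_d2 / (2.0 * sigma ** 2))
            weights += lr * h[..., None] * (x - weights)
            step += 1
    return weights


# Synthetic stand-in data: 60 hypothetical speakers, 8 hypothetical utterance
# classes, driven mostly by one latent speaker factor so that PC1 dominates.
rng = np.random.default_rng(42)
latent = rng.normal(size=(60, 1))
loadings = rng.uniform(0.5, 1.5, size=(1, 8))
data = 120.0 + 30.0 * latent @ loadings + rng.normal(scale=3.0, size=(60, 8))

cov, ratios = pca_from_covariance(data)
som_weights = train_som(normalise_columns(cov))

print("variance share of first three PCs:", np.round(ratios[:3], 3))
# One component plane per input dimension: som_weights[:, :, k]
print("component plane for dimension 0:\n", np.round(som_weights[:, :, 0], 2))
```

Each slice `som_weights[:, :, k]` is the component plane for input dimension `k`. Plotting these slices side by side as heat maps is one common way to compare learned patterns, in the spirit of the component-plane visualisation mentioned in the abstract.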

Title: Modelling Speaker Variability Using Covariance Learning
Authors: Moses Ekpenyong, Imeh Umoren
Publisher: Springer International Publishing
Book: Artificial Intelligence and Soft Computing
Print ISBN: 978-3-319-91252-3
Electronic ISBN: 978-3-319-91253-0
Copyright Year: 2018
DOI: https://doi.org/10.1007/978-3-319-91253-0_4
