2016 | OriginalPaper | Chapter

Objective Comparison of Four GMM-Based Methods for PMA-to-Speech Conversion

Authors: Daniel Erro, Inma Hernaez, Luis Serrano, Ibon Saratxaga, Eva Navas

Published in: Advances in Speech and Language Technologies for Iberian Languages

Publisher: Springer International Publishing

Abstract

In silent speech interfaces, a mapping is established between biosignals captured by sensors and the acoustic characteristics of speech. Recent work has shown the feasibility of a silent interface based on permanent magnet articulography (PMA). This paper studies the performance of four mapping methods based on Gaussian mixture models (GMMs), all common in the voice conversion field, when applied to PMA-to-spectrum conversion. The results show the superiority of methods based on maximum likelihood parameter generation (MLPG), especially when the parameters of the mapping function are trained by minimizing the generation error. Informal listening tests reveal that the resulting speech is moderately intelligible for the database under study.
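
The abstract does not spell out the four methods, so the following is only an illustrative sketch of one classical GMM-based mapping from the voice conversion literature: frame-wise regression under a joint-density GMM, where the output is the posterior-weighted sum of per-component conditional means. It is not the authors' implementation. Everything here is an assumption made for illustration: scalar input and output stand in for the PMA and spectral feature vectors, and the function names, dictionary keys, and the toy parameters in `TOY_GMM` are hypothetical. MLPG and generation-error training are not shown.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gmm_map(x, components):
    """Frame-wise MMSE estimate of y given x under a joint GMM over (x, y).

    Each component is a dict with hypothetical keys:
    weight, mean_x, mean_y, var_x, cov_xy.
    """
    # Unnormalized posterior of each component given the source frame x.
    likelihoods = [
        c["weight"] * gaussian_pdf(x, c["mean_x"], c["var_x"]) for c in components
    ]
    total = sum(likelihoods)

    y_hat = 0.0
    for lik, c in zip(likelihoods, components):
        posterior = lik / total
        # Conditional mean of y given x within this component.
        conditional_mean = c["mean_y"] + c["cov_xy"] / c["var_x"] * (x - c["mean_x"])
        y_hat += posterior * conditional_mean
    return y_hat


# Toy parameters chosen by hand for illustration; not taken from the paper.
TOY_GMM = [
    {"weight": 0.5, "mean_x": -1.0, "mean_y": 2.0, "var_x": 1.0, "cov_xy": 0.5},
    {"weight": 0.5, "mean_x": 1.0, "mean_y": -2.0, "var_x": 1.0, "cov_xy": 0.5},
]

if __name__ == "__main__":
    for frame in (-2.0, 0.0, 2.0):
        print(frame, gmm_map(frame, TOY_GMM))
```

Because each frame is converted independently, a mapping of this kind ignores temporal dynamics; trajectory-level generation such as MLPG is the usual way to address that.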



Print ISBN: 978-3-319-49168-4

Electronic ISBN: 978-3-319-49169-1

Copyright Year: 2016
DOI: https://doi.org/10.1007/978-3-319-49169-1_3
