2016 | OriginalPaper | Chapter

Objective Comparison of Four GMM-Based Methods for PMA-to-Speech Conversion

Authors: Daniel Erro, Inma Hernaez, Luis Serrano, Ibon Saratxaga, Eva Navas

Published in: Advances in Speech and Language Technologies for Iberian Languages

Publisher: Springer International Publishing

Abstract

In silent speech interfaces, a mapping is established between biosignals captured by sensors and the acoustic characteristics of speech. Recent work has shown the feasibility of a silent speech interface based on permanent magnet articulography (PMA). This paper studies the performance of four mapping methods based on Gaussian mixture models (GMMs), all common in the voice conversion field, when applied to PMA-to-spectrum conversion. The results show the superiority of methods based on maximum likelihood parameter generation (MLPG), especially when the parameters of the mapping function are trained by minimizing the generation error. Informal listening tests reveal that the resulting speech is moderately intelligible for the database under study.
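
To make the idea of a GMM-based mapping concrete, here is a minimal sketch of frame-wise joint-density GMM regression, the classic voice conversion formulation in which each output frame is the conditional expectation of the target given the source. It is not taken from the chapter and is not claimed to be any of the four methods it compares. The scalar features, the function names `gaussian_pdf` and `gmm_convert`, and the hand-picked parameters in `COMPONENTS` are all assumptions for illustration; a real system would use multidimensional PMA and spectral vectors and parameters estimated from parallel data.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gmm_convert(x, components):
    """Return E[y | x] under a joint GMM over scalar (x, y) pairs.

    Each component is a dict with keys (hypothetical names):
      weight  - mixture weight
      mean_x  - mean of the source feature
      mean_y  - mean of the target feature
      var_x   - variance of the source feature
      cov_yx  - covariance between target and source
    """
    # Weighted likelihood of the source frame under each component.
    likelihoods = [
        c["weight"] * gaussian_pdf(x, c["mean_x"], c["var_x"]) for c in components
    ]
    total = sum(likelihoods)

    y_hat = 0.0
    for c, lik in zip(components, likelihoods):
        posterior = lik / total  # p(component | x)
        # Conditional mean of y given x within this component.
        cond_mean = c["mean_y"] + c["cov_yx"] / c["var_x"] * (x - c["mean_x"])
        y_hat += posterior * cond_mean
    return y_hat


# Hand-picked toy parameters (assumption, not trained on any data).
COMPONENTS = [
    {"weight": 0.5, "mean_x": -1.0, "mean_y": 2.0, "var_x": 1.0, "cov_yx": 0.5},
    {"weight": 0.5, "mean_x": 1.0, "mean_y": -2.0, "var_x": 1.0, "cov_yx": 0.5},
]

if __name__ == "__main__":
    for source_value in (-2.0, 0.0, 2.0):
        print(source_value, gmm_convert(source_value, COMPONENTS))
```

This sketch converts every frame independently. MLPG-based methods, as the abstract notes, go further by generating a whole parameter trajectory; that step is not shown here.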

Metadata
Title
Objective Comparison of Four GMM-Based Methods for PMA-to-Speech Conversion
Authors
Daniel Erro
Inma Hernaez
Luis Serrano
Ibon Saratxaga
Eva Navas
Copyright Year
2016
DOI
https://doi.org/10.1007/978-3-319-49169-1_3
