
2017 | OriginalPaper | Chapter

Quality Improvement of Vietnamese HMM-Based Speech Synthesis System Based on Decomposition of Naturalness and Intelligibility Using Non-negative Matrix Factorization

Authors: Anh-Tuan Dinh, Thanh-Son Phan, Masato Akagi

Published in: Advances in Information and Communication Technology

Publisher: Springer International Publishing


Abstract

Hidden Markov model (HMM)-based synthesized speech is intelligible but not natural, especially under limited-data conditions. The goal of this study is to improve naturalness while preserving acceptable intelligibility by decomposing the naturalness and intelligibility of synthesized speech using a novel asymmetric bilinear model involving non-negative matrix factorization (NMF). Subjective evaluations carried out on Vietnamese data confirmed that the achieved synthesis quality is higher than that of other methods under limited-data conditions. The F0 contour is important for naturalness and intelligibility, especially in Vietnamese, and the proposed method is capable of modifying an over-smoothed F0 contour without destroying tonal information.
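
The abstract names NMF as a building block but does not give the details of the asymmetric bilinear model, so the following is only a minimal sketch of generic NMF, not the authors' method. It assumes the standard multiplicative-update rule for the Frobenius-norm objective; the function name `nmf`, the toy matrix `V`, and the chosen rank are hypothetical and stand in for whatever non-negative speech features the chapter actually uses.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Approximate a non-negative matrix V (m x n) as W @ H,
    with W (m x rank) and H (rank x n) both non-negative."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    # Random positive initialisation (an assumption; other schemes exist).
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        # Multiplicative updates: each factor is rescaled element-wise,
        # so entries that start non-negative stay non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy non-negative data of exact rank 3 (hypothetical stand-in for features).
rng = np.random.default_rng(1)
V = rng.random((6, 3)) @ rng.random((3, 8))

W, H = nmf(V, rank=3)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {rel_err:.4f}")
```

In a decomposition of this kind, one factor can be read as a set of shared basis patterns and the other as their activations. How the chapter assigns naturalness and intelligibility to the factors is not stated in the abstract.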

Metadata
Title: Quality Improvement of Vietnamese HMM-Based Speech Synthesis System Based on Decomposition of Naturalness and Intelligibility Using Non-negative Matrix Factorization
Authors: Anh-Tuan Dinh, Thanh-Son Phan, Masato Akagi
Copyright Year: 2017
DOI: https://doi.org/10.1007/978-3-319-49073-1_53
