2015 | Original Paper | Book Chapter

Hybrid Source Modeling Method Utilizing Optimal Residual Frames for HMM-based Speech Synthesis

Authors: N. P. Narendra, K. Sreenivasa Rao

Published in: Mining Intelligence and Knowledge Exploration

Publisher: Springer International Publishing

Abstract

This paper proposes a new hybrid source modeling method for improving the quality of HMM-based speech synthesis. The proposed method is an extension of a recently proposed source model based on the optimal residual frame [1]. The source or excitation signal is first decomposed into a number of pitch-synchronous residual frames. Unique variations are observed in the pitch-synchronous residual frames present at the beginning, middle and end regions of the excitation signal of a phone. Based on this observation, one optimal residual frame is extracted from each of the beginning, middle and end regions of the excitation signal of a phone. The optimal residual frames extracted from each region of the excitation signal are separately grouped in the form of a decision tree. During synthesis, for every phone, three optimal residual frames are selected from the three decision trees based on target and concatenation costs. Using the three optimal residual frames, the excitation signal of a phone is constructed. The proposed hybrid source model is used for synthesizing speech under the HTS framework. Subjective evaluation results indicate that the proposed source model is better than the two existing source modeling methods.
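
The selection step described above can be illustrated with a minimal sketch. The abstract does not define the target or concatenation costs, so this example assumes both are plain Euclidean distances: the target cost compares a candidate's feature vector with a target feature vector for its region, and the concatenation cost compares the waveforms of adjacent selected frames (assumed to be normalized to a common length). The function name `select_frames`, the dictionary keys `features` and `frame`, and the weight `w_concat` are hypothetical names introduced for illustration. The candidate lists stand in for the frames returned by the three decision trees, and an exhaustive search stands in for whatever search procedure the authors actually use.

```python
import math
from itertools import product


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences of floats."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_frames(candidates, targets, w_concat=1.0):
    """Pick one candidate per region (beginning, middle, end) for one phone.

    candidates: list of three lists; each item is a dict with
        'features' (list of floats) and 'frame' (list of samples).
    targets: list of three target feature vectors, one per region.
    w_concat: assumed weight on the concatenation cost.

    Returns (indices, total_cost), where indices holds the chosen
    candidate index for each region.
    """
    best_cost, best_choice = math.inf, None
    # Exhaustive search over every combination of one candidate per region.
    for choice in product(*(range(len(c)) for c in candidates)):
        picked = [candidates[r][i] for r, i in enumerate(choice)]
        # Target cost: how well each picked frame matches its region's target.
        target_cost = sum(
            euclidean(p["features"], t) for p, t in zip(picked, targets)
        )
        # Concatenation cost: mismatch between adjacent picked frames.
        concat_cost = sum(
            euclidean(picked[r]["frame"], picked[r + 1]["frame"])
            for r in range(len(picked) - 1)
        )
        cost = target_cost + w_concat * concat_cost
        if cost < best_cost:
            best_cost, best_choice = cost, choice
    return best_choice, best_cost
```

Calling `select_frames(candidates, targets)` with three candidate lists returns the index of the chosen frame in each region together with the combined cost. Raising `w_concat` favors frames that join smoothly with their neighbors, while setting it to zero reduces the selection to an independent best-match per region.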

References
1. Narendra, N.P., Rao, K.S.: Optimal residual frame based source modeling for HMM-based speech synthesis. In: Proceedings of the International Conference on Advances in Pattern Recognition (ICAPR), pp. 1–5 (2015)
2. Yoshimura, T., Tokuda, K., Masuko, T., Kobayashi, T., Kitamura, T.: Mixed-excitation for HMM-based speech synthesis. In: Proceedings of the Eurospeech, pp. 2259–2262 (2001)
3. Zen, H., Toda, T., Nakamura, M., Tokuda, K.: Details of Nitech HMM-based speech synthesis system for the Blizzard Challenge 2005. In: IEICE Transactions on Information and Systems, vol. E90-D, pp. 325–333 (2007)
4. Kawahara, H., Masuda-Katsuse, I., de Cheveigne, A.: Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: possible role of a repetitive structure in sounds. Speech Commun. 27, 187–207 (1999)
5. Maia, R., Toda, T., Zen, H., Nankaku, Y., Tokuda, K.: An excitation model for HMM-based speech synthesis based on residual modeling. In: Proceedings of the Speech Synthesis Workshop 6 (ISCA SW6) (2007)
6. Raitio, T., Suni, A., Yamagishi, J., Pulakka, H., Nurminen, J., Vainio, M., Alku, P.: HMM-based speech synthesis utilizing glottal inverse filtering. IEEE Trans. Audio, Speech, Lang. Process. 19(1), 153–165 (2011)
7. Drugman, T., Moinet, A., Dutoit, T., Wilfart, G.: Using a pitch-synchronous residual codebook for hybrid HMM/frame selection speech synthesis. In: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3793–3796 (2009)
8. Raitio, T., Suni, A., Pulakka, H., Vainio, M., Alku, P.: Utilizing glottal source pulse library for generating improved excitation signal for HMM-based speech synthesis. In: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4564–4567 (2011)
9. Cabral, J.P.: Uniform concatenative excitation model for synthesising speech without voiced/unvoiced classification. In: Proceedings of the Interspeech, pp. 1082–1086 (2013)
10. Murty, K.S.R., Yegnanarayana, B.: Epoch extraction from speech signals. IEEE Trans. Audio, Speech, Lang. Process. 16(8), 1602–1613 (2008)
11. Yumoto, E., Gould, W., Baer, T.: Harmonics-to-noise ratio as an index of the degree of hoarseness. J. Acoust. Soc. Am. 71(6), 1544–1550 (1982)
12. Narendra, N.P., Rao, K.S.: Time-domain deterministic plus noise model based hybrid source modeling for HMM-based speech synthesis. In: Speech Communication, 2015 (under review)
14. Narendra, N.P., Rao, K.S.: Robust voicing detection and F0 estimation for HMM-based speech synthesis. Circ. Syst. Sig. Process. 34(8), 2597–2619 (2015)

Metadata
Title
Hybrid Source Modeling Method Utilizing Optimal Residual Frames for HMM-based Speech Synthesis
Authors
N. P. Narendra
K. Sreenivasa Rao
Copyright Year
2015
DOI
https://doi.org/10.1007/978-3-319-26832-3_27