
2015 | Original Paper | Book Chapter

Hybrid Source Modeling Method Utilizing Optimal Residual Frames for HMM-based Speech Synthesis

Authors: N. P. Narendra, K. Sreenivasa Rao

Published in: Mining Intelligence and Knowledge Exploration

Publisher: Springer International Publishing

Abstract

This paper proposes a new hybrid source modeling method for improving the quality of HMM-based speech synthesis. The proposed method extends a recently proposed source model based on the optimal residual frame [1]. The source (excitation) signal is first decomposed into a number of pitch-synchronous residual frames. The pitch-synchronous residual frames in the beginning, middle and end regions of the excitation signal of a phone show distinct variations. Based on this observation, one optimal residual frame is extracted from each of the beginning, middle and end regions of the excitation signal of a phone. The optimal residual frames extracted from each region are grouped separately in the form of a decision tree, giving one tree per region. During synthesis, three optimal residual frames are selected for every phone from the three decision trees, based on target and concatenation costs. The excitation signal of the phone is then constructed from these three optimal residual frames. The proposed hybrid source model is used for synthesizing speech under the HTS framework. Subjective evaluation results indicate that the proposed source model is better than the two existing source modeling methods.
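The selection step described in the abstract (one residual frame per region, chosen by target and concatenation costs) can be illustrated with a minimal sketch. The abstract does not define the costs, the features or the search procedure, so everything below is an assumption made for illustration: the function name `select_frames`, the use of Euclidean distance for both costs, the weight `w_concat`, and the exhaustive search over candidates are hypothetical and are not taken from the paper.

```python
from itertools import product
from math import dist


def select_frames(candidates, targets, w_concat=1.0):
    """Pick one candidate frame per region (begin, middle, end).

    candidates: three lists of feature vectors, one list per region
                (standing in for the frames found in a decision-tree leaf).
    targets:    three target feature vectors, one per region.
    Returns the chosen candidate indices and the total cost.

    Assumption: both costs are plain Euclidean distances; the paper's
    actual cost definitions are not given in the abstract.
    """
    best_cost, best_choice = float("inf"), None
    # Exhaustive search is affordable here: only three regions per phone.
    for choice in product(*(range(len(c)) for c in candidates)):
        frames = [candidates[r][i] for r, i in enumerate(choice)]
        # Target cost: how far each chosen frame is from its target.
        target_cost = sum(dist(f, t) for f, t in zip(frames, targets))
        # Concatenation cost: mismatch between neighbouring chosen frames.
        concat_cost = sum(dist(a, b) for a, b in zip(frames, frames[1:]))
        cost = target_cost + w_concat * concat_cost
        if cost < best_cost:
            best_cost, best_choice = cost, choice
    return best_choice, best_cost


# Toy data: two candidates per region, two-dimensional features.
candidates = [
    [(0.0, 0.0), (1.0, 1.0)],  # beginning region
    [(0.0, 1.0), (2.0, 2.0)],  # middle region
    [(3.0, 3.0), (0.0, 2.0)],  # end region
]
targets = [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0)]
choice, cost = select_frames(candidates, targets)
print(choice, cost)
```

In this toy example the candidates that match their targets exactly are selected, and the remaining cost comes only from the concatenation term. A system that also chains concatenation costs across consecutive phones would more likely use a dynamic-programming (Viterbi-style) search instead of enumerating all combinations.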



References

1. Narendra, N.P., Rao, K.S.: Optimal residual frame based source modeling for HMM-based speech synthesis. In: Proceedings of the International Conference on Advances in Pattern Recognition (ICAPR), pp. 1–5 (2015)

2. Yoshimura, T., Tokuda, K., Masuko, T., Kobayashi, T., Kitamura, T.: Mixed-excitation for HMM-based speech synthesis. In: Proceedings of the Eurospeech, pp. 2259–2262 (2001)

3. Zen, H., Toda, T., Nakamura, M., Tokuda, K.: Details of Nitech HMM-based speech synthesis system for the Blizzard Challenge 2005. In: IEICE Transactions on Information and Systems, vol. E90-D, pp. 325–333 (2007)

4. Kawahara, H., Masuda-Katsuse, I., de Cheveigne, A.: Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: possible role of a repetitive structure in sounds. Speech Commun. 27, 187–207 (1999)

5. Maia, R., Toda, T., Zen, H., Nankaku, Y., Tokuda, K.: An excitation model for HMM-based speech synthesis based on residual modeling. In: Proceedings of the Speech Synthesis Workshop 6 (ISCA SW6) (2007)

6. Raitio, T., Suni, A., Yamagishi, J., Pulakka, H., Nurminen, J., Vainio, M., Alku, P.: HMM-based speech synthesis utilizing glottal inverse filtering. IEEE Trans. Audio, Speech, Lang. Process. 19(1), 153–165 (2011)

7. Drugman, T., Moinet, A., Dutoit, T., Wilfart, G.: Using a pitch-synchronous residual codebook for hybrid HMM/frame selection speech synthesis. In: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3793–3796 (2009)

8. Raitio, T., Suni, A., Pulakka, H., Vainio, M., Alku, P.: Utilizing glottal source pulse library for generating improved excitation signal for HMM-based speech synthesis. In: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4564–4567 (2011)

9. Cabral, J.P.: Uniform concatenative excitation model for synthesising speech without voiced/unvoiced classification. In: Proceedings of the Interspeech, pp. 1082–1086 (2013)

10. Murty, K.S.R., Yegnanarayana, B.: Epoch extraction from speech signals. IEEE Trans. Audio, Speech, Lang. Process. 16(8), 1602–1613 (2008)

11. Yumoto, E., Gould, W., Baer, T.: Harmonics-to-noise ratio as an index of the degree of hoarseness. J. Acoust. Soc. Am. 71(6), 1544–1550 (1982)

12. Narendra, N.P., Rao, K.S.: Time-domain deterministic plus noise model based hybrid source modeling for HMM-based speech synthesis. In: Speech Communication, 2015 (under review)

13. HMM-based speech synthesis system (HTS). http://hts.sp.nitech.ac.jp/

14. Narendra, N.P., Rao, K.S.: Robust voicing detection and F0 estimation for HMM-based speech synthesis. Circ. Syst. Sig. Process. 34(8), 2597–2619 (2015)

Title: Hybrid Source Modeling Method Utilizing Optimal Residual Frames for HMM-based Speech Synthesis
Authors: N. P. Narendra, K. Sreenivasa Rao
Publisher: Springer International Publishing
Book: Mining Intelligence and Knowledge Exploration
Print ISBN: 978-3-319-26831-6
Electronic ISBN: 978-3-319-26832-3
Copyright year: 2015
DOI: https://doi.org/10.1007/978-3-319-26832-3_27
