International Journal of Speech Technology

Published: 01-06-2012

A HMM-WDLT framework for HNM-based voice conversion with parametric adjustment in formant bandwidth, duration and excitation

Authors: Hwai-Tsu Hu, Chu Yu

Published in: International Journal of Speech Technology | Issue 2/2012

Abstract

This paper presents a framework, named Hidden Markov Model–Weighted Deviation Linear Transformation (HMM-WDLT), for performing voice conversion based on the Harmonic + Noise Model (HNM). In a comparative study of spectral conversion, the HMM-WDLT achieves the lowest average spectral distortion. The problem of broadened formant bandwidths can be remedied by a weighting constraint and an ordering check, with the minimum clearance estimated from the HMM-WDLT. Jointly exploiting dynamic time warping (DTW) and the HMM-WDLT also makes duration conversion feasible. Moreover, the HMM-WDLT plays a part in converting excitation-related parameters such as the fundamental frequency, the maximum voiced frequency, and the harmonic magnitudes for critical bands below 2.7 kHz. The ability to modify pitch and duration concurrently allows the HMM-WDLT to carry out prosody conversion. Listening tests reveal that the converted speech successfully captures the speaker's individuality with satisfactory quality.
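The abstract names dynamic time warping as one ingredient of the duration conversion but gives no details. As a minimal sketch of textbook DTW only (not the authors' implementation), the following aligns two sequences of feature frames. The function name `dtw_path`, the Euclidean frame distance, and the three-way step pattern are assumptions made for illustration.

```python
import math


def dtw_path(source, target):
    """Align two sequences of feature frames with classic DTW.

    Hypothetical helper for illustration; each frame is a sequence of floats.
    Returns the accumulated distance and the list of aligned index pairs.
    """
    n, m = len(source), len(target)
    # cost[i][j]: minimum accumulated distance aligning the first i source
    # frames with the first j target frames.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(source[i - 1], target[j - 1])  # Euclidean (assumed)
            cost[i][j] = d + min(cost[i - 1][j - 1],  # advance both sequences
                                 cost[i - 1][j],      # advance source only
                                 cost[i][j - 1])      # advance target only
    # Backtrack from the end to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1]]
        k = steps.index(min(steps))
        if k == 0:
            i, j = i - 1, j - 1
        elif k == 1:
            i -= 1
        else:
            j -= 1
    path.reverse()
    return cost[n][m], path


if __name__ == "__main__":
    src = [[0.0], [1.0], [2.0]]
    tgt = [[0.0], [1.0], [1.0], [2.0]]
    total, path = dtw_path(src, tgt)
    print(total, path)
```

In this toy input the middle source frame is paired with two target frames, which is the kind of local stretching a duration model would need to capture. How the paper combines such alignments with the HMM-WDLT is not stated in the abstract.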

Title: A HMM-WDLT framework for HNM-based voice conversion with parametric adjustment in formant bandwidth, duration and excitation
Authors: Hwai-Tsu Hu, Chu Yu
Publication date: 01-06-2012
Publisher: Springer US
Published in: International Journal of Speech Technology / Issue 2/2012
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI: https://doi.org/10.1007/s10772-012-9135-7
