2020 | OriginalPaper | Chapter

A Study of Speech Phase in Dysarthria Voice Conversion System

Authors : Ko-Chiang Chen, Ji-Yan Han, Sin-Hua Jhang, Ying-Hui Lai

Published in: Future Trends in Biomedical and Health Informatics and Cybersecurity in Medical Devices

Publisher: Springer International Publishing

Abstract

Dysarthria is a communication disorder common in people whose neuromuscular speech apparatus has been damaged by events such as stroke. Voice conversion (VC) is a well-known approach to improving the speech intelligibility of dysarthric speakers. Most well-known VC methods convert amplitude features only and ignore phase information, although previous studies indicate that phase is an important factor in the speech signal. We therefore investigate adding correct phase information to VC for dysarthric speech. Automatic speech recognition results and spectrum analysis show that intelligibility improves when the dysarthric phase is replaced with the normal phase during the synthesis step. This implies that correct phase information must be considered in a dysarthria VC system.
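
The phase replacement described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the helper names (`stft`, `istft`, `swap_phase`), the 512-sample frame, the 128-sample hop, and the Hann window are assumptions chosen for illustration, and the two signals are assumed to be time-aligned and of equal length already.

```python
import numpy as np


def stft(x, frame_len=512, hop=128):
    """Short-time Fourier transform using a periodic Hann window."""
    window = np.hanning(frame_len + 1)[:-1]
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)


def istft(spec, frame_len=512, hop=128):
    """Inverse STFT by weighted overlap-add with the same Hann window."""
    window = np.hanning(frame_len + 1)[:-1]
    frames = np.fft.irfft(spec, n=frame_len, axis=1) * window
    n_frames = spec.shape[0]
    out = np.zeros(frame_len + hop * (n_frames - 1))
    norm = np.zeros_like(out)
    for i in range(n_frames):
        start = i * hop
        out[start : start + frame_len] += frames[i]
        norm[start : start + frame_len] += window ** 2
    # Guard against division by (near) zero at the signal edges.
    return out / np.maximum(norm, 1e-8)


def swap_phase(magnitude_source, phase_source, frame_len=512, hop=128):
    """Combine the STFT magnitude of one signal with the STFT phase of another.

    Assumption: both signals are already time-aligned and equally long.
    """
    magnitude = np.abs(stft(magnitude_source, frame_len, hop))
    phase = np.angle(stft(phase_source, frame_len, hop))
    return istft(magnitude * np.exp(1j * phase), frame_len, hop)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random placeholders standing in for converted and reference speech.
    converted = rng.standard_normal(16000)
    reference = rng.standard_normal(16000)
    resynthesized = swap_phase(converted, reference)
    print(resynthesized.shape)
```

In this sketch, `magnitude_source` would play the role of the converted amplitude spectrum and `phase_source` the signal whose phase is reused at synthesis. In practice the two utterances differ in timing, so an alignment step would be needed before the phases can be combined frame by frame.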

Title: A Study of Speech Phase in Dysarthria Voice Conversion System
Authors: Ko-Chiang Chen, Ji-Yan Han, Sin-Hua Jhang, Ying-Hui Lai
Publisher: Springer International Publishing
Book: Future Trends in Biomedical and Health Informatics and Cybersecurity in Medical Devices
Print ISBN: 978-3-030-30635-9

Electronic ISBN: 978-3-030-30636-6

Copyright Year: 2020
DOI: https://doi.org/10.1007/978-3-030-30636-6_31
