2020 | OriginalPaper | Chapter

A Study of Speech Phase in Dysarthria Voice Conversion System

Authors: Ko-Chiang Chen, Ji-Yan Han, Sin-Hua Jhang, Ying-Hui Lai

Published in: Future Trends in Biomedical and Health Informatics and Cybersecurity in Medical Devices

Publisher: Springer International Publishing

Abstract

Dysarthria is a communication disorder common in people whose neuromuscular apparatus has been damaged by events such as stroke. Voice conversion (VC) is a well-known approach to improving the speech intelligibility of dysarthric speakers. Most well-known VC methods convert only amplitude features and ignore phase information, although previous studies have indicated that phase is an important factor in the speech signal. We are therefore interested in adding the correct phase information to VC for dysarthric speech. Automatic speech recognition results and spectrum analysis show that intelligibility improves when the dysarthric phase is replaced with the normal phase during the synthesis step. This implies that correct phase information must be considered in a dysarthria VC system.
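
The abstract does not give implementation details, so the following is only a minimal single-frame sketch of what "replacing the phase during the synthesis step" could look like. It assumes a Fourier-based synthesis in which each frame is rebuilt from a magnitude spectrum and a phase spectrum, and that the two input frames are already time-aligned and of equal length. The function names `dft`, `idft`, and `replace_phase` are hypothetical and are not taken from the chapter.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform of one real-valued frame."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse DFT; returns the real part of each reconstructed sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def replace_phase(magnitude_frame, phase_frame):
    """Rebuild a frame from the magnitude spectrum of `magnitude_frame`
    and the phase spectrum of `phase_frame` (assumed to be time-aligned)."""
    if len(magnitude_frame) != len(phase_frame):
        raise ValueError("frames must have the same length")
    mag_spec = dft(magnitude_frame)
    phase_spec = dft(phase_frame)
    # Keep |M[k]| from the first frame and the angle of P[k] from the second.
    mixed = [
        abs(m) * cmath.exp(1j * cmath.phase(p))
        for m, p in zip(mag_spec, phase_spec)
    ]
    return idft(mixed)
```

In this sketch, `magnitude_frame` would stand in for a frame whose amplitude features have been converted, and `phase_frame` for the corresponding frame of normal speech. A practical system would apply the same idea to every frame of a short-time Fourier transform and overlap-add the results, typically with an FFT library rather than the naive transform used here.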

Metadata
Title: A Study of Speech Phase in Dysarthria Voice Conversion System
Authors: Ko-Chiang Chen, Ji-Yan Han, Sin-Hua Jhang, Ying-Hui Lai
Copyright Year: 2020
DOI: https://doi.org/10.1007/978-3-030-30636-6_31