Published in: Neural Computing and Applications 15/2021

15-02-2021 | Original Article

Speech synthesis using generative adversarial network for improving readability of Hindi words to recuperate from dyslexia

Authors: Geeta Atkar, Priyadarshini Jayaraju

Abstract

Children learn and develop their abilities at their own pace. One of the most basic skills they acquire is reading. However, some children struggle with reading for longer than their peers, and in such cases they may have a learning disorder known as dyslexia. The paper aims to use neural networks, specifically generative adversarial networks (GANs), to generate raw audio of two- or three-letter Hindi words. Using the generated data, a system is built that pronounces the generated words for children recuperating from dyslexia. The system is intended as a helping tool for teachers that speeds up the recuperation process by having the child repeat the correct pronunciation of each word. It uses the advanced Mel-generative adversarial network (MelGAN), which works with Mel-spectrograms of the raw audio and refines its generated audio iteratively until a satisfactory result is achieved. The generated audio samples contain the Hindi words to be taught to the children. MelGAN is used to generate the audio samples because it provides better results than other existing models. A total of 300 basic two- or three-letter Hindi words are used as input to assist children aged 5 to 8 years. The mean opinion score (MOS) is calculated for comparison.
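
MelGAN, as described above, works with Mel-spectrograms rather than raw waveforms. The following is a minimal NumPy sketch, not the paper's actual preprocessing, of how a log-Mel-spectrogram can be computed: frame the signal, take the magnitude of a windowed FFT, and pool the frequency bins through triangular filters spaced evenly on the mel scale. The HTK-style mel formula, the parameter values (22.05 kHz sampling rate, 1024-point FFT, hop of 256 samples, 80 mel bands) and the hypothetical helper names `hz_to_mel`, `mel_to_hz`, `mel_filterbank` and `log_mel_spectrogram` are all assumptions chosen for illustration.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel formula (assumption): m = 2595 * log10(1 + f / 700)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    # Inverse of hz_to_mel
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels, fmin=0.0, fmax=None):
    """Triangular filters with centres equally spaced on the mel scale."""
    fmax = sr / 2 if fmax is None else fmax
    # n_mels + 2 edge points: each filter spans (left, centre, right)
    hz_points = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2))
    fft_freqs = np.linspace(0.0, sr / 2, n_fft // 2 + 1)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, centre, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (centre - left)
        falling = (right - fft_freqs) / (right - centre)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(y, sr=22050, n_fft=1024, hop=256, n_mels=80, eps=1e-5):
    """Return a (n_mels, n_frames) log-Mel-spectrogram of a mono waveform y."""
    y = np.asarray(y, dtype=float)
    pad = n_fft // 2
    y = np.pad(y, (pad, pad), mode="reflect")  # centre frames on the signal
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[t * hop : t * hop + n_fft] * window for t in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))        # (n_frames, n_fft // 2 + 1)
    mel = magnitude @ mel_filterbank(sr, n_fft, n_mels).T  # (n_frames, n_mels)
    return np.log(np.maximum(mel, eps)).T                  # clamp before log to avoid -inf
```

For a one-second clip at 22.05 kHz this yields an 80 × 87 matrix (mel bands × frames). A vocoder such as MelGAN is trained to map matrices of this kind back to waveforms; the clamp before the logarithm keeps silent frames finite.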

Metadata
Title
Speech synthesis using generative adversarial network for improving readability of Hindi words to recuperate from dyslexia
Authors
Geeta Atkar
Priyadarshini Jayaraju
Publication date
15-02-2021
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 15/2021
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-021-05695-3
