15-02-2021 | Original Article

Speech synthesis using generative adversarial network for improving readability of Hindi words to recuperate from dyslexia

Authors: Geeta Atkar, Priyadarshini Jayaraju

Published in: Neural Computing and Applications | Issue 15/2021

Abstract

Children learn and develop their abilities at their own pace, and reading is one of the most basic skills they acquire. Some children, however, struggle with reading for longer than their peers, which may indicate a learning disorder known as dyslexia. This paper uses neural networks, specifically generative adversarial networks, to generate raw audio for two- or three-letter Hindi words. A system built on the generated data pronounces the words for children recuperating from dyslexia. It is intended as a tool that helps teachers speed up recuperation by having the child repeat the correct pronunciation of each word. The system uses an advanced Mel-generative adversarial network (MelGAN) that works with Mel-spectrograms of the raw audio and refines its own generated audio iteratively until a satisfactory result is achieved. The generated audio samples contain the Hindi words to be taught to the children. MelGAN is chosen to generate the audio samples because it provides better results than other existing models. The input consists of 300 basic two- or three-letter Hindi words intended to assist children aged 5 to 8 years. The mean opinion score is calculated for comparison.
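
The opinion score used for comparison is conventionally a mean opinion score (MOS): listeners rate each audio sample on a 1-to-5 scale and the ratings are averaged. The following is a minimal sketch of that calculation, not the authors' evaluation code. The function name, the system labels, the ratings, and the normal-approximation confidence interval are all assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def mean_opinion_score(ratings):
    """Return (MOS, 95% CI half-width) for listener ratings on a 1-5 scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    mos = mean(ratings)
    # Normal-approximation interval (assumption); zero width for a single rating.
    if len(ratings) > 1:
        half_width = 1.96 * stdev(ratings) / sqrt(len(ratings))
    else:
        half_width = 0.0
    return mos, half_width


# Hypothetical ratings for illustration only; these are not data from the paper.
ratings_by_system = {
    "system_a": [4, 5, 4, 4, 3],
    "system_b": [3, 3, 4, 2, 3],
}

for name, ratings in ratings_by_system.items():
    mos, ci = mean_opinion_score(ratings)
    print(f"{name}: MOS = {mos:.2f} +/- {ci:.2f}")
```

Reporting an interval alongside the mean helps show whether a difference between two synthesis systems exceeds the spread among listeners.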

Title: Speech synthesis using generative adversarial network for improving readability of Hindi words to recuperate from dyslexia
Authors: Geeta Atkar, Priyadarshini Jayaraju
Publication date: 15-02-2021
Publisher: Springer London
Published in: Neural Computing and Applications / Issue 15/2021
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI: https://doi.org/10.1007/s00521-021-05695-3
