
2022 | Original Paper | Book Chapter

A Review on Speech Synthesis Based on Machine Learning

Authors: Ruchika Kumari, Amita Dev, Ashwni Kumar

Published in: Artificial Intelligence and Speech Technology

Publisher: Springer International Publishing


Abstract

Speech synthesis is a growing research area in which text is taken as input and an acoustic waveform is produced as output. Speech synthesis systems are particularly beneficial to physically impaired people. During synthesis, complications arise from surrounding noise and variations in speaking style, and various machine learning techniques are employed to suppress such unwanted noise. In this paper, we describe techniques adopted to improve the naturalness and quality of synthesized speech. The main contribution of this paper is to elaborate and compare the characteristics of techniques utilized in speech synthesis for different languages. Techniques such as the support vector machine, artificial neural network, Gaussian mixture model, generative adversarial network, deep neural network, and hidden Markov model are examined with respect to the naturalness and quality of the synthesized speech signals.
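To make one of the surveyed techniques concrete, the sketch below (not taken from the paper; the two-state model and all probabilities are illustrative toy values) shows the forward algorithm, the core likelihood computation underlying HMM-based statistical parametric synthesis:

```python
# Minimal sketch of the HMM forward algorithm, as used at the core of
# HMM-based parametric speech synthesis. The model here is a toy
# discrete-observation HMM; real synthesizers use continuous acoustic
# features (e.g. mel-cepstra) with Gaussian emission densities.

def forward_likelihood(obs, pi, A, B):
    """Return P(obs | model) for a discrete-observation HMM.

    obs: sequence of observation symbol indices
    pi:  initial state probabilities, pi[i]
    A:   transition probabilities, A[i][j] = P(state j | state i)
    B:   emission probabilities, B[i][o] = P(symbol o | state i)
    """
    n_states = len(pi)
    # alpha[i] = P(o_1..o_t, state_t = i); initialize at t = 1
    alpha = [pi[i] * B[i][obs[0]] for i in range(n_states)]
    # Recursively extend alpha over the remaining observations
    for o in obs[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n_states)) * B[j][o]
            for j in range(n_states)
        ]
    # Total likelihood: sum over all possible final states
    return sum(alpha)

# Toy model: two hidden states (say, "voiced"/"unvoiced"), two symbols.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
print(forward_likelihood([0, 1, 0], pi, A, B))  # ≈ 0.10893
```

In an HMM-based synthesizer this likelihood drives both training (via Baum-Welch re-estimation) and, at synthesis time, the selection of state-duration and spectral parameters from which the waveform is generated.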


Metadata
Copyright year: 2022
DOI: https://doi.org/10.1007/978-3-030-95711-7_3