Published in: International Journal of Speech Technology 3/2016

18.05.2016

Arabic speech synthesis and diacritic recognition

Authors: Ilyes Rebai, Yassine BenAyed



Abstract

Text-to-speech (TTS) systems, also known as speech synthesizers, have become one of the most important technologies of recent years owing to their expanding range of applications. Considerable work on speech synthesis has been done for English and French, whereas many other languages, including Arabic, have only recently received attention. Arabic speech synthesis has not made sufficient progress and is still at an early stage, with low speech quality. Speech synthesis systems face several problems (e.g. speech quality, articulatory effects). Different methods have been proposed to address these issues, such as the use of large and varied unit sizes. This method is mainly implemented within the concatenative approach to improve speech quality, and several studies have demonstrated its effectiveness. This paper presents an efficient Arabic TTS system based on the statistical parametric approach and non-uniform-unit speech synthesis. Our system includes a diacritization engine. Modern Arabic text is written without the vowels, also called diacritic marks. These marks are, however, essential for determining the correct pronunciation of the text, which is why a diacritization engine is incorporated into our system. In this work, we propose a simple approach based on deep neural networks (DNNs): DNNs are trained to directly predict the diacritic marks and to predict the spectral and prosodic parameters. Furthermore, we propose a new, simple stacked neural network approach to improve the accuracy of the acoustic models. Experimental results show that our diacritization system generates fully diacritized text with high precision and that our synthesis system produces high-quality speech.
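The two ideas sketched in the abstract, a DNN mapping textual features to diacritic classes and a second "stacked" network that also receives the first network's predictions, can be illustrated with a minimal NumPy forward pass. This is an illustrative sketch only, not the authors' implementation: the context-window size, layer widths, number of diacritic classes, and random untrained weights are all placeholder assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_forward(x, w1, b1, w2, b2):
    """One hidden layer; tanh at the output, as in the paper's footnote."""
    h = np.tanh(x @ w1 + b1)
    return np.tanh(h @ w2 + b2)

# Hypothetical sizes: a window of 5 characters, each one-hot over a
# 40-symbol alphabet, mapped to 8 diacritic classes.
n_in, n_hid, n_out = 5 * 40, 64, 8

# First-stage network: character-context features -> diacritic scores.
w1, b1 = 0.1 * rng.normal(size=(n_in, n_hid)), np.zeros(n_hid)
w2, b2 = 0.1 * rng.normal(size=(n_hid, n_out)), np.zeros(n_out)

# Second-stage ("stacked") network: the original features and the
# first-stage predictions are concatenated and fed to a second model.
w3, b3 = 0.1 * rng.normal(size=(n_in + n_out, n_hid)), np.zeros(n_hid)
w4, b4 = 0.1 * rng.normal(size=(n_hid, n_out)), np.zeros(n_out)

x = rng.normal(size=(16, n_in))           # a batch of 16 feature vectors
stage1 = mlp_forward(x, w1, b1, w2, b2)   # first-pass diacritic scores
stacked_in = np.concatenate([x, stage1], axis=1)
stage2 = mlp_forward(stacked_in, w3, b3, w4, b4)

pred = stage2.argmax(axis=1)              # predicted diacritic class per sample
```

The same stacking pattern applies to the acoustic models, where the outputs would be spectral and prosodic parameters rather than class scores.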


Footnotes
1
Although the softmax activation function is popular in DNN-based classification, our preliminary experiments showed that the DNN with the tangent sigmoid activation function at the output layer consistently outperformed those with the softmax one.
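The footnote's comparison can be made concrete with a small NumPy sketch; the logit values below are invented for illustration. Softmax normalizes the output layer into a probability distribution, while tanh produces independent scores in (-1, 1); both preserve the argmax, so the predicted class at decoding time is identical and the reported difference concerns training behavior.

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

z = np.array([2.0, -1.0, 0.5])   # example output-layer pre-activations

p_soft = softmax(z)               # probabilities summing to 1
p_tanh = np.tanh(z)               # independent scores in (-1, 1)

# Both activations select the same class, so only training dynamics differ.
assert p_soft.argmax() == p_tanh.argmax()
```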
 
Metadata
Title
Arabic speech synthesis and diacritic recognition
Authors
Ilyes Rebai
Yassine BenAyed
Publication date
18.05.2016
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 3/2016
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-016-9342-8
