A text-to-speech (TTS) system, also known as a speech synthesizer, has become an important technology in recent years due to its expanding field of applications. Considerable work on speech synthesis has been carried out for English and French, whereas many other languages, including Arabic, have only recently been taken into consideration. Arabic speech synthesis has not made sufficient progress and is still at an early stage, with low speech quality. In fact, speech synthesis systems face several problems (e.g., speech quality and articulatory effects). Different methods have been proposed to address these issues, such as the use of large and varied unit sizes. This method is mainly implemented within the concatenative approach to improve speech quality, and several studies have demonstrated its effectiveness. This paper presents an efficient Arabic TTS system based on the statistical parametric approach and non-uniform unit speech synthesis. Our system includes a diacritization engine. Modern Arabic text is written without the vowels, also called diacritic marks. These marks are, however, essential for determining the correct pronunciation of the text, which explains the incorporation of the diacritization engine into our system. In this work, we propose a simple approach based on deep neural networks, which are trained to directly predict the diacritic marks and to predict the spectral and prosodic parameters. Furthermore, we propose a new, simple stacked neural network approach to improve the accuracy of the acoustic models. Experimental results show that our diacritization system generates fully diacritized text with high precision and that our synthesis system produces high-quality speech.
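The stacked neural network idea mentioned above can be illustrated with a minimal sketch: a first network maps linguistic features to acoustic parameters, and a second network receives the original features concatenated with the first network's output and refines the prediction. The dimensions, the single-hidden-layer architecture, and the random weights below are assumptions for illustration only; the paper's actual network sizes, features, and training procedure are behind the paywall and are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_forward(x, W1, b1, W2, b2):
    """One hidden layer with tanh; linear output for regression of acoustic parameters."""
    h = np.tanh(x @ W1 + b1)
    return h @ W2 + b2

# Hypothetical dimensions: 40 linguistic features in, 25 acoustic parameters out.
n_in, n_hid, n_out = 40, 64, 25
W1 = rng.standard_normal((n_in, n_hid)) * 0.1
b1 = np.zeros(n_hid)
W2 = rng.standard_normal((n_hid, n_out)) * 0.1
b2 = np.zeros(n_out)

x = rng.standard_normal(n_in)             # linguistic features for one frame
stage1 = mlp_forward(x, W1, b1, W2, b2)   # first network's acoustic prediction

# Stacking: the second network sees the original linguistic features plus the
# first network's output, so it can learn to correct systematic errors.
n_in2 = n_in + n_out
V1 = rng.standard_normal((n_in2, n_hid)) * 0.1
c1 = np.zeros(n_hid)
V2 = rng.standard_normal((n_hid, n_out)) * 0.1
c2 = np.zeros(n_out)

x2 = np.concatenate([x, stage1])
stage2 = mlp_forward(x2, V1, c1, V2, c2)  # refined acoustic prediction
print(stage2.shape)
```

In practice the second network would be trained on the held-out residuals of the first, but the forward-pass structure above is the essence of the stacking scheme.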
Arabic speech synthesis and diacritic recognition. Springer US.