International Journal of Speech Technology

Published: 1 June 2016

What we have and what is needed, how to evaluate Arabic Speech Synthesizer?

Authors: Iyad Abu Doush, Faisal Alkhatib, Abed Al Raoof Bsoul

Published in: International Journal of Speech Technology | Issue 2/2016

Abstract

Arabic is one of the six official languages of the United Nations. Arabic language processing, and speech synthesis in particular, is challenging because of the inherent complexity of the script and because each letter may have up to seven different sounds. In this paper, we provide subjective and objective evaluations of six Arabic speech synthesizers available on the Internet: Acapela, ISpeech, Arabi, Sakhr, Google, and Nuance. For the subjective evaluation, we performed four intelligibility tests: Diagnostic Rhyme, Modified Rhyme, Phonetically Confusable Sentences, and Automatic Diacritization Intelligibility (ADI). The ADI test, which we propose, measures how well a speech engine predicts diacritization marks from the context of a word in the sentence. Two further tests evaluate other features of the speech engines. The first, Arabic Text with All Sounds (ATAS), evaluates different features when the engine reads Arabic text that contains all sounds of the Arabic letters. The second, Best/Worst Pleasant Voice, which we also propose, identifies the best and worst speech engines in terms of voice pleasantness. For the objective evaluation, we assess the output of the six systems and compare the results with those of the subjective evaluation. The comparison is made by computing objective metrics from the signal generated by each system and from a reference signal (the same text spoken by a human). Two objective metrics are used: a signal-to-noise measure (segmental SNR) and a linear predictive (LP-based) measure. The originality of the evaluation is that it is based on Arabic text, both diacritized and non-diacritized, containing all sounds of the Arabic letters. Another novelty is the introduction of two tests, ADI and ATAS, for evaluating Arabic speech synthesizers. Results from subject users are reported for clearness/naturalness, speed, sound quality, pronunciation, stress/intonation, pronunciation errors, intelligibility, and pleasantness. In addition, results from experts are presented for the articulation of each sound, the number of words not pronounced, and reading speed. The results reveal the need for Arabic speech synthesizers that take diacritization into account to improve system performance. They also point to the importance of an accurate automatic diacritization system that produces diacritized text for synthesis, and they show the significance of a human-like voice for the speech synthesizer. We propose a set of recommendations for improving Arabic speech synthesizers.
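
As an illustration of the objective comparison described in the abstract, the following is a minimal sketch of a segmental SNR computation between a human reference recording and a synthesized signal. It is not the authors' implementation. The function name `segmental_snr`, the frame length of 256 samples, and the clamping range of -10 dB to 35 dB are assumptions chosen for illustration, and the sketch assumes the two signals are already time-aligned and sampled at the same rate, which real human and synthetic recordings usually are not without an extra alignment step.

```python
import math

def segmental_snr(reference, synthesized, frame_len=256,
                  min_db=-10.0, max_db=35.0):
    """Mean per-frame SNR in dB between two time-aligned signals.

    frame_len, min_db and max_db are illustrative assumptions,
    not values taken from the paper.
    """
    n = min(len(reference), len(synthesized))
    eps = 1e-12  # avoids division by zero and log of zero
    frame_snrs = []
    # Walk over non-overlapping frames; a trailing partial frame is dropped.
    for start in range(0, n - frame_len + 1, frame_len):
        ref = reference[start:start + frame_len]
        syn = synthesized[start:start + frame_len]
        signal_energy = sum(r * r for r in ref)
        noise_energy = sum((r - s) ** 2 for r, s in zip(ref, syn))
        snr_db = 10.0 * math.log10((signal_energy + eps) / (noise_energy + eps))
        # Clamp each frame so silent or identical frames do not dominate the mean.
        frame_snrs.append(min(max(snr_db, min_db), max_db))
    if not frame_snrs:
        raise ValueError("signals are shorter than one frame")
    return sum(frame_snrs) / len(frame_snrs)

# Toy signals standing in for a human reference and two synthesizer outputs.
reference = [math.sin(2 * math.pi * 5 * t / 1024) for t in range(1024)]
scaled = [0.9 * r for r in reference]   # error is 10% of the reference amplitude
silent = [0.0] * len(reference)         # error equals the reference itself

snr_scaled = segmental_snr(reference, scaled)
snr_silent = segmental_snr(reference, silent)
snr_identical = segmental_snr(reference, reference)
```

Higher values mean the synthesized signal is closer to the reference frame by frame. In this toy setup the identical signal hits the upper clamp, the scaled copy lands near 20 dB, and the silent signal lands near 0 dB.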

Title: What we have and what is needed, how to evaluate Arabic Speech Synthesizer?
Authors: Iyad Abu Doush, Faisal Alkhatib, Abed Al Raoof Bsoul
Publication date: 1 June 2016
Publisher: Springer US
Published in: International Journal of Speech Technology / Issue 2/2016
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI: https://doi.org/10.1007/s10772-015-9304-6
