Published in: International Journal of Speech Technology 2/2016

1 June 2016

What we have and what is needed, how to evaluate Arabic Speech Synthesizer?

Authors: Iyad Abu Doush, Faisal Alkhatib, Abed Al Raoof Bsoul



Abstract

Arabic is one of the six official languages of the United Nations. Arabic language processing, and speech synthesis in particular, is challenging because of the inherent complexity of Arabic text and script, and because each letter may have up to seven different sounds. In this paper, we provide a subjective and objective evaluation of six Arabic speech synthesizers available on the Internet: Acapela, ISpeech, Arabi, Sakhr, Google, and Nuance. For the subjective evaluation, we performed four intelligibility tests: the Diagnostic Rhyme Test, the Modified Rhyme Test, Phonetically Confusable Sentences, and a fourth test that we propose, Automatic Diacritization Intelligibility (ADI), which measures how well a speech engine predicts the diacritization marks of a word from its context in the sentence. Two further tests evaluated other features of the speech engines. The first, Arabic Text with All Sounds (ATAS), evaluates various features of an engine as it reads an Arabic text that contains all the sounds of the Arabic letters. The second, Best/Worst Pleasant Voice, also proposed by us, identifies the engines with the most and the least pleasant voices. For the objective evaluation, we assess the output of the six systems and compare the results with those of the subjective evaluations. The comparison is made by computing objective metrics from both the signal generated by each system and a reference signal (i.e., the same text spoken by a human). Two objective metrics are used: the segmental signal-to-noise ratio (segmental SNR) and a linear-prediction-based (LP-based) measure. The originality of the evaluation lies in its use of an Arabic text, in both diacritized and non-diacritized form, that contains all the sounds of the Arabic letters.
Another novelty is the introduction of two new tests, ADI and ATAS, for evaluating Arabic speech synthesizers. Results from the test subjects are reported for clarity/naturalness, speed, sound quality, pronunciation, clarity, stress/intonation, pronunciation errors, intelligibility, and pleasantness. In addition, results from experts are presented for the articulation of each sound, the number of unpronounced words, and reading speed. The results reveal the need for Arabic speech synthesizers that take diacritization into account to improve system performance. They also point to the importance of an accurate automatic diacritization system that produces diacritized text for synthesis. Furthermore, the results show the importance of a human-like voice for a speech synthesizer. Finally, we propose a set of recommendations for improving Arabic speech synthesizers.
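The objective comparison in the abstract scores each synthesizer's signal against a human reference recording. The sketch below shows, in plain Python, what these two kinds of metric typically compute. It is not the authors' code. The abstract does not say which LP-based measure was used, so the log-likelihood ratio (LLR), one common LP-based measure, stands in for it. The frame length (256 samples), hop (128), LP order (12), Hamming window, and the [-10, 35] dB per-frame clamp on segmental SNR are assumed, commonly used settings, not values taken from the paper. Both functions also assume that the synthetic and human signals share a sampling rate and are already time-aligned, and they simply truncate to the shorter signal.

```python
import math

# Assumed analysis settings: common choices, not values reported in the paper.
FRAME_LEN, HOP, LP_ORDER = 256, 128, 12


def frame_starts(n, frame_len, hop):
    """Start indices of full frames; a trailing partial frame is dropped."""
    return range(0, n - frame_len + 1, hop)


def segmental_snr(ref, syn, frame_len=FRAME_LEN, hop=HOP, lo=-10.0, hi=35.0, eps=1e-10):
    """Mean per-frame SNR (dB) of `syn` against `ref`, each frame clamped to [lo, hi].

    Assumes both signals share a sampling rate and are already time-aligned.
    """
    n = min(len(ref), len(syn))
    snrs = []
    for s in frame_starts(n, frame_len, hop):
        r, y = ref[s:s + frame_len], syn[s:s + frame_len]
        signal_energy = sum(x * x for x in r)
        noise_energy = sum((a - b) ** 2 for a, b in zip(r, y))  # error vs. reference
        snr = 10.0 * math.log10((signal_energy + eps) / (noise_energy + eps))
        snrs.append(min(max(snr, lo), hi))
    return sum(snrs) / len(snrs) if snrs else float("nan")


def autocorr(frame, order):
    """Autocorrelation r[0..order] of one (windowed) frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(order + 1)]


def levinson_durbin(r, order):
    """LP coefficients a = [1, a1, ..., ap] of the error filter A(z) = 1 + sum a_k z^-k."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or perfectly predictable frame: keep the lower-order solution
        k = -(r[i] + sum(a[j] * r[i - j] for j in range(1, i))) / err  # reflection coeff.
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k  # updated prediction-error energy
    return a


def quad_form(a, r):
    """a R a^T, where R is the Toeplitz autocorrelation matrix built from r."""
    p = len(a)
    return sum(a[i] * a[j] * r[abs(i - j)] for i in range(p) for j in range(p))


def llr(ref, syn, frame_len=FRAME_LEN, hop=HOP, order=LP_ORDER, eps=1e-12):
    """Mean log-likelihood ratio over non-silent reference frames (0 = same LP envelope)."""
    n = min(len(ref), len(syn))
    window = [0.54 - 0.46 * math.cos(2 * math.pi * i / (frame_len - 1)) for i in range(frame_len)]
    values = []
    for s in frame_starts(n, frame_len, hop):
        r_ref = autocorr([w * x for w, x in zip(window, ref[s:s + frame_len])], order)
        r_syn = autocorr([w * x for w, x in zip(window, syn[s:s + frame_len])], order)
        if r_ref[0] <= eps:
            continue  # skip silent reference frames
        a_ref = levinson_durbin(r_ref, order)
        a_syn = levinson_durbin(r_syn, order)
        # Both LP filters are evaluated on the reference frame's autocorrelation.
        ratio = (quad_form(a_syn, r_ref) + eps) / (quad_form(a_ref, r_ref) + eps)
        values.append(math.log(ratio))
    return sum(values) / len(values) if values else float("nan")
```

For example, `segmental_snr(human, tts)` and `llr(human, tts)` on two aligned lists of samples each return one number. A higher segmental SNR and a lower LLR (0 when the LP envelopes match) indicate output closer to the human reference. Synthesized and human speech rarely share the same timing, so a real comparison would first need an alignment step such as dynamic time warping, which this sketch omits.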


Metadata
Title: What we have and what is needed, how to evaluate Arabic Speech Synthesizer?
Authors: Iyad Abu Doush, Faisal Alkhatib, Abed Al Raoof Bsoul
Publication date: 1 June 2016
Publisher: Springer US
Published in: International Journal of Speech Technology, Issue 2/2016
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI: https://doi.org/10.1007/s10772-015-9304-6
