Published in: International Journal of Speech Technology 3/2017

24.07.2017

The impact of phonological rules on Arabic speech recognition

Authors: Fawaz S. Al-Anzi, Dia AbuZeina

Abstract

Pronunciation variation is a well-known phenomenon that has been widely investigated in automatic speech recognition (ASR). Knowledge-based phonological rules are generally used to capture the actual phonetic realization of words and thereby minimize the mismatch between the ASR dictionary and the phonetic content of the speech signal. A number of studies apply such rules to Arabic ASR systems; however, little research has measured the contribution of each individual rule. In this paper, we aim to determine the exact effect of each rule and to identify the rules that have no influence. We used the Carnegie Mellon University PocketSphinx speech recognizer with a new in-house Modern Standard Arabic speech corpus containing 19 h of training data and 3.7 h of test data. We evaluated the effect of three well-known rules (Shadda, Tanween, and the solar letters). The experimental results show no clear evidence that using phonological rules for ASR dictionary adaptation improves performance for within-word pronunciation variation. These results may indicate a need to reconsider this approach or to investigate other aspects of ASR performance, such as cross-word pronunciation variation and the optimal phoneme set for Arabic.
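
To make the three rules named in the abstract concrete, the following is a minimal sketch of how knowledge-based rules could turn a diacritized word into a phone sequence for a pronunciation dictionary. It is not the authors' implementation. It assumes a Buckwalter-style transliteration (`~` for Shadda, `F`/`N`/`K` for Tanween, `o` for sukun), and the function name `expand_word` and the one-symbol-per-phone output are hypothetical choices made only for illustration.

```python
# Illustrative sketch only; symbols follow an assumed Buckwalter-style
# transliteration and do not reproduce the phone set used in the paper.

SHADDA = "~"                          # gemination mark, written after the consonant
SUKUN = "o"                           # marks the absence of a vowel
SUN_LETTERS = set("tvd*rzs$SDTZln")   # consonants that assimilate the article's "l"
TANWEEN = {                           # nunation: short vowel followed by "n"
    "F": ["a", "n"],
    "N": ["u", "n"],
    "K": ["i", "n"],
}


def expand_word(word: str) -> list[str]:
    """Return a phone list for a transliterated, diacritized word."""
    phones: list[str] = []
    rest = word

    # Solar-letter rule: in "Al" + sun letter, the "l" is not pronounced
    # and the sun letter is doubled instead.
    if len(word) > 2 and word.startswith("Al") and word[2] in SUN_LETTERS:
        phones += ["A", word[2], word[2]]
        rest = word[3:]
        # A Shadda written on the sun letter is already covered by the doubling.
        if rest.startswith(SHADDA):
            rest = rest[1:]

    for ch in rest:
        if ch == SHADDA:
            # Shadda rule: repeat the preceding consonant.
            if phones:
                phones.append(phones[-1])
        elif ch in TANWEEN:
            # Tanween rule: expand the mark into vowel + "n".
            phones += TANWEEN[ch]
        elif ch == SUKUN:
            continue
        else:
            phones.append(ch)
    return phones


if __name__ == "__main__":
    for w in ["mad~a", "kitAbN", "Al$~amosu", "Alqamaru"]:
        print(w, " ".join(expand_word(w)))
```

A rule-adapted dictionary in this sense lists each word with the expanded phone sequence, while a baseline dictionary would list the unexpanded symbols; comparing recognition accuracy between the two, one rule at a time, is the kind of measurement the abstract describes.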

Metadata
Title
The impact of phonological rules on Arabic speech recognition
Authors
Fawaz S. Al-Anzi
Dia AbuZeina
Publication date
24.07.2017
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 3/2017
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-017-9440-2
