
The impact of phonological rules on Arabic speech recognition

Authors: Fawaz S. Al-Anzi, Dia AbuZeina

Published in: International Journal of Speech Technology | Issue 3/2017

Publication date: 24 July 2017


Abstract

Pronunciation variation is a well-known phenomenon that has been widely investigated in automatic speech recognition (ASR). Knowledge-based phonological rules are commonly used to capture the actual phonetic realization of words, minimizing the mismatch between the ASR pronunciation dictionary and the phonetic content of the speech signal. A number of studies have applied these rules to Arabic ASR systems; however, little research has measured the individual contribution of each rule. In this paper, we aim to determine the exact effect of each rule and to identify the rules that have no influence. We used the Carnegie Mellon University (CMU) PocketSphinx speech recognizer with a new in-house Modern Standard Arabic (MSA) speech corpus containing 19 h of training data and 3.7 h of test data. We evaluated the effect of three well-known rules: Shadda (gemination), Tanween (nunation), and the solar (sun) letters. The experimental results show no clear evidence that adapting the ASR dictionary with phonological rules improves performance for within-word pronunciation variation. These results may indicate a need to rethink the approach or to focus on other aspects of ASR performance, such as cross-word pronunciation variation and the optimal phoneme set for Arabic.
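To make the three rules concrete, here is a minimal Python sketch of how rule-based dictionary adaptation could turn a diacritized word into a phone sequence, with each rule switched on or off. It is not the authors' implementation. The input format (fully diacritized words in Buckwalter transliteration), the phone labels (Buckwalter consonant symbols plus the short vowels a, u, i), the realization of the word-initial definite article as /a l/, the function name `pronounce`, and its toggles `shadda`, `tanween`, and `solar` are all illustrative assumptions. Shadda is modeled as doubling the preceding consonant, Tanween as appending /n/ after the case vowel, and the solar-letter rule as dropping the /l/ of the article before a sun letter.

```python
"""Toy sketch of Shadda, Tanween, and solar-letter rules for dictionary
pronunciations. ASSUMPTIONS (not from the paper): fully diacritized input in
Buckwalter transliteration; hypothetical phone labels that reuse Buckwalter
consonant symbols plus short vowels a/u/i; only a small subset of the script
is supported (e.g. long vowels and hamza seats raise ValueError)."""

# Buckwalter symbols for the 14 sun letters: t th d dh r z s sh S D T Z l n
SUN_LETTERS = set("tvd*rzs$SDTZln")
# Consonants handled by this sketch (hamza, b, t, th, j, H, kh, d, dh, r, ...)
CONSONANTS = set("'btvjHxd*rzs$SDTZEgfqklmnhwy")
SHORT_VOWELS = set("aui")
TANWEEN = {"F": "a", "N": "u", "K": "i"}  # fathatan, dammatan, kasratan
SHADDA, SUKUN = "~", "o"


def pronounce(word, shadda=True, tanween=True, solar=True):
    """Return a list of hypothetical phone labels for one Buckwalter word."""
    phones = []
    i = 0
    # Definite article "Al": assume the initial alif is realized as /a/.
    if word.startswith("Al"):
        phones.append("a")
        following = word[2:3]
        # Solar rule: the /l/ is not pronounced before a sun letter.
        if not (solar and following in SUN_LETTERS):
            phones.append("l")
        i = 2
    for ch in word[i:]:
        if ch in CONSONANTS or ch in SHORT_VOWELS:
            phones.append(ch)
        elif ch == SHADDA:
            # Shadda rule: geminate the preceding consonant; if the rule is
            # off, the mark is simply ignored (single consonant).
            if shadda and phones and phones[-1] in CONSONANTS:
                phones.append(phones[-1])
        elif ch in TANWEEN:
            phones.append(TANWEEN[ch])
            # Tanween rule: pronounce the final /n/ of nunation.
            if tanween:
                phones.append("n")
        elif ch == SUKUN:
            continue  # sukun marks the absence of a vowel
        else:
            raise ValueError(f"unsupported symbol {ch!r} in {word!r}")
    return phones


if __name__ == "__main__":
    # al-shams (sun letter), al-qamar (moon letter), darsun, darrasa
    for w in ["Al$~amosu", "Alqamaru", "darosN", "dar~asa"]:
        base = pronounce(w, shadda=False, tanween=False, solar=False)
        full = pronounce(w)
        print(f"{w:12} baseline={' '.join(base):18} rules={' '.join(full)}")
```

Comparing the baseline and rule-applied outputs shows which dictionary entries each rule would change (for example, the /l/ disappears from Al$~amosu but not from Alqamaru). It does not show whether those changes reduce recognition errors, which is the question the paper evaluates.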



Publisher: Springer US
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI: https://doi.org/10.1007/s10772-017-9440-2
