2010 | OriginalPaper | Chapter
Impact of a Newly Developed Modern Standard Arabic Speech Corpus on Implementing and Evaluating Automatic Continuous Speech Recognition Systems
Authors : Mohammad A. M. Abushariah, Raja N. Ainon, Roziati Zainuddin, Bassam A. Al-Qatab, Assal A. M. Alqudah
Published in: Spoken Dialogue Systems for Ambient Environments
Publisher: Springer Berlin Heidelberg
Activate our intelligent search to find suitable subject content or patents.
Select sections of text to find matching patents with Artificial Intelligence. powered by
Select sections of text to find additional relevant content using AI-assisted search. powered by
Being current formal linguistic standard and only acceptable form of Arabic language for all native speakers, Modern Standard Arabic (MSA) still lacks sufficient spoken corpora compared to other forms like Dialectal Arabic. This paper describes our work towards developing a new speech corpus for MSA, which can be used for implementing and evaluating any Arabic automatic continuous speech recognition system. The speech corpus contains 415 (367 training and 48 testing) sentences recorded by 42 (21 male and 21 female) Arabic native speakers from 11 countries representing three major regions (Levant, Gulf, and Africa). The impact of using this speech corpus on overall performance of Arabic automatic continuous speech recognition systems was examined. Two development phases were conducted based on the size of training data, Gaussian mixture distributions, and tied states (senones). Overall results indicate that larger training data size result higher word recognition rates and lower Word Error Rates (WER).