2010 | OriginalPaper | Buchkapitel
Impact of a Newly Developed Modern Standard Arabic Speech Corpus on Implementing and Evaluating Automatic Continuous Speech Recognition Systems
verfasst von : Mohammad A. M. Abushariah, Raja N. Ainon, Roziati Zainuddin, Bassam A. Al-Qatab, Assal A. M. Alqudah
Erschienen in: Spoken Dialogue Systems for Ambient Environments
Verlag: Springer Berlin Heidelberg
Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.
Wählen Sie Textabschnitte aus um mit Künstlicher Intelligenz passenden Patente zu finden. powered by
Markieren Sie Textabschnitte, um KI-gestützt weitere passende Inhalte zu finden. powered by
Being current formal linguistic standard and only acceptable form of Arabic language for all native speakers, Modern Standard Arabic (MSA) still lacks sufficient spoken corpora compared to other forms like Dialectal Arabic. This paper describes our work towards developing a new speech corpus for MSA, which can be used for implementing and evaluating any Arabic automatic continuous speech recognition system. The speech corpus contains 415 (367 training and 48 testing) sentences recorded by 42 (21 male and 21 female) Arabic native speakers from 11 countries representing three major regions (Levant, Gulf, and Africa). The impact of using this speech corpus on overall performance of Arabic automatic continuous speech recognition systems was examined. Two development phases were conducted based on the size of training data, Gaussian mixture distributions, and tied states (senones). Overall results indicate that larger training data size result higher word recognition rates and lower Word Error Rates (WER).