2011 | OriginalPaper | Chapter
Part of Speech Tagging Approach to Designing Compound Words for Arabic Continuous Speech Recognition Systems
Authors: Dia AbuZeina, Moustafa Elshafei, Wasfi Al-Khatib
Published in: Informatics Engineering and Information Science
Publisher: Springer Berlin Heidelberg
Misrecognition of small words is one of the factors that lead to suboptimal performance in automatic continuous speech recognition systems. In general, errors on small words are far more frequent than errors on long words. Compounding some words (small or long) into longer words therefore benefits speech recognition decoders. In this paper, we present a novel approach that artificially generates compound words using part-of-speech tagging. For this purpose, we consider two cases in Arabic where adjacent words are usually pronounced together without any intervening silence: a noun followed by an adjective, and a preposition followed by any other word. To collect the candidate compound words, we use the Stanford Arabic tagger to tag all words in our baseline transcription corpus. Using Sphinx 3, we test the proposed method on a 5.4-hour speech corpus of Modern Standard Arabic. The results show a significant improvement, with the word error rate reduced by 2.39%.
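The candidate-collection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the transcription corpus has already been tagged into (word, tag) pairs, that the tags follow a Penn-style convention (`NN`/`DTNN` for nouns, `JJ`/`DTJJ` for adjectives, `IN` for prepositions), and that compounds are joined with an underscore and merged greedily from left to right. The function names, the `min_count` parameter, and the transliterated toy sentence are all hypothetical.

```python
from collections import Counter

# Assumed Penn-style tag conventions; adjust to the tagger's actual tag set.
NOUN_PREFIXES = ("NN", "DTNN")
ADJ_PREFIXES = ("JJ", "DTJJ")
PREP_TAGS = {"IN"}


def is_noun(tag):
    return tag.startswith(NOUN_PREFIXES)


def is_adjective(tag):
    return tag.startswith(ADJ_PREFIXES)


def collect_candidates(tagged_sentences, min_count=1):
    """Count adjacent word pairs matching the two cases from the abstract:
    noun followed by adjective, or preposition followed by any word."""
    counts = Counter()
    for sentence in tagged_sentences:
        for (w1, t1), (w2, t2) in zip(sentence, sentence[1:]):
            if (is_noun(t1) and is_adjective(t2)) or t1 in PREP_TAGS:
                counts[(w1, w2)] += 1
    # min_count is a hypothetical frequency threshold for keeping a pair.
    return {pair: n for pair, n in counts.items() if n >= min_count}


def merge_compounds(words, compounds, joiner="_"):
    """Rewrite a transcript so each selected pair becomes one token.
    Greedy left-to-right merging is an assumption of this sketch."""
    merged = []
    i = 0
    while i < len(words):
        if i + 1 < len(words) and (words[i], words[i + 1]) in compounds:
            merged.append(words[i] + joiner + words[i + 1])
            i += 2
        else:
            merged.append(words[i])
            i += 1
    return merged


# Toy transliterated sentence: "in the-house the-big".
sentence = [("fy", "IN"), ("Albyt", "DTNN"), ("Alkbyr", "DTJJ")]
candidates = collect_candidates([sentence])
print(candidates)
print(merge_compounds([w for w, _ in sentence], candidates))
```

In a pipeline of this kind, the merged transcripts would typically be used to rebuild the pronunciation dictionary and language model so that the decoder treats each compound as a single longer word; how candidates are filtered before that step is not specified in the abstract.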