The development of large-scale automatic classroom dialog analysis systems requires accurate speech-to-text translation. A variety of automatic speech recognition (ASR) engines were evaluated for this purpose. Recordings of teachers in noisy classrooms were used for testing. In comparing ASR results, Google Speech and Bing Speech were more accurate with word accuracy scores of 0.56 for Google and 0.52 for Bing compared to 0.41 for AT&T Watson, 0.08 for Microsoft, 0.14 for Sphinx with the HUB4 model, and 0.00 for Sphinx with the WSJ model. Further analysis revealed both Google and Bing engines were largely unaffected by speakers, speech class sessions, and speech characteristics. Bing results were validated across speakers in a laboratory study, and a method of improving Bing results is presented. Results provide a useful understanding of the capabilities of contemporary ASR engines in noisy classroom environments. Results also highlight a list of issues to be aware of when selecting an ASR engine for difficult speech recognition tasks.
Weitere Kapitel dieses Buchs durch Wischen aufrufen
Bitte loggen Sie sich ein, um Zugang zu diesem Inhalt zu erhalten
Sie möchten Zugang zu diesem Inhalt erhalten? Dann informieren Sie sich jetzt über unsere Produkte:
- A Study of Automatic Speech Recognition in Noisy Classroom Environments for Automated Dialog Analysis
Andrew M. Olney