Best Paper

Evaluating the Usability of Automatically Generated Captions for People who are Deaf or Hard of Hearing

Published: 19 October 2017

ABSTRACT

The accuracy of Automated Speech Recognition (ASR) technology has improved, but it is still imperfect in many settings. Researchers who evaluate ASR performance often focus on improving the Word Error Rate (WER) metric, but WER has been found to have little correlation with human-subject performance in many applications. We propose a new captioning-focused evaluation metric that better predicts the impact of ASR recognition errors on the usability of automatically generated captions for people who are Deaf or Hard of Hearing (DHH). Through a user study with 30 DHH users, we compared our new metric with the traditional WER metric on a caption usability evaluation task. In a side-by-side comparison of pairs of ASR text output with identical WER, the texts ranked higher by our new metric were the ones preferred by DHH participants. Further, our metric had significantly higher correlation with DHH participants' subjective scores on the usability of a caption than did the WER metric. This new metric could be used to select ASR systems for captioning applications, and it may be a better metric for ASR researchers to consider when optimizing ASR systems.
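For context on the baseline being compared against, below is a minimal Python sketch of the standard WER computation: word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The paper's own captioning-focused metric is not specified in the abstract and is not reproduced here; the function name and example sentences are illustrative, not taken from the paper.

def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edits turning the first i reference words
    # into the first j hypothesis words (Levenshtein distance).
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i                      # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j                      # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            if ref[i - 1] == hyp[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]              # exact match, no cost
            else:
                dp[i][j] = 1 + min(dp[i - 1][j - 1],     # substitution
                                   dp[i - 1][j],         # deletion
                                   dp[i][j - 1])         # insertion
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

# Two hypotheses with identical WER (one substitution each, 1/5 = 0.2),
# even though the errors differ in how much they damage readability:
print(wer("the meeting starts at noon", "the meeting starts at night"))  # 0.2
print(wer("the meeting starts at noon", "the meeting stats at noon"))    # 0.2

As the example shows, WER treats all word errors equally, which is why two caption texts with identical WER can differ in usability for DHH readers; distinguishing between such pairs is the gap the paper's proposed metric addresses.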


Published in

        ASSETS '17: Proceedings of the 19th International ACM SIGACCESS Conference on Computers and Accessibility
        October 2017
        450 pages
ISBN: 9781450349260
DOI: 10.1145/3132525

        Copyright © 2017 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 19 October 2017


        Qualifiers

        • research-article

        Acceptance Rates

ASSETS '17 Paper Acceptance Rate: 28 of 126 submissions, 22%
Overall Acceptance Rate: 436 of 1,556 submissions, 28%
