DOI: 10.1145/3132525.3132541
Research article

Deaf and Hard-of-Hearing Perspectives on Imperfect Automatic Speech Recognition for Captioning One-on-One Meetings

Published: 19 October 2017

ABSTRACT

Recent advances in Automatic Speech Recognition (ASR) have made this technology a potential solution for transcribing audio input in real time for people who are Deaf or Hard of Hearing (DHH). However, ASR is imperfect; users must cope with errors in the output. While some prior research has studied ASR-generated transcriptions to provide captions for DHH people, there has not been a systematic study of how to best present captions that may include errors from ASR software, nor of how to make use of the ASR system's word-level confidence. We conducted two studies, with 21 and 107 DHH participants, to compare various methods of visually presenting the ASR output with certainty values. Participants answered subjective preference questions and provided feedback on how ASR captioning could be used with confidence display markup. Users preferred captioning styles with which they were already most familiar (that did not display confidence information), and they were concerned about the accuracy of ASR systems. While they expressed interest in systems that display word confidence during captions, they were concerned that text appearance changes may be distracting. The findings of this study should be useful for researchers and companies developing automated captioning systems for DHH users.
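The abstract refers to "confidence display markup", i.e., visually distinguishing words in the caption according to the ASR system's word-level confidence. As a minimal sketch of what such markup could look like, assuming the ASR engine supplies per-word confidence scores, the Python snippet below maps (word, confidence) pairs to an HTML caption fragment in which lower-confidence words are dimmed. The function name, thresholds, styling, and example output are illustrative assumptions, not the display conditions evaluated in the paper.

# Illustrative sketch (not from the paper): rendering ASR word-level confidence
# as caption markup by dimming lower-confidence words. The thresholds, styling,
# and example ASR output below are assumptions for demonstration only.
from html import escape

def caption_with_confidence(words, low=0.5, high=0.9):
    """Render (word, confidence) pairs as an HTML caption fragment.

    Words with confidence >= high are shown normally, words between low and
    high are dimmed, and words below low are dimmed further and italicized.
    """
    spans = []
    for word, conf in words:
        text = escape(word)
        if conf >= high:
            spans.append(text)
        elif conf >= low:
            spans.append(f'<span style="opacity:0.7">{text}</span>')
        else:
            spans.append(f'<span style="opacity:0.4;font-style:italic">{text}</span>')
    return " ".join(spans)

# Hypothetical ASR output for one caption line.
asr_output = [("please", 0.97), ("review", 0.93), ("the", 0.99),
              ("quarterly", 0.62), ("forecast", 0.41)]
print(caption_with_confidence(asr_output))

A real captioning client would obtain the per-word scores from the ASR engine's recognition results and re-render the caption line as interim hypotheses are revised.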

Published in

ASSETS '17: Proceedings of the 19th International ACM SIGACCESS Conference on Computers and Accessibility
October 2017, 450 pages
ISBN: 9781450349260
DOI: 10.1145/3132525

Copyright © 2017 ACM
          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

ASSETS '17 paper acceptance rate: 28 of 126 submissions, 22%
Overall acceptance rate: 436 of 1,556 submissions, 28%
