ABSTRACT
Recent advances in Automatic Speech Recognition (ASR) have made this technology a potential solution for transcribing audio in real time for people who are Deaf or Hard of Hearing (DHH). However, ASR is imperfect, and users must cope with errors in its output. While some prior research has examined ASR-generated transcriptions as captions for DHH people, there has been no systematic study of how best to present captions that may contain ASR errors, nor of how to make use of the ASR system's word-level confidence scores. We conducted two studies, with 21 and 107 DHH participants, comparing various methods of visually presenting ASR output annotated with confidence values. Participants answered subjective preference questions and provided feedback on how ASR captioning could be combined with confidence-display markup. Participants preferred the captioning styles they were already most familiar with (which did not display confidence information), and they were concerned about the accuracy of ASR systems. While they expressed interest in systems that display word confidence in captions, they worried that changes in text appearance could be distracting. The findings of this study should be useful for researchers and companies developing automated captioning systems for DHH users.
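To make the idea of confidence-display markup concrete, the following is a minimal illustrative sketch (not the study's actual implementation): given word-level confidence scores from an ASR hypothesis, words below a chosen threshold are wrapped in italic markup, one of the kinds of visual treatment a captioning system might apply. The function name, data format, and threshold are all assumptions for illustration.

```python
def render_caption(words, threshold=0.5):
    """Render ASR output as caption text with low-confidence words italicized.

    words: list of (token, confidence) pairs from an ASR hypothesis,
           where confidence is a float in [0, 1].
    """
    out = []
    for token, conf in words:
        # Mark words the recognizer is unsure about so viewers can
        # treat them with caution; confident words pass through unchanged.
        out.append(f"<i>{token}</i>" if conf < threshold else token)
    return " ".join(out)


# Example: the recognizer is confident about all words except the last.
print(render_caption([("meet", 0.95), ("me", 0.91), ("at", 0.88), ("noon", 0.42)]))
# -> meet me at <i>noon</i>
```

A real system would need to choose the visual encoding (italics, color, opacity, underlining) carefully, since the studies above found that changes in text appearance can themselves be distracting to viewers.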
Deaf and Hard-of-Hearing Perspectives on Imperfect Automatic Speech Recognition for Captioning One-on-One Meetings