DOI: 10.1145/3132525.3132541
Research article

Deaf and Hard-of-Hearing Perspectives on Imperfect Automatic Speech Recognition for Captioning One-on-One Meetings

Published: 19 October 2017

ABSTRACT

Recent advances in Automatic Speech Recognition (ASR) have made this technology a potential solution for transcribing audio input in real time for people who are Deaf or Hard of Hearing (DHH). However, ASR is imperfect; users must cope with errors in the output. While some prior research has studied ASR-generated transcriptions to provide captions for DHH people, there has not been a systematic study of how to best present captions that may include errors from ASR software, nor of how to make use of the ASR system's word-level confidence. We conducted two studies, with 21 and 107 DHH participants, to compare various methods of visually presenting the ASR output with certainty values. Participants answered subjective preference questions and provided feedback on how ASR captioning could be used with confidence display markup. Users preferred captioning styles with which they were already most familiar (that did not display confidence information), and they were concerned about the accuracy of ASR systems. While they expressed interest in systems that display word confidence during captions, they were concerned that text appearance changes may be distracting. The findings of this study should be useful for researchers and companies developing automated captioning systems for DHH users.
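The abstract refers to "confidence display markup", i.e., visually distinguishing words in the caption according to the ASR system's word-level confidence. As a minimal sketch of what such markup could look like, assuming the ASR engine supplies per-word confidence scores, the Python snippet below maps (word, confidence) pairs to an HTML caption fragment in which lower-confidence words are dimmed. The function name, thresholds, styling, and example output are illustrative assumptions, not the display conditions evaluated in the paper.

# Illustrative sketch (not from the paper): rendering ASR word-level confidence
# as caption markup by dimming lower-confidence words. The thresholds, styling,
# and example ASR output below are assumptions for demonstration only.
from html import escape

def caption_with_confidence(words, low=0.5, high=0.9):
    """Render (word, confidence) pairs as an HTML caption fragment.

    Words with confidence >= high are shown normally, words between low and
    high are dimmed, and words below low are dimmed further and italicized.
    """
    spans = []
    for word, conf in words:
        text = escape(word)
        if conf >= high:
            spans.append(text)
        elif conf >= low:
            spans.append(f'<span style="opacity:0.7">{text}</span>')
        else:
            spans.append(f'<span style="opacity:0.4;font-style:italic">{text}</span>')
    return " ".join(spans)

# Hypothetical ASR output for one caption line.
asr_output = [("please", 0.97), ("review", 0.93), ("the", 0.99),
              ("quarterly", 0.62), ("forecast", 0.41)]
print(caption_with_confidence(asr_output))

A real captioning client would obtain the per-word scores from the ASR engine's recognition results and re-render the caption line as interim hypotheses are revised.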

Published in

ASSETS '17: Proceedings of the 19th International ACM SIGACCESS Conference on Computers and Accessibility
October 2017, 450 pages
ISBN: 9781450349260
DOI: 10.1145/3132525

Copyright © 2017 ACM
          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

ASSETS '17 paper acceptance rate: 28 of 126 submissions, 22%
Overall acceptance rate: 436 of 1,556 submissions, 28%
