Behavioral Changes in Speakers who are Automatically Captioned in Meetings with Deaf or Hard-of-Hearing Peers

ABSTRACT
Deaf and hard of hearing (DHH) individuals face barriers to communication in small-group meetings with hearing peers; we examine the generation of captions on mobile devices by automatic speech recognition (ASR). Although ASR output contains errors, we study whether such tools benefit users and influence conversational behaviors. An experiment was conducted in which DHH and hearing individuals collaborated in discussions under three conditions: without an ASR-based application, with the application, and with a version that indicates words for which the ASR has low confidence. An analysis of audio recordings from each participant across conditions revealed significant differences in speech features. When using the ASR-based automatic captioning application, hearing individuals spoke more loudly, with improved voice quality (harmonics-to-noise ratio), with non-standard articulation (changes in F1 and F2 formants), and at a faster rate. Identifying non-standard speech in this setting has implications for the composition of data used for ASR training and testing, which should be representative of the usage context. Understanding these behavioral influences may also enable designers of ASR captioning systems to leverage these effects to promote communication success.
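To make the measured speech features concrete, the sketch below shows how two of them, intensity (loudness) and speaking rate, can be estimated from a raw waveform. This is an illustrative NumPy implementation under assumed parameters (25 ms frames, 10 ms hop, a −20 dBFS energy threshold), not the authors' pipeline, which used dedicated phonetics tools such as Praat; the function names here are hypothetical.

```python
# Illustrative sketch: frame-wise intensity (dBFS) and a crude
# energy-peak proxy for speaking rate, computed with NumPy.
# Frame/hop sizes and the threshold are assumptions for demonstration.
import numpy as np

def frame_intensity_db(signal, sr, frame_ms=25, hop_ms=10):
    """Frame-wise RMS intensity in dB relative to full scale (dBFS)."""
    frame = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    frames = [signal[i:i + frame]
              for i in range(0, len(signal) - frame + 1, hop)]
    rms = np.array([np.sqrt(np.mean(f ** 2)) for f in frames])
    return 20 * np.log10(np.maximum(rms, 1e-10))  # floor avoids log(0)

def syllable_rate_proxy(signal, sr, threshold_db=-20.0):
    """Count rises of frame energy above a threshold as syllable-nucleus
    candidates; divide by duration for a rough syllables/second rate."""
    db = frame_intensity_db(signal, sr)
    above = db > threshold_db
    onsets = int(np.sum(above[1:] & ~above[:-1])) + int(above[0])
    return onsets / (len(signal) / sr)

# Synthetic check: a 1 s, 100 Hz tone amplitude-modulated at 4 Hz
# should yield roughly four energy bursts per second.
sr = 16000
t = np.arange(sr) / sr
envelope = 0.5 * (1 + np.sin(2 * np.pi * 4 * t - np.pi / 2))
tone = envelope * np.sin(2 * np.pi * 100 * t)
rate = syllable_rate_proxy(tone, sr)
```

Voice quality (harmonics-to-noise ratio) and F1/F2 formants require pitch tracking and LPC analysis, respectively, and are better left to a phonetics toolkit such as Praat.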