Behavioral Changes in Speakers who are Automatically Captioned in Meetings with Deaf or Hard-of-Hearing Peers

ABSTRACT
Deaf and hard of hearing (DHH) individuals face barriers to communication in small-group meetings with hearing peers; we examine the generation of captions on mobile devices by automatic speech recognition (ASR). Although ASR output contains errors, we study whether such tools benefit users and influence conversational behaviors. An experiment was conducted in which DHH and hearing individuals collaborated in discussions under three conditions: without an ASR-based application, with the application, and with a version that indicates words for which the ASR has low confidence. An analysis of audio recordings from each participant across conditions revealed significant differences in speech features. When using the ASR-based automatic captioning application, hearing individuals spoke more loudly, with improved voice quality (harmonics-to-noise ratio), with non-standard articulation (changes in F1 and F2 formants), and at a faster rate. Identifying non-standard speech in this setting has implications for the composition of data used for ASR training and testing, which should be representative of the usage context. Understanding these behavioral influences may also enable designers of ASR captioning systems to leverage these effects to promote communication success.
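To make the measured speech features concrete, the sketch below shows how two of them, intensity (loudness) and speaking rate, can be estimated from a raw waveform. This is an illustrative NumPy implementation under assumed parameters (25 ms frames, 10 ms hop, a −20 dBFS energy threshold), not the authors' pipeline, which used dedicated phonetics tools such as Praat; the function names here are hypothetical.

```python
# Illustrative sketch: frame-wise intensity (dBFS) and a crude
# energy-peak proxy for speaking rate, computed with NumPy.
# Frame/hop sizes and the threshold are assumptions for demonstration.
import numpy as np

def frame_intensity_db(signal, sr, frame_ms=25, hop_ms=10):
    """Frame-wise RMS intensity in dB relative to full scale (dBFS)."""
    frame = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    frames = [signal[i:i + frame]
              for i in range(0, len(signal) - frame + 1, hop)]
    rms = np.array([np.sqrt(np.mean(f ** 2)) for f in frames])
    return 20 * np.log10(np.maximum(rms, 1e-10))  # floor avoids log(0)

def syllable_rate_proxy(signal, sr, threshold_db=-20.0):
    """Count rises of frame energy above a threshold as syllable-nucleus
    candidates; divide by duration for a rough syllables/second rate."""
    db = frame_intensity_db(signal, sr)
    above = db > threshold_db
    onsets = int(np.sum(above[1:] & ~above[:-1])) + int(above[0])
    return onsets / (len(signal) / sr)

# Synthetic check: a 1 s, 100 Hz tone amplitude-modulated at 4 Hz
# should yield roughly four energy bursts per second.
sr = 16000
t = np.arange(sr) / sr
envelope = 0.5 * (1 + np.sin(2 * np.pi * 4 * t - np.pi / 2))
tone = envelope * np.sin(2 * np.pi * 100 * t)
rate = syllable_rate_proxy(tone, sr)
```

Voice quality (harmonics-to-noise ratio) and F1/F2 formants require pitch tracking and LPC analysis, respectively, and are better left to a phonetics toolkit such as Praat.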