DOI: 10.1145/3234695.3236355
Research Article · Public Access

Behavioral Changes in Speakers who are Automatically Captioned in Meetings with Deaf or Hard-of-Hearing Peers

Published: 08 October 2018

ABSTRACT

Deaf and hard of hearing (DHH) individuals face barriers to communication in small-group meetings with hearing peers; we examine generation of captions on mobile devices by automatic speech recognition (ASR). While ASR output contains errors, we study whether such tools benefit users and influence conversational behaviors. An experiment was conducted where DHH and hearing individuals collaborated in discussions in three conditions (without an ASR-based application, with the application, and with a version indicating words for which the ASR has low confidence). An analysis of audio recordings, from each participant across conditions, revealed significant differences in speech features. When using the ASR-based automatic captioning application, hearing individuals spoke more loudly, with improved voice quality (harmonics-to-noise ratio), with a non-standard articulation (changes in F1 and F2 formants), and at a faster rate. Identifying non-standard speech in this setting has implications for the composition of data used for ASR training/testing, which should be representative of its usage context. Understanding these behavioral influences may also enable designers of ASR captioning systems to leverage these effects, to promote communication success.
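The harmonics-to-noise ratio named in the abstract can be illustrated with a short sketch. This is a minimal, assumption-laden demonstration, not the paper's actual analysis pipeline (the authors used Praat): it applies the autocorrelation-based definition HNR = 10·log10(r_max / (1 − r_max)) to a synthetic "voice" signal, omitting the windowing and lag-domain corrections a real Praat analysis performs.

```python
import numpy as np

def hnr_db(signal, fs, pitch_floor=75.0, pitch_ceiling=500.0):
    """Estimate harmonics-to-noise ratio (dB) from the peak of the
    normalized autocorrelation within a plausible pitch-period range.
    Simplified version of the definition used in Praat (Boersma 1993)."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]
    ac = ac / ac[0]                            # normalize so lag 0 == 1
    lag_min = int(fs / pitch_ceiling)          # shortest plausible pitch period
    lag_max = int(fs / pitch_floor)            # longest plausible pitch period
    r_max = float(ac[lag_min:lag_max].max())
    r_max = min(max(r_max, 1e-6), 1.0 - 1e-6)  # keep the log-ratio finite
    return 10.0 * np.log10(r_max / (1.0 - r_max))

fs = 16000
t = np.arange(int(0.3 * fs)) / fs
voiced = np.sin(2 * np.pi * 120 * t)                # clean 120 Hz "voice"
rng = np.random.default_rng(0)
noisy = voiced + 0.5 * rng.standard_normal(t.size)  # same voice plus noise

print(f"clean: {hnr_db(voiced, fs):.1f} dB, noisy: {hnr_db(noisy, fs):.1f} dB")
```

A higher HNR indicates a more periodic (less breathy or noisy) voice, which is why the paper reads the speakers' increased HNR under automatic captioning as improved voice quality.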


Published in

ASSETS '18: Proceedings of the 20th International ACM SIGACCESS Conference on Computers and Accessibility
October 2018, 508 pages
ISBN: 9781450356503
DOI: 10.1145/3234695

Copyright © 2018 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States



Acceptance Rates

ASSETS '18 Paper Acceptance Rate: 28 of 108 submissions (26%). Overall Acceptance Rate: 436 of 1,556 submissions (28%).
