ABSTRACT
Prior work has revealed that Deaf and Hard of Hearing (DHH) viewers are concerned about captions occluding other onscreen content, e.g., text or faces, especially in live television programming, for which captions are generally not manually placed. To support the evaluation or placement of captions across several genres of live television, empirical evidence is needed on how DHH viewers prioritize onscreen information, and whether this varies by genre. Nineteen DHH participants rated the importance of various onscreen content regions across 6 genres: News, Interviews, Emergency Announcements, Political Debates, Weather News, and Sports. The importance of content regions varied significantly across several genres, motivating genre-specific caption placement. We also demonstrate how the dataset informs the creation of importance weights for a metric that predicts the severity of captions occluding onscreen content. This metric correlated significantly more strongly with 23 DHH participants' judgments of caption quality than a metric with uniform importance weights for content regions.
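The metric described above weights each occluded content region by its importance to viewers. A minimal sketch of such an importance-weighted occlusion score follows; the region coordinates, weight values, and function name are hypothetical illustrations under the assumption that severity is the weighted sum of the fraction of each region a caption covers, not values or code from the study:

```python
def occlusion_severity(caption_box, regions):
    """Importance-weighted occlusion score (illustrative sketch).

    caption_box: (x1, y1, x2, y2) rectangle of the caption.
    regions: list of ((x1, y1, x2, y2), weight) pairs, where weight
             is a genre-specific importance rating for that region.
    Returns the sum over regions of weight * fraction-of-region-covered.
    """
    total = 0.0
    for (rx1, ry1, rx2, ry2), weight in regions:
        # Intersection rectangle between the caption and this region.
        ix1, iy1 = max(caption_box[0], rx1), max(caption_box[1], ry1)
        ix2, iy2 = min(caption_box[2], rx2), min(caption_box[3], ry2)
        iw, ih = max(0, ix2 - ix1), max(0, iy2 - iy1)
        region_area = (rx2 - rx1) * (ry2 - ry1)
        total += weight * (iw * ih) / region_area
    return total


# Caption at the bottom of a 100x100 frame; a high-importance ticker
# band it overlaps, and a face region it does not (weights are made up).
caption = (10, 80, 90, 95)
regions = [((0, 70, 100, 100), 0.9),   # ticker band: overlaps caption
           ((40, 10, 60, 30), 0.3)]    # face region: no overlap
print(occlusion_severity(caption, regions))
```

A uniform-weight baseline, as compared against in the abstract, corresponds to setting every region's weight to the same constant.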
Index Terms
- Caption-occlusion severity judgments across live-television genres from deaf and hard-of-hearing viewers