DOI: 10.1145/3459637.3482457
Research article · Public Access

Non-Visual Accessibility Assessment of Videos

Published: 30 October 2021

ABSTRACT

Video accessibility is crucial for blind screen-reader users, as online videos play an increasingly essential role in education, employment, and entertainment. While quite a few techniques and guidelines focus on creating accessible videos, there is a dearth of research that attempts to characterize the accessibility of existing videos. Therefore, in this paper, we define and investigate a diverse set of video- and audio-based accessibility features in an effort to characterize accessible and inaccessible videos. As ground truth for our investigation, we built a custom dataset of 600 videos, in which each video was assigned an accessibility score based on its number of wins in a Swiss-system tournament, where human annotators performed pairwise accessibility comparisons of videos. In contrast to existing accessibility research, where assessments are typically done by blind users, we recruited sighted users for this effort, since videos are a special case in which sight may be required to judge whether any particular scene is accessible. Subsequently, by examining the extent of association between the accessibility features and the accessibility scores, we determined the features that significantly (positively or negatively) impact video accessibility and therefore serve as good indicators for assessing the accessibility of videos. Using the custom dataset, we also trained machine learning models that leveraged our handcrafted features to either classify an arbitrary video as accessible/inaccessible or predict an accessibility score for the video. Evaluation of our models yielded an F1 score of 0.675 for binary classification and a mean absolute error of 0.53 for score prediction, demonstrating their potential in video accessibility assessment while also illuminating their current limitations and the need for further research in this area.
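The scoring scheme described above (each video's accessibility score is its number of wins across pairwise comparisons in a Swiss-system tournament) can be sketched as follows. This is a minimal illustration, not the authors' exact protocol: the `compare` callback is a hypothetical stand-in for the human annotators, and the round count and tie-breaking details are assumptions.

```python
import random

def swiss_scores(videos, compare, rounds=5, seed=0):
    """Assign each video a score equal to its number of wins in a
    Swiss-system tournament of pairwise comparisons.

    `compare(a, b)` returns whichever video is judged more accessible
    (a stand-in here for the paper's human annotators).
    """
    rng = random.Random(seed)
    wins = {v: 0 for v in videos}
    for _ in range(rounds):
        # Swiss pairing: sort by current score (random tie-break), then
        # pair neighbours so videos meet opponents with similar records.
        order = sorted(videos, key=lambda v: (wins[v], rng.random()))
        for a, b in zip(order[::2], order[1::2]):
            wins[compare(a, b)] += 1
    return wins

# Toy usage: videos are integers and "more accessible" simply means larger,
# so video 7 wins every comparison it appears in.
videos = list(range(8))
scores = swiss_scores(videos, compare=max, rounds=3)
```

Because every video plays each round, a few rounds suffice to spread the field into a usable ranking, which is why Swiss systems need far fewer comparisons than a full round-robin over 600 videos would.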


Published in

CIKM '21: Proceedings of the 30th ACM International Conference on Information & Knowledge Management
October 2021, 4966 pages
ISBN: 9781450384469
DOI: 10.1145/3459637

Copyright © 2021 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States


Acceptance Rates

Overall acceptance rate: 1,861 of 8,427 submissions (22%)
