Non-Visual Accessibility Assessment of Videos

ABSTRACT
Video accessibility is crucial for blind screen-reader users, as online videos play an increasingly essential role in education, employment, and entertainment. While several techniques and guidelines exist for creating accessible videos, there is a dearth of research that attempts to characterize the accessibility of existing videos. Therefore, in this paper, we define and investigate a diverse set of video- and audio-based accessibility features in an effort to characterize accessible and inaccessible videos. As ground truth for our investigation, we built a custom dataset of 600 videos, in which each video was assigned an accessibility score based on its number of wins in a Swiss-system tournament, where human annotators performed pairwise accessibility comparisons of videos. In contrast to existing accessibility research, where assessments are typically done by blind users, we recruited sighted users for our effort, since videos are a special case in which sight may be needed to judge whether any particular scene in a video is accessible. Subsequently, by examining the extent of association between the accessibility features and the accessibility scores, we determined the features that significantly (positively or negatively) impact video accessibility and therefore serve as good indicators for assessing the accessibility of videos. Using the custom dataset, we also trained machine learning models that leverage our handcrafted features to either classify an arbitrary video as accessible or inaccessible, or to predict an accessibility score for it. Our models achieved an F1 score of 0.675 for binary classification and a mean absolute error of 0.53 for score prediction, demonstrating their potential for video accessibility assessment while also illuminating their current limitations and the need for further research in this area.
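The Swiss-system scoring described above can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the fixed round count, and the simplified pairing (adjacent-rank pairing with naive rematch skipping, no full backtracking or bye handling) are all assumptions for illustration; `compare(a, b)` stands in for a human annotator's pairwise accessibility judgment and returns the winner.

```python
def swiss_scores(items, compare, rounds):
    """Assign each item a score equal to its number of pairwise wins
    over several Swiss-style rounds (illustrative sketch only)."""
    scores = {item: 0 for item in items}
    played = set()  # unordered pairs that have already met
    for _ in range(rounds):
        # Swiss pairing principle: match items with similar current scores.
        ranked = sorted(items, key=lambda v: scores[v], reverse=True)
        i = 0
        while i + 1 < len(ranked):
            a, b = ranked[i], ranked[i + 1]
            if frozenset((a, b)) not in played:
                winner = compare(a, b)  # human judgment in the real study
                scores[winner] += 1
                played.add(frozenset((a, b)))
                i += 2
            else:
                i += 1  # naive rematch avoidance; a real pairing engine backtracks
        # any leftover item effectively sits out (a "bye") this round
    return scores
```

With a deterministic `compare` (e.g. preferring the larger of two numbers), higher-"quality" items accumulate more wins, which is the property the paper's accessibility scores rely on.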