DOI: 10.1145/3173574.3174092
research-article
Honorable Mention

Caption Crawler: Enabling Reusable Alternative Text Descriptions using Reverse Image Search

Published: 21 April 2018

ABSTRACT

Accessing images online is often difficult for users with vision impairments. This population relies on text descriptions of images that vary based on website authors' accessibility practices. Where one author might provide a descriptive caption for an image, another might provide no caption for the same image, leading to inconsistent experiences. In this work, we present the Caption Crawler system, which uses reverse image search to find existing captions on the web and make them accessible to a user's screen reader. We report our system's performance on a set of 481 websites from alexa.com's list of most popular sites to estimate caption coverage and latency, and also report blind and sighted users' ratings of our system's output quality. Finally, we conducted a user study with fourteen screen reader users to examine how the system might be used for personal browsing.

Supplemental Material

pn4157-file3.mp4 (mp4, 21.6 MB)
pn4157-file5.mp4 (mp4, 4.2 MB)
pn4157.mp4 (mp4, 247.4 MB)

    • Published in

      CHI '18: Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems
      April 2018
      8489 pages
      ISBN: 9781450356206
      DOI: 10.1145/3173574

      Copyright © 2018 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      CHI '18 Paper Acceptance Rate: 666 of 2,590 submissions, 26%
      Overall Acceptance Rate: 6,199 of 26,314 submissions, 24%
