research-article

Caption Crawler: Enabling Reusable Alternative Text Descriptions using Reverse Image Search

Authors:
Darren Guinness, University of Colorado Boulder & Microsoft Research, Boulder, CO, USA
Edward Cutrell, Microsoft Research, Redmond, WA, USA
Meredith Ringel Morris, Microsoft Research, Redmond, WA, USA

CHI '18: Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, April 2018, Paper No. 518, Pages 1–11. https://doi.org/10.1145/3173574.3174092

Published: 21 April 2018

ABSTRACT

Accessing images online is often difficult for users with vision impairments. This population relies on text descriptions of images that vary based on website authors' accessibility practices. Where one author might provide a descriptive caption for an image, another might provide no caption for the same image, leading to inconsistent experiences. In this work, we present the Caption Crawler system, which uses reverse image search to find existing captions on the web and make them accessible to a user's screen reader. We report our system's performance on a set of 481 websites from alexa.com's list of most popular sites to estimate caption coverage and latency, and also report blind and sighted users' ratings of our system's output quality. Finally, we conducted a user study with fourteen screen reader users to examine how the system might be used for personal browsing.
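
The caption-harvesting idea can be illustrated with a minimal sketch. It assumes the reverse image search step has already returned the HTML of pages that contain the same image, and it assumes captions live in `alt` attributes, `aria-label` attributes, or `figcaption` elements; the abstract does not enumerate the sources the system actually inspects. The names `CaptionCollector` and `collect_captions`, the file-name matching, and the longest-first ordering are all hypothetical choices made for illustration, not the authors' implementation.

```python
from html.parser import HTMLParser


class CaptionCollector(HTMLParser):
    """Collect candidate captions for one image from a single HTML page."""

    def __init__(self, image_name):
        super().__init__()
        self.image_name = image_name    # hypothetical matching key: file name suffix
        self.captions = []
        self._figure_has_image = False
        self._figure_captions = []
        self._figcaption_parts = None   # a list only while inside <figcaption>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "figure":
            self._figure_has_image = False
            self._figure_captions = []
        elif tag == "figcaption":
            self._figcaption_parts = []
        elif tag == "img" and (attrs.get("src") or "").endswith(self.image_name):
            self._figure_has_image = True
            # Attribute-based descriptions attached directly to the image.
            for name in ("alt", "aria-label"):
                text = (attrs.get(name) or "").strip()
                if text:
                    self.captions.append(text)

    def handle_data(self, data):
        if self._figcaption_parts is not None:
            self._figcaption_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "figcaption" and self._figcaption_parts is not None:
            # Join text fragments and normalize whitespace.
            text = " ".join("".join(self._figcaption_parts).split())
            if text:
                self._figure_captions.append(text)
            self._figcaption_parts = None
        elif tag == "figure":
            # Keep figcaptions only if the figure held the target image.
            if self._figure_has_image:
                self.captions.extend(self._figure_captions)
            self._figure_has_image = False
            self._figure_captions = []


def collect_captions(pages, image_name):
    """Return de-duplicated captions found across pages, longest first."""
    found = []
    for html in pages:
        parser = CaptionCollector(image_name)
        parser.feed(html)
        parser.close()
        for caption in parser.captions:
            if caption not in found:
                found.append(caption)
    # Longest-first is an illustrative heuristic, not the paper's ranking.
    return sorted(found, key=len, reverse=True)
```

Matching on a file-name suffix is deliberately crude; a real system would rely on the image matches returned by the search service rather than on URLs.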

Supplemental Material

pn4157-file3.mp4 (mp4, 21.6 MB)
pn4157-file5.mp4 (mp4, 4.2 MB)
pn4157.mp4 (mp4, 247.4 MB)


Index Terms

Social and professional topics

Published in
CHI '18: Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems
April 2018
8489 pages
ISBN: 9781450356206
DOI: 10.1145/3173574
General Chairs: Regan Mandryk (University of Saskatchewan, Canada), Mark Hancock (University of Waterloo, Canada)
Program Chairs: Mark Perry (Brunel University London, UK), Anna Cox (University College London, UK)
Copyright © 2018 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Badges
- Honorable Mention
Author Tags
accessibility
alt text
alternative text
image captioning
screen readers
vision impairment
Qualifiers
- research-article
Acceptance Rates
CHI '18 Paper Acceptance Rate: 666 of 2,590 submissions, 26%
Overall Acceptance Rate: 6,199 of 26,314 submissions, 24%
Article Metrics
- Total Citations: 78
- Total Downloads: 855
- Downloads (last 12 months): 132
- Downloads (last 6 weeks): 19