The sketchy database: learning to retrieve badly drawn bunnies

Authors:
Patsorn Sangkloy, Georgia Institute of Technology
Nathan Burnell, Brown University
Cusuh Ham, Georgia Institute of Technology
James Hays, Georgia Institute of Technology

ACM Transactions on Graphics, Volume 35, Issue 4, Article No. 119, pp. 1–12. https://doi.org/10.1145/2897824.2925954

Published: 11 July 2016

Abstract

We present the Sketchy database, the first large-scale collection of sketch-photo pairs. We ask crowd workers to sketch particular photographic objects sampled from 125 categories and acquire 75,471 sketches of 12,500 objects. The Sketchy database gives us fine-grained associations between particular photos and sketches, and we use this to train cross-domain convolutional networks that embed sketches and photographs in a common feature space. We use our database as a benchmark for fine-grained retrieval and show that our learned representation significantly outperforms both hand-crafted features and deep features trained for sketch or photo classification. Beyond image retrieval, we believe the Sketchy database opens up new opportunities for sketch and image understanding and synthesis.
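
The idea of embedding sketches and photographs in a common feature space, then retrieving by distance, can be illustrated with a minimal sketch. The abstract does not specify a loss or an evaluation metric; the author tags mention siamese and triplet networks, so the margin-based triplet loss below is an assumed, commonly used formulation, and recall at K is an assumed retrieval metric. The 2-D embeddings, the photo ids, and every function name are hypothetical; in practice the embeddings would be produced by the trained networks.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(sketch, pos_photo, neg_photo, margin=1.0):
    """Assumed hinge-style triplet loss: it reaches zero once the matching
    photo is closer to the sketch than the non-matching photo by `margin`."""
    return max(0.0, margin + euclidean(sketch, pos_photo) - euclidean(sketch, neg_photo))

def rank_photos(sketch, photos):
    """Return photo ids sorted by increasing distance to the sketch embedding."""
    return sorted(photos, key=lambda pid: euclidean(sketch, photos[pid]))

def recall_at_k(queries, photos, k):
    """Fraction of (sketch embedding, true photo id) queries whose true
    photo appears among the k nearest photos."""
    hits = sum(1 for emb, true_id in queries if true_id in rank_photos(emb, photos)[:k])
    return hits / len(queries)

# Toy, hand-picked embeddings (hypothetical, not taken from the database).
photos = {"bunny_a": [0.0, 0.0], "bunny_b": [1.0, 0.0], "teapot": [5.0, 5.0]}
queries = [
    ([0.1, 0.0], "bunny_a"),  # clean sketch of bunny_a
    ([0.6, 0.0], "bunny_b"),  # clean sketch of bunny_b
    ([0.4, 0.0], "bunny_b"),  # badly drawn bunny_b that lands nearer bunny_a
]

if __name__ == "__main__":
    print(rank_photos(queries[2][0], photos))
    print(recall_at_k(queries, photos, 1), recall_at_k(queries, photos, 2))
```

In this toy setup the third query shows why fine-grained retrieval is harder than category-level retrieval: the sketch lands in the right category (a bunny rather than the teapot) but closest to the wrong instance, so it counts as a miss at K = 1 and a hit at K = 2.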

Supplemental Material

a119.mp4 (mp4, 346.8 MB)
a119-sangkloy-supp.zip (zip, 128.6 MB): supplemental files.

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision representations
        Image representations
  2. Computer graphics
    1. Image manipulation
      1. Image processing

Published in

ACM Transactions on Graphics Volume 35, Issue 4
July 2016
1396 pages
ISSN: 0730-0301
EISSN: 1557-7368
DOI: 10.1145/2897824

Copyright © 2016 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
deep learning
image synthesis
siamese network
sketch-based image retrieval
triplet network