
The sketchy database: learning to retrieve badly drawn bunnies

Published: 11 July 2016

Abstract

We present the Sketchy database, the first large-scale collection of sketch-photo pairs. We ask crowd workers to sketch particular photographic objects sampled from 125 categories and acquire 75,471 sketches of 12,500 objects. The Sketchy database gives us fine-grained associations between particular photos and sketches, and we use these associations to train cross-domain convolutional networks that embed sketches and photographs in a common feature space. We use our database as a benchmark for fine-grained retrieval and show that our learned representation significantly outperforms both hand-crafted features and deep features trained for sketch or photo classification. Beyond image retrieval, we believe the Sketchy database opens up new opportunities for sketch and image understanding and synthesis.
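
To make the retrieval setting concrete, the following is a minimal sketch of how fine-grained retrieval in a common feature space might be scored. It is not the authors' implementation: the abstract does not specify the distance metric or the evaluation measure, so Euclidean distance and recall@K are assumptions here, the function names (`rank_photos`, `recall_at_k`) are hypothetical, and the 2-D vectors are toy values rather than outputs of the trained networks.

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two embedding vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank_photos(sketch_vec, photo_vecs):
    """Return photo indices ordered from nearest to farthest from the sketch."""
    return sorted(range(len(photo_vecs)),
                  key=lambda i: l2_distance(sketch_vec, photo_vecs[i]))

def recall_at_k(sketch_vecs, photo_vecs, true_photo_ids, k):
    """Fraction of sketches whose paired photo appears in the top k results."""
    hits = 0
    for vec, target in zip(sketch_vecs, true_photo_ids):
        if target in rank_photos(vec, photo_vecs)[:k]:
            hits += 1
    return hits / len(sketch_vecs)

# Toy embeddings (illustrative only): three photos and one sketch per photo.
photos = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
sketches = [[0.1, 0.1], [0.9, 0.2], [0.3, 0.4]]
targets = [0, 1, 2]  # index of the photo each sketch was drawn from

print(rank_photos(sketches[2], photos))              # → [0, 2, 1]
print(recall_at_k(sketches, photos, targets, k=1))   # 2 of 3 sketches hit at rank 1
print(recall_at_k(sketches, photos, targets, k=2))   # → 1.0
```

In this toy setup the third sketch lands closer to the wrong photo, so it only counts as a hit once K reaches 2. That is the kind of instance-level error a fine-grained benchmark is meant to expose, as opposed to category-level retrieval where any photo of the right class would count.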



Published in

ACM Transactions on Graphics, Volume 35, Issue 4
July 2016
1396 pages
ISSN: 0730-0301
EISSN: 1557-7368
DOI: 10.1145/2897824

Copyright © 2016 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery
New York, NY, United States
