Abstract
We present the Sketchy database, the first large-scale collection of sketch-photo pairs. We ask crowd workers to sketch particular photographic objects sampled from 125 categories and acquire 75,471 sketches of 12,500 objects. The Sketchy database gives us fine-grained associations between particular photos and sketches, and we use these associations to train cross-domain convolutional networks that embed sketches and photographs in a common feature space. We use our database as a benchmark for fine-grained retrieval and show that our learned representation significantly outperforms both hand-crafted features and deep features trained for sketch or photo classification. Beyond image retrieval, we believe the Sketchy database opens up new opportunities for sketch and image understanding and synthesis.
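To make the idea of fine-grained retrieval in a common feature space concrete, here is a minimal sketch, not the authors' implementation. It assumes sketches and photos have already been embedded as vectors by some trained network. The toy vectors, the photo ids, and the helper names `rank_photos` and `recall_at_k` are hypothetical. The abstract does not name a distance function or an evaluation metric, so Euclidean distance and recall at K are used here as one common choice.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank_photos(sketch_vec, photo_vecs):
    """Return photo ids ordered from nearest to farthest from the sketch."""
    return sorted(photo_vecs, key=lambda pid: euclidean(sketch_vec, photo_vecs[pid]))

def recall_at_k(sketches, photo_vecs, k):
    """Fraction of sketches whose paired photo appears in the top k results."""
    hits = sum(
        1 for vec, true_id in sketches
        if true_id in rank_photos(vec, photo_vecs)[:k]
    )
    return hits / len(sketches)

# Hypothetical 2-D embeddings; a real network would output many more dimensions.
photos = {
    "bunny_01": [0.9, 0.1],
    "bunny_02": [0.7, 0.3],
    "teapot_01": [0.1, 0.9],
}
# Each sketch is paired with the specific photo it was drawn from.
sketches = [
    ([0.85, 0.15], "bunny_01"),
    ([0.60, 0.40], "bunny_02"),
    ([0.65, 0.30], "bunny_01"),  # lands closer to the other bunny
]

print(recall_at_k(sketches, photos, 1))
print(recall_at_k(sketches, photos, 2))
```

The third sketch retrieves the right category but the wrong instance at rank 1, which is the kind of error a fine-grained benchmark is meant to expose and a category-level benchmark would miss.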
Index Terms
- The sketchy database: learning to retrieve badly drawn bunnies