Published in: International Journal of Computer Vision 2/2021

30.09.2020

Fine-Grained Instance-Level Sketch-Based Image Retrieval

Authors: Qian Yu, Jifei Song, Yi-Zhe Song, Tao Xiang, Timothy M. Hospedales

Abstract

The problem of fine-grained sketch-based image retrieval (FG-SBIR) is defined and investigated in this paper. In FG-SBIR, free-hand human sketches are used as queries to retrieve photos containing the same object instances. It is thus a cross-domain (sketch-to-photo) instance-level retrieval task. The problem is extremely challenging because (i) visual comparison and matching must be performed across a large domain gap, i.e., from black-and-white line-drawing sketches to colour photos; (ii) the fine-grained (dis)similarities between sketches and photos must be captured even though free-hand sketches drawn by different people exhibit different degrees of deformation and expressive interpretation; and (iii) annotated cross-domain fine-grained SBIR datasets are scarce, which challenges many state-of-the-art machine learning techniques, particularly those based on deep learning. In this paper, for the first time, we address all of these challenges, providing a step towards the capabilities that would underpin a commercial sketch-based object instance retrieval application. Specifically, a new large-scale FG-SBIR database is introduced, carefully designed to reflect real-world application scenarios. A deep cross-domain matching model is then formulated to cope with the intrinsic drawing style variability and the large domain gap, and to capture instance-level discriminative features. It is distinguished by a carefully designed attention module. Extensive experiments on the new dataset demonstrate the effectiveness of the proposed model and validate the need for a rigorous definition of the FG-SBIR problem and for collecting suitable datasets.
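To make the retrieval setup concrete, the following is a minimal sketch of how a cross-domain instance-level matching model of this kind is typically trained. It is an illustration under stated assumptions, not the authors' architecture: the toy backbone, the single-channel input (assuming photos are pre-converted to edge maps, a common choice in SBIR pipelines), the 1x1-convolution attention, and the names AttentionBranch and triplet_step are all hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionBranch(nn.Module):
    """CNN branch with soft spatial attention; one shared branch embeds both domains (assumption)."""
    def __init__(self, feat_dim=256):
        super().__init__()
        # Toy backbone standing in for a real CNN; inputs are assumed single-channel
        # (sketches, and photos pre-converted to edge maps).
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.attn = nn.Conv2d(feat_dim, 1, kernel_size=1)  # 1x1 conv -> per-location attention logits

    def forward(self, x):
        fmap = self.backbone(x)                                  # B x C x H x W
        weights = torch.softmax(self.attn(fmap).flatten(2), -1)  # B x 1 x (H*W), sums to 1
        pooled = (fmap.flatten(2) * weights).sum(-1)             # attention-weighted pooling -> B x C
        return F.normalize(pooled, dim=-1)                       # L2-normalised embedding

def triplet_step(branch, sketch, pos_photo, neg_photo, margin=0.3):
    """Training objective: the true photo must sit closer to the sketch than a distractor photo."""
    s, p, n = branch(sketch), branch(pos_photo), branch(neg_photo)
    return F.triplet_margin_loss(s, p, n, margin=margin)
```

At retrieval time the same branch embeds the query sketch and every gallery photo, and photos are ranked by their distance to the sketch embedding.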


Footnotes
1
Free-hand sketch in this work refers to a sketch drawn by an amateur from mental recollection. Specifically, we assume that before drawing, the person has seen a reference object instance but does not have the object or a photo of it at hand while drawing.
 
2
Here ‘CFF’ refers to the operation of combining the feature map extracted from an earlier layer with the final-layer output. This differs from the meaning in the preliminary version (Song et al. 2017), where it denoted both feature fusion and the residual attention module.
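As a rough illustration of this coarse-fine fusion idea (the pooling and concatenation choices below are assumptions, not taken from the paper), an earlier-layer feature map can be globally pooled and joined with the final-layer feature vector:

```python
import torch
import torch.nn.functional as F

def coarse_fine_fusion(early_fmap: torch.Tensor, final_feat: torch.Tensor) -> torch.Tensor:
    """early_fmap: B x C1 x H x W from an intermediate layer; final_feat: B x C2 final-layer vector."""
    pooled = F.adaptive_avg_pool2d(early_fmap, 1).flatten(1)  # global average pooling -> B x C1
    # Concatenate the (normalised) coarse and fine descriptors into one representation.
    return torch.cat([F.normalize(pooled, dim=1), F.normalize(final_feat, dim=1)], dim=1)
```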
 
References
Bui, T., Ribeiro, L., Ponti, M., & Collomosse, J. (2016). Generalisation and sharing in triplet convnets for sketch based visual search. arXiv preprint arXiv:1611.05301.
Bui, T., Ribeiro, L., Ponti, M., & Collomosse, J. (2018). Sketching out the details: sketch-based image retrieval using convolutional neural networks with multi-stage regression. Computers & Graphics, 71, 77–87.
Cao, Y., Wang, H., Wang, C., Li, Z., Zhang, L., & Zhang, L. (2010). Mindfinder: interactive sketch-based image search on millions of images. In International conference on multimedia.
Cao, Y., Wang, C., Zhang, L., & Zhang, L. (2011). Edgel index for large-scale sketch-based image search. In CVPR.
Chen, T., Cheng, M. M., Tan, P., Shamir, A., & Hu, S. M. (2009). Sketch2photo: internet image montage. ACM Transactions on Graphics (TOG), 28, 1–10.
Chopra, S., Hadsell, R., & LeCun, Y. (2005). Learning a similarity metric discriminatively, with application to face verification. In IEEE computer society conference on computer vision and pattern recognition.
Collomosse, J., Bui, T., Wilber, M. J., Fang, C., & Jin, H. (2017). Sketching with style: visual search with sketches and aesthetic context. In Proceedings of the IEEE international conference on computer vision.
Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., & Fei-Fei, L. (2009). ImageNet: a large-scale hierarchical image database. In CVPR.
Eitz, M., Hildebrand, K., Boubekeur, T., & Alexa, M. (2010). An evaluation of descriptors for large-scale image retrieval from sketched feature lines. Computers & Graphics, 34(5), 482–498.
Eitz, M., Hildebrand, K., Boubekeur, T., & Alexa, M. (2011). Sketch-based image retrieval: benchmark and bag-of-features descriptors. IEEE Transactions on Visualization and Computer Graphics, 17(11), 1624–1636.
Eitz, M., Hays, J., & Alexa, M. (2012). How do humans sketch objects? ACM Transactions on Graphics (TOG), 31, 1–10.
Everingham, M., Van Gool, L., Williams, C. K., Winn, J., & Zisserman, A. (2010). The PASCAL visual object classes (VOC) challenge. International Journal of Computer Vision, 88(2), 303–338.
Fukui, A., Park, D. H., Yang, D., Rohrbach, A., Darrell, T., & Rohrbach, M. (2016). Multimodal compact bilinear pooling for visual question answering and visual grounding. In EMNLP.
Gatys, L. A., Ecker, A. S., & Bethge, M. (2015). Texture synthesis and the controlled generation of natural stimuli using convolutional neural networks. arXiv preprint arXiv:1505.07376.
Gong, Y., Wang, L., Guo, R., & Lazebnik, S. (2014). Multi-scale orderless pooling of deep convolutional activation features. In European conference on computer vision.
Gordo, A., Almazan, J., Revaud, J., & Larlus, D. (2017). End-to-end learning of deep visual representations for image retrieval. International Journal of Computer Vision, 124(2), 237–254.
Gygli, M., Grabner, H., Riemenschneider, H., Nater, F., & Van Gool, L. (2013). The interestingness of images. In IEEE international conference on computer vision.
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In CVPR.
Hu, R., & Collomosse, J. (2013). A performance evaluation of gradient field HOG descriptor for sketch based image retrieval. Computer Vision and Image Understanding, 117(7), 790–806.
Hu, R., Barnard, M., & Collomosse, J. (2010). Gradient field descriptor for sketch based retrieval and localization. In IEEE international conference on image processing.
Hu, R., Wang, T., & Collomosse, J. (2011). A bag-of-regions approach to sketch based image retrieval. In IEEE international conference on image processing.
Jaderberg, M., Simonyan, K., Zisserman, A., et al. (2015). Spatial transformer networks. In Advances in neural information processing systems.
James, S., Fonseca, M., & Collomosse, J. (2014). ReEnact: sketch based choreographic design from archival dance footage. In Proceedings of international conference on multimedia retrieval.
Jiang, Y. G., Wang, Y., Feng, R., Xue, X., Zheng, Y., & Yang, H. (2013). Understanding and predicting interestingness of videos. In AAAI.
Johnson, J., Krishna, R., Stark, M., Li, L. J., Shamma, D., Bernstein, M., & Fei-Fei, L. (2015). Image retrieval using scene graphs. In CVPR.
Krizhevsky, A., & Hinton, G. E. (2011). Using very deep autoencoders for content-based image retrieval. In European symposium on artificial neural networks.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in neural information processing systems.
Landay, J. A., & Myers, B. A. (2001). Sketching interfaces: toward more human interface design. IEEE Computer, 34(3), 56–64.
Li, Y., Hospedales, T., Song, Y. Z., & Gong, S. (2014). Fine-grained sketch-based image retrieval by matching deformable part models. In BMVC.
Li, Y., Hospedales, T. M., Song, Y. Z., & Gong, S. (2015). Free-hand sketch recognition by multi-kernel feature learning. Computer Vision and Image Understanding, 137, 1–11.
Li, K., Pang, K., Song, Y. Z., Hospedales, T. M., Xiang, T., & Zhang, H. (2017). Synergistic instance-level subspace alignment for fine-grained sketch-based image retrieval. IEEE Transactions on Image Processing, 26(12), 5908–5921.
Lin, Y., Huang, C., Wan, C., & Hsu, W. (2013). 3D sub-query expansion for improving sketch-based multi-view image retrieval. In Proceedings of the IEEE international conference on computer vision.
Lin, T. Y., RoyChowdhury, A., & Maji, S. (2015). Bilinear CNN models for fine-grained visual recognition. In Proceedings of the IEEE international conference on computer vision (pp. 1449–1457).
Liu, L., Shen, F., Shen, Y., Liu, X., & Shao, L. (2017a). Deep sketch hashing: fast free-hand sketch-based image retrieval. arXiv preprint arXiv:1703.05605.
Liu, Y., Guo, Y., & Lew, M. S. (2017b). On the exploration of convolutional fusion networks for visual recognition. In International conference on multimedia modeling.
Lu, J., Xiong, C., Parikh, D., & Socher, R. (2016). Knowing when to look: adaptive attention via a visual sentinel for image captioning. arXiv preprint arXiv:1612.01887.
Mahendran, A., & Vedaldi, A. (2015). Understanding deep image representations by inverting them. In IEEE conference on computer vision and pattern recognition.
Marr, D. (1982). Vision. New York: W. H. Freeman and Company.
Mnih, V., Heess, N., Graves, A., et al. (2014). Recurrent models of visual attention. In Advances in neural information processing systems.
Moulin, C., Largeron, C., Ducottet, C., Géry, M., & Barat, C. (2014). Fisher linear discriminant analysis for text-image combination in multimedia information retrieval. Pattern Recognition, 47(1), 260–269.
Nam, H., Ha, J. W., & Kim, J. (2016). Dual attention networks for multimodal reasoning and matching. arXiv preprint arXiv:1611.00471.
Newell, A., Yang, K., & Deng, J. (2016). Stacked hourglass networks for human pose estimation. In European conference on computer vision.
Noh, H., Araujo, A., Sim, J., Weyand, T., & Han, B. (2017). Large-scale image retrieval with attentive deep local features. In IEEE international conference on computer vision.
Philbin, J., Chum, O., Isard, M., Sivic, J., & Zisserman, A. (2007). Object retrieval with large vocabularies and fast spatial matching. In IEEE conference on computer vision and pattern recognition.
Philbin, J., Chum, O., Isard, M., Sivic, J., & Zisserman, A. (2008). Lost in quantization: improving particular object retrieval in large scale image databases. In IEEE conference on computer vision and pattern recognition.
Prosser, B. J., Zheng, W. S., Gong, S., Xiang, T., & Mary, Q. (2010). Person re-identification by support vector ranking. In British machine vision conference.
Radenovic, F., Tolias, G., & Chum, O. (2018). Deep shape matching. In Proceedings of the European conference on computer vision.
Radenović, F., Tolias, G., & Chum, O. (2018). Fine-tuning CNN image retrieval with no human annotation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(7), 1655–1668.
Ren, X. (2008). Multi-scale improves boundary detection in natural images. In Proceedings of the European conference on computer vision.
Sangkloy, P., Burnell, N., Ham, C., & Hays, J. (2016). The sketchy database: learning to retrieve badly drawn bunnies. ACM Transactions on Graphics (TOG), 35, 1–12.
Song, J., Yu, Q., Song, Y. Z., Xiang, T., & Hospedales, T. M. (2017). Deep spatial-semantic attention for fine-grained sketch-based image retrieval. In Proceedings of the IEEE international conference on computer vision.
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., et al. (2015). Going deeper with convolutions. In IEEE conference on computer vision and pattern recognition. arXiv:1409.4842.
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., & Wojna, Z. (2016a). Rethinking the inception architecture for computer vision. In IEEE conference on computer vision and pattern recognition.
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., & Wojna, Z. (2016b). Rethinking the inception architecture for computer vision. In IEEE conference on computer vision and pattern recognition.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. In NeurIPS.
Wang, X., & Tang, X. (2009). Face photo-sketch synthesis and recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(11), 1955–1967.
Wang, C., Li, Z., & Zhang, L. (2010). Mindfinder: image search by interactive sketching and tagging. In Proceedings of the 19th international conference on world wide web.
Wang, F., Kang, L., & Li, Y. (2015). Sketch-based 3D shape retrieval using convolutional neural networks. In IEEE conference on computer vision and pattern recognition.
Xiao, T., Xu, Y., Yang, K., Zhang, J., Peng, Y., & Zhang, Z. (2015). The application of two-level attention models in deep convolutional neural network for fine-grained image classification. In IEEE conference on computer vision and pattern recognition.
Xie, S., & Tu, Z. (2015). Holistically-nested edge detection. In IEEE international conference on computer vision.
Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A. C., Salakhutdinov, R., Zemel, R. S., & Bengio, Y. (2015). Show, attend and tell: neural image caption generation with visual attention. In International conference on machine learning.
Yang, S., & Ramanan, D. (2015). Multi-scale recognition with DAG-CNNs. In IEEE international conference on computer vision.
Yu, A., & Grauman, K. (2014). Fine-grained visual comparisons with local learning. In IEEE conference on computer vision and pattern recognition.
Yu, Q., Yang, Y., Song, Y., Xiang, T., & Hospedales, T. (2015). Sketch-a-net that beats humans. In BMVC.
Yu, Q., Liu, F., Song, Y. Z., Xiang, T., Hospedales, T. M., & Loy, C. C. (2016). Sketch me that shoe. In IEEE conference on computer vision and pattern recognition.
Yu, Q., Yang, Y., Liu, F., Song, Y. Z., Xiang, T., & Hospedales, T. M. (2017). Sketch-a-net: a deep neural network that beats humans. International Journal of Computer Vision, 122(3), 411–425.
Zhang, J., Shen, F., Liu, L., Zhu, F., Yu, M., Shao, L., Tao Shen, H., & Van Gool, L. (2018). Generative domain-migration hashing for sketch-to-image retrieval. In Proceedings of the European conference on computer vision (ECCV).
Zhu, J. Y., Lee, Y. J., & Efros, A. A. (2014). AverageExplorer: interactive exploration and alignment of visual data collections. ACM Transactions on Graphics (TOG), 33, 1–11.
Zitnick, C. L., & Dollár, P. (2014). Edge boxes: locating object proposals from edges. In Proceedings of the European conference on computer vision.
Metadata
Title
Fine-Grained Instance-Level Sketch-Based Image Retrieval
Authors
Qian Yu
Jifei Song
Yi-Zhe Song
Tao Xiang
Timothy M. Hospedales
Publication date
30.09.2020
Publisher
Springer US
Published in
International Journal of Computer Vision / Issue 2/2021
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-020-01382-3
