Published in: International Journal of Computer Vision, Issue 3/2017

26.07.2016

Sketch-a-Net: A Deep Neural Network that Beats Humans

Authors: Qian Yu, Yongxin Yang, Feng Liu, Yi-Zhe Song, Tao Xiang, Timothy M. Hospedales


Abstract

We propose a deep learning approach to free-hand sketch recognition that achieves state-of-the-art performance, significantly surpassing that of humans. Our superior performance is a result of modelling and exploiting the unique characteristics of free-hand sketches, i.e., consisting of an ordered set of strokes but lacking visual cues such as colour and texture, being highly iconic and abstract, and exhibiting extremely large appearance variations due to different levels of abstraction and deformation. Specifically, our deep neural network, termed Sketch-a-Net, has the following novel components: (i) we propose a network architecture designed for sketch rather than natural photo statistics. (ii) Two novel data augmentation strategies are developed which exploit the unique sketch-domain properties to modify and synthesise sketch training data at multiple abstraction levels. Based on this idea, we are able both to significantly increase the volume and diversity of sketches for training and to address the challenge of varying levels of sketching detail commonplace in free-hand sketches. (iii) We explore different network ensemble fusion strategies, including a re-purposed joint Bayesian scheme, to further improve recognition performance. We show that state-of-the-art deep networks specifically engineered for photos of natural objects fail to perform well on sketch recognition, regardless of whether they are trained using photos or sketches. Furthermore, through visualising the learned filters, we offer useful insights into where the superior performance of our network comes from.
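The stroke-based augmentation idea summarised in (ii) rests on the observation that sketches are ordered stroke sequences, and that later strokes tend to add fine detail. A minimal sketch of one such strategy is given below, assuming each sketch is stored as an ordered list of strokes; the function name and the `keep_fraction` parameter are illustrative, not the paper's exact formulation.

```python
def drop_detail_strokes(strokes, keep_fraction):
    """Return a more abstract version of a sketch by keeping only its
    earliest strokes.

    Because sketchers typically draw the coarse outline first and add
    detail later, truncating the ordered stroke list yields a plausible
    lower-abstraction rendering of the same object, which can be used
    as additional training data.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    # Always keep at least one stroke so the sketch never vanishes.
    n_keep = max(1, round(len(strokes) * keep_fraction))
    return strokes[:n_keep]
```

Applying this at several values of `keep_fraction` to each training sketch produces multiple abstraction levels of the same object, which is the effect the abstract describes.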


Footnotes
1
We set \(k=30\) in this work and the regularisation parameter of JB is set to 1. For robustness at test time, we also take 10 crops and reflections of each train and test image (Krizhevsky et al. 2012). This inflates the KNN train and test pool by a factor of 10, and the crop-level matches are combined into image-level predictions by majority voting.
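The crop-and-vote procedure in this footnote can be sketched as follows. This is a minimal illustration assuming a 2-D greyscale image array; the function names are hypothetical, and the 10-crop scheme (four corners plus centre, each with its horizontal reflection) follows Krizhevsky et al. (2012) as cited above.

```python
from collections import Counter

import numpy as np


def ten_crops(img, crop):
    """Four corner crops + centre crop of size crop x crop,
    plus the horizontal reflection of each (10 crops total)."""
    h, w = img.shape[:2]
    tops = [0, 0, h - crop, h - crop, (h - crop) // 2]
    lefts = [0, w - crop, 0, w - crop, (w - crop) // 2]
    crops = [img[t:t + crop, l:l + crop] for t, l in zip(tops, lefts)]
    crops += [c[:, ::-1] for c in crops]  # horizontal reflections
    return crops


def image_prediction(crop_labels):
    """Combine crop-level KNN matches into one image-level label
    by majority voting."""
    return Counter(crop_labels).most_common(1)[0][0]
```

Running every train and test image through `ten_crops` is what inflates the KNN pool by a factor of 10; `image_prediction` then collapses the 10 crop-level matches back to a single label per image.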
 
References
Chatfield, K., Simonyan, K., Vedaldi, A., & Zisserman, A. (2014). Return of the devil in the details: Delving deep into convolutional nets. In BMVC.
Chen, D., Cao, X., Wang, L., Wen, F., & Sun, J. (2012). Bayesian face revisited: A joint formulation. In ECCV.
Deng, J., Dong, W., Socher, R., Li, L., Li, K., & Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. In CVPR.
Donahue, J., Jia, Y., Vinyals, O., Hoffman, J., Zhang, N., Tzeng, E., & Darrell, T. (2015). DeCAF: A deep convolutional activation feature for generic visual recognition. In ICML.
Eitz, M., Hays, J., & Alexa, M. (2012). How do humans sketch objects? In SIGGRAPH.
Eitz, M., Hildebrand, K., Boubekeur, T., & Alexa, M. (2011). Sketch-based image retrieval: Benchmark and bag-of-features descriptors. TVCG, 17(11), 1624–1636.
Fukushima, K. (1980). Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics, 36(4), 193–202.
Gabor, D. (1946). Theory of communication. Part 1: The analysis of information. Journal of the Institution of Electrical Engineers, Part III: Radio and Communication Engineering, 93, 429–441.
Hinton, G. E., Osindero, S., & Teh, Y. W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18(7), 1527–1554.
Hinton, G. E., Srivastava, N., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. R. (2012). Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580.
Hu, R., & Collomosse, J. (2013). A performance evaluation of gradient field HOG descriptor for sketch based image retrieval. CVIU, 117(7), 790–806.
Hubel, D. H., & Wiesel, T. N. (1959). Receptive fields of single neurons in the cat's striate cortex. Journal of Physiology, 148, 574–591.
Jabal, M. F. A., Rahim, M. S. M., Othman, N. Z. S., & Jupri, Z. (2009). A comparative study on extraction and recognition method of CAD data from CAD drawings. In International conference on information management and engineering.
Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., & Darrell, T. (2014). Caffe: Convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093.
Johnson, G., Gross, M. D., Hong, J., & Do, E. Y.-L. (2009). Computational support for sketching in design: A review. Foundations and Trends in Human–Computer Interaction, 2, 1–93.
Klare, B. F., Li, Z., & Jain, A. K. (2011). Matching forensic sketches to mug shot photos. TPAMI, 33(3), 639–646.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In NIPS.
Le Cun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W., & Jackel, L. D. (1990). Handwritten digit recognition with a back-propagation network. In NIPS.
LeCun, Y., Bottou, L., Orr, G. B., & Müller, K. (1998). Efficient backprop. In G. Orr & K. Müller (Eds.), Neural networks: Tricks of the trade. Springer.
Li, Y., Hospedales, T. M., Song, Y., & Gong, S. (2015). Free-hand sketch recognition by multi-kernel feature learning. CVIU, 137, 1–11.
Li, Y., Song, Y., & Gong, S. (2013). Sketch recognition by ensemble matching of structured features. In BMVC.
Lu, T., Tai, C., Su, F., & Cai, S. (2005). A new recognition model for electronic architectural drawings. Computer-Aided Design, 37(10), 1053–1069.
Olshausen, B. A., & Field, D. J. (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381, 607–609.
Ouyang, S., Hospedales, T., Song, Y., & Li, X. (2014). Cross-modal face matching: Beyond viewed sketches. In ACCV.
Schaefer, S., McPhail, T., & Warren, J. (2006). Image deformation using moving least squares. TOG, 25(3), 533–540.
Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural Networks, 61, 85–117.
Schneider, R. G., & Tuytelaars, T. (2014). Sketch classification and classification-driven analysis using Fisher vectors. In SIGGRAPH Asia.
Simonyan, K., & Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition. In ICLR.
Sousa, P., & Fonseca, M. J. (2009). Geometric matching for clip-art drawing retrieval. Journal of Visual Communication and Image Representation, 20(12), 71–83.
Stollenga, M. F., Masci, J., Gomez, F., & Schmidhuber, J. (2014). Deep networks with internal selective attention through feedback connections. In NIPS.
Wang, F., Kang, L., & Li, Y. (2015). Sketch-based 3D shape retrieval using convolutional neural networks. In CVPR.
Yanık, E., & Sezgin, T. M. (2015). Active learning for sketch recognition. Computers and Graphics, 52, 93–105.
Yin, F., Wang, Q., Zhang, X., & Liu, C. (2013). ICDAR 2013 Chinese handwriting recognition competition. In International conference on document analysis and recognition.
Yu, Q., Yang, Y., Song, Y. Z., Xiang, T., & Hospedales, T. M. (2015). Sketch-a-net that beats humans. In BMVC.
Zeiler, M., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In ECCV.
Zitnick, C. L., & Dollár, P. (2014). Edge boxes: Locating object proposals from edges. In ECCV.
Zitnick, C. L., & Parikh, D. (2013). Bringing semantics into focus using visual abstraction. In CVPR.
Zou, C., Huang, Z., Lau, R. W., Liu, J., & Fu, H. (2015). Sketch-based shape retrieval using pyramid-of-parts. arXiv preprint arXiv:1502.04232.
Metadata
Title
Sketch-a-Net: A Deep Neural Network that Beats Humans
Authors
Qian Yu
Yongxin Yang
Feng Liu
Yi-Zhe Song
Tao Xiang
Timothy M. Hospedales
Publication date
26.07.2016
Publisher
Springer US
Published in
International Journal of Computer Vision / Issue 3/2017
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-016-0932-3
