Published in: International Journal of Computer Vision, Issue 3/2017

26.07.2016

Sketch-a-Net: A Deep Neural Network that Beats Humans

Authors: Qian Yu, Yongxin Yang, Feng Liu, Yi-Zhe Song, Tao Xiang, Timothy M. Hospedales


Abstract

We propose a deep learning approach to free-hand sketch recognition that achieves state-of-the-art performance, significantly surpassing that of humans. Our superior performance is a result of modelling and exploiting the unique characteristics of free-hand sketches, i.e., consisting of an ordered set of strokes but lacking visual cues such as colour and texture, being highly iconic and abstract, and exhibiting extremely large appearance variations due to different levels of abstraction and deformation. Specifically, our deep neural network, termed Sketch-a-Net, has the following novel components: (i) we propose a network architecture designed for sketch rather than natural photo statistics. (ii) Two novel data augmentation strategies are developed which exploit the unique sketch-domain properties to modify and synthesise sketch training data at multiple abstraction levels. Based on this idea, we are able both to significantly increase the volume and diversity of sketches for training and to address the challenge of varying levels of sketching detail commonplace in free-hand sketches. (iii) We explore different network ensemble fusion strategies, including a re-purposed joint Bayesian scheme, to further improve recognition performance. We show that state-of-the-art deep networks specifically engineered for photos of natural objects fail to perform well on sketch recognition, regardless of whether they are trained using photos or sketches. Furthermore, through visualising the learned filters, we offer useful insights into where the superior performance of our network comes from.
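The stroke-based augmentation idea summarised in (ii) rests on the observation that sketches are ordered stroke sequences, and that later strokes tend to add fine detail. A minimal sketch of one such strategy is given below, assuming each sketch is stored as an ordered list of strokes; the function name and the `keep_fraction` parameter are illustrative, not the paper's exact formulation.

```python
def drop_detail_strokes(strokes, keep_fraction):
    """Return a more abstract version of a sketch by keeping only its
    earliest strokes.

    Because sketchers typically draw the coarse outline first and add
    detail later, truncating the ordered stroke list yields a plausible
    lower-abstraction rendering of the same object, which can be used
    as additional training data.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    # Always keep at least one stroke so the sketch never vanishes.
    n_keep = max(1, round(len(strokes) * keep_fraction))
    return strokes[:n_keep]
```

Applying this at several values of `keep_fraction` to each training sketch produces multiple abstraction levels of the same object, which is the effect the abstract describes.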


Footnotes
1
We set \(k=30\) in this work and the regularisation parameter of JB is set to 1. For robustness at test time, we also take 10 crops and reflections of each train and test image (Krizhevsky et al. 2012). This inflates the KNN train and test pool by a factor of 10, and the crop-level matches are combined into image-level predictions by majority voting.
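The crop-and-vote procedure in this footnote can be sketched as follows. This is a minimal illustration assuming a 2-D greyscale image array; the function names are hypothetical, and the 10-crop scheme (four corners plus centre, each with its horizontal reflection) follows Krizhevsky et al. (2012) as cited above.

```python
from collections import Counter

import numpy as np


def ten_crops(img, crop):
    """Four corner crops + centre crop of size crop x crop,
    plus the horizontal reflection of each (10 crops total)."""
    h, w = img.shape[:2]
    tops = [0, 0, h - crop, h - crop, (h - crop) // 2]
    lefts = [0, w - crop, 0, w - crop, (w - crop) // 2]
    crops = [img[t:t + crop, l:l + crop] for t, l in zip(tops, lefts)]
    crops += [c[:, ::-1] for c in crops]  # horizontal reflections
    return crops


def image_prediction(crop_labels):
    """Combine crop-level KNN matches into one image-level label
    by majority voting."""
    return Counter(crop_labels).most_common(1)[0][0]
```

Running every train and test image through `ten_crops` is what inflates the KNN pool by a factor of 10; `image_prediction` then collapses the 10 crop-level matches back to a single label per image.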
 
References
Chatfield, K., Simonyan, K., Vedaldi, A., & Zisserman, A. (2014). Return of the devil in the details: Delving deep into convolutional nets. In BMVC.
Chen, D., Cao, X., Wang, L., Wen, F., & Sun, J. (2012). Bayesian face revisited: A joint formulation. In ECCV.
Deng, J., Dong, W., Socher, R., Li, L., Li, K., & Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. In CVPR.
Donahue, J., Jia, Y., Vinyals, O., Hoffman, J., Zhang, N., Tzeng, E., & Darrell, T. (2015). DeCAF: A deep convolutional activation feature for generic visual recognition. In ICML.
Eitz, M., Hays, J., & Alexa, M. (2012). How do humans sketch objects? In SIGGRAPH.
Eitz, M., Hildebrand, K., Boubekeur, T., & Alexa, M. (2011). Sketch-based image retrieval: Benchmark and bag-of-features descriptors. TVCG, 17(11), 1624–1636.
Fukushima, K. (1980). Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics, 36(4), 193–202.
Gabor, D. (1946). Theory of communication. Part 1: The analysis of information. Journal of the Institution of Electrical Engineers, Part III: Radio and Communication Engineering, 93, 429–441.
Hinton, G. E., Osindero, S., & Teh, Y. W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18(7), 1527–1554.
Hinton, G. E., Srivastava, N., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. R. (2012). Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580.
Hu, R., & Collomosse, J. (2013). A performance evaluation of gradient field HOG descriptor for sketch based image retrieval. CVIU, 117(7), 790–806.
Hubel, D. H., & Wiesel, T. N. (1959). Receptive fields of single neurons in the cat's striate cortex. Journal of Physiology, 148, 574–591.
Jabal, M. F. A., Rahim, M. S. M., Othman, N. Z. S., & Jupri, Z. (2009). A comparative study on extraction and recognition method of CAD data from CAD drawings. In International conference on information management and engineering.
Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., & Darrell, T. (2014). Caffe: Convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093.
Johnson, G., Gross, M. D., Hong, J., & Do, E. Y.-L. (2009). Computational support for sketching in design: A review. Foundations and Trends in Human–Computer Interaction, 2, 1–93.
Klare, B. F., Li, Z., & Jain, A. K. (2011). Matching forensic sketches to mug shot photos. TPAMI, 33(3), 639–646.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In NIPS.
Le Cun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W., & Jackel, L. D. (1990). Handwritten digit recognition with a back-propagation network. In NIPS.
LeCun, Y., Bottou, L., Orr, G. B., & Müller, K. (1998). Efficient backprop. In G. Orr & K. Müller (Eds.), Neural networks: Tricks of the trade. Springer.
Li, Y., Hospedales, T. M., Song, Y., & Gong, S. (2015). Free-hand sketch recognition by multi-kernel feature learning. CVIU, 137, 1–11.
Li, Y., Song, Y., & Gong, S. (2013). Sketch recognition by ensemble matching of structured features. In BMVC.
Lu, T., Tai, C., Su, F., & Cai, S. (2005). A new recognition model for electronic architectural drawings. Computer-Aided Design, 37(10), 1053–1069.
Olshausen, B. A., & Field, D. J. (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381, 607–609.
Ouyang, S., Hospedales, T., Song, Y., & Li, X. (2014). Cross-modal face matching: Beyond viewed sketches. In ACCV.
Schaefer, S., McPhail, T., & Warren, J. (2006). Image deformation using moving least squares. TOG, 25(3), 533–540.
Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural Networks, 61, 85–117.
Schneider, R. G., & Tuytelaars, T. (2014). Sketch classification and classification-driven analysis using Fisher vectors. In SIGGRAPH Asia.
Simonyan, K., & Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition. In ICLR.
Sousa, P., & Fonseca, M. J. (2009). Geometric matching for clip-art drawing retrieval. Journal of Visual Communication and Image Representation, 20(12), 71–83.
Stollenga, M. F., Masci, J., Gomez, F., & Schmidhuber, J. (2014). Deep networks with internal selective attention through feedback connections. In NIPS.
Wang, F., Kang, L., & Li, Y. (2015). Sketch-based 3D shape retrieval using convolutional neural networks. In CVPR.
Yanık, E., & Sezgin, T. M. (2015). Active learning for sketch recognition. Computers and Graphics, 52, 93–105.
Yin, F., Wang, Q., Zhang, X., & Liu, C. (2013). ICDAR 2013 Chinese handwriting recognition competition. In International conference on document analysis and recognition.
Yu, Q., Yang, Y., Song, Y. Z., Xiang, T., & Hospedales, T. M. (2015). Sketch-a-net that beats humans. In BMVC.
Zeiler, M., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In ECCV.
Zitnick, C. L., & Dollár, P. (2014). Edge boxes: Locating object proposals from edges. In ECCV.
Zitnick, C. L., & Parikh, D. (2013). Bringing semantics into focus using visual abstraction. In CVPR.
Zou, C., Huang, Z., Lau, R. W., Liu, J., & Fu, H. (2015). Sketch-based shape retrieval using pyramid-of-parts. arXiv preprint arXiv:1502.04232.
Metadata
Title
Sketch-a-Net: A Deep Neural Network that Beats Humans
Authors
Qian Yu
Yongxin Yang
Feng Liu
Yi-Zhe Song
Tao Xiang
Timothy M. Hospedales
Publication date
26.07.2016
Publisher
Springer US
Published in
International Journal of Computer Vision / Issue 3/2017
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-016-0932-3
