Elsevier

Neurocomputing

Volume 165, 1 October 2015, Pages 338-349

Im2Sketch: Sketch generation by unconflicted perceptual grouping

https://doi.org/10.1016/j.neucom.2015.03.023

Abstract

Effectively solving the problem of sketch generation, which aims to produce human-drawing-like sketches from real photographs, opens the door to many vision applications such as sketch-based image retrieval and non-photorealistic rendering. In this paper, we approach automatic sketch generation from a human visual perception perspective. Instead of gathering insights from photographs, for the first time, we extract information from a large pool of human sketches. In particular, we study how multiple Gestalt rules can be encapsulated into a unified perceptual grouping framework for sketch generation. We further show that by solving the problem of Gestalt confliction, i.e., encoding the relative importance of each rule, sketches more similar to human-made ones can be generated. For that, we release a manually labeled sketch dataset of 96 object categories and 7680 sketches. A novel evaluation framework is proposed to quantify the human likeness of machine-generated sketches by examining how well they can be classified by models trained on human data. Finally, we demonstrate the superiority of our sketches in the practical application of sketch-based image retrieval.

Introduction

There is plenty of prior work on sketches in computer vision, from sketch recognition [1], [2] to sketch-based image retrieval (SBIR) [3], [4]. Recently, sketch research has gained much momentum due to the proliferation of touch-sensitive devices. Nonetheless, how to make machines automatically produce sketches as humans do remains an open problem [1], [5], [6]. Solving it opens the door to many vision applications, especially SBIR, since better sketch conversion essentially closes the domain gap between query sketches and gallery photographs.

This paper sets out to tackle the problem of sketch generation by learning from established theories in human visual cognition. The human visual system is powerful enough that we can easily find sense in chaos. One of the most critical problems in neuroscience is understanding how human brains perceive visual objects. In particular, perceptual grouping, a concept introduced by the Gestalt school of psychologists [7], holds that humans perceive certain elements of the visual world as going together more strongly than others. Max Wertheimer, a pioneer of the Gestalt school, pointed out the significance of perceptual grouping and listed a series of rules, such as proximity, similarity and continuation [8]. His work has triggered plenty of research specifically aimed at understanding human visual systems [9], [10].

We treat sketch generation as a perceptual grouping and filtering task. The underlying hypothesis is that perceptual grouping can find sense in chaos, leaving only the signals that correspond to sketches of human resemblance. More specifically, our choice of an elementary grouping process for sketch generation is justified as follows: (i) sketches lack visual cues (black-and-white drawings versus textured image regions), making conventional vision algorithms sub-optimal; (ii) sketches, despite being abstract (e.g., a stickman human figure), are very iconic – it is often structural variations of strokes that capture object appearance; and, last but not least, (iii) sketches are the simplest form of depiction of human visual impressions that can be rendered by hand, and therefore act as an ideal basis for applying and testing theories from human visual cognition [3].

Traditionally, applying perceptual grouping in vision applications raises two critical design considerations: (i) how to combine multiple Gestalt rules into a single globally optimized framework, and (ii) how to encode the relative importance of each rule. For the former, although individual Gestalt principles have proven useful for contour grouping when used alone [11], [12], [13], very little work [14] investigates how they can be exploited jointly in a single framework. The latter problem, often referred to as Gestalt confliction, remains unaddressed to date [7], [15]. Despite being a subject of investigation in psychology, little is known about how Gestalt confliction works in human vision [16], shedding little light on how to design a computer vision system.

In this paper, we first propose a unified grouping framework that works with multiple Gestalt principles simultaneously. We then show how Gestalt confliction can be accounted for by learning from a dataset of pre-segmented human sketches. In particular, a multi-label graph-cuts [17], [18], [19] perceptual grouping framework is developed to group stroke segments while utilizing the learned importance of different Gestalt principles. When generating a sketch from a photograph, the same learned perceptual grouping framework is used to form groups of image boundary segments, which are further filtered to produce human-drawing-like sketches. More specifically, a learning-to-rank strategy based on RankSVM [20] is proposed to learn the relative importance of two Gestalt principles, namely proximity and continuity. We learn from a subset of a large-scale human-drawn sketch dataset [1], in which each sketch is pre-segmented into semantic groups. The entire process of the proposed sketch generation framework is shown in Fig. 1.

To evaluate the quality of automatically generated sketches, we present a novel approach that recognizes them with sketch classifiers trained on human data. Prior work [5], [21] evaluates sketching performance by comparing computer-generated sketches with tracings produced by humans. Importantly, this strategy does not account for likeness to human-made sketches, because (i) sketches are abstract depictions that are fundamentally different from tracings of image boundaries, (ii) humans often sketch without reference to real photographs of objects, and (iii) sketches exhibit much more intra-class variability, due to different levels of drawing skill and individual visual impressions. By measuring how well a classifier trained on human sketches [1] recognizes machine-generated ones, we essentially examine how closely they resemble human-made sketches. Our results show that sketches generated using our method outperform a number of state-of-the-art alternatives.
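As a toy illustration of this evaluation idea, the sketch below trains a nearest-centroid stand-in for the human-sketch classifier on invented two-dimensional feature vectors, then scores machine-generated sketches by classification accuracy. The features, categories, and classifier choice are all assumptions for illustration; the paper itself uses the sketch classifier of [1].

```python
# Invented 2-D feature vectors standing in for sketch descriptors.
human = {"cat": [[1.0, 0.1], [0.9, 0.2]],
         "car": [[0.1, 1.0], [0.2, 0.9]]}

def centroid(vectors):
    # Per-dimension mean of a list of feature vectors.
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]

centroids = {cat: centroid(vs) for cat, vs in human.items()}

def classify(x):
    # Nearest-centroid stand-in for a classifier trained on human sketches.
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(centroids[c], x)))

# Machine-generated sketches paired with the category of their source photo;
# the fraction recognized correctly is the human-likeness score.
generated = [([0.95, 0.15], "cat"), ([0.15, 0.95], "car"), ([0.6, 0.7], "cat")]
accuracy = sum(classify(x) == y for x, y in generated) / len(generated)
```

The higher the accuracy of the human-trained classifier on generated sketches, the more human-like the generator is judged to be.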

To further demonstrate the quality of our sketches, we show their effectiveness for SBIR. Experimental results confirm a positive performance boost on the largest SBIR dataset to date [4] when compared with state-of-the-art alternatives. This importantly shows that our sketch generation algorithm can bridge the domain gap between sketches and natural images. As can be seen in Fig. 2, the proposed sketch converter yields cleaner sketches that match the query better when encoded using a common descriptor such as Histogram of Oriented Gradients (HOG).
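To make the HOG-based matching idea concrete, here is a deliberately minimal sketch: a single global orientation histogram (a crude stand-in for full HOG, which bins gradients over spatial cells and blocks) compared by cosine similarity on toy 8×8 binary "images". The images and descriptor are invented for illustration only.

```python
import math

def orient_hist(img, bins=8):
    # Minimal HOG-like descriptor: one global, L2-normalized histogram of
    # gradient orientations, computed from central differences.
    h = [0.0] * bins
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            gx = img[y][x + 1] - img[y][x - 1]
            gy = img[y + 1][x] - img[y - 1][x]
            mag = math.hypot(gx, gy)
            if mag > 0:
                b = int((math.atan2(gy, gx) + math.pi) / (2 * math.pi) * bins) % bins
                h[b] += mag
    n = math.sqrt(sum(v * v for v in h)) or 1.0
    return [v / n for v in h]

def cosine(a, b):
    # Both inputs are already unit-normalized.
    return sum(x * y for x, y in zip(a, b))

# Toy gallery: one image with a vertical edge, one with a horizontal edge.
vert = [[1 if x < 4 else 0 for x in range(8)] for _ in range(8)]
horz = [[1 if y < 4 else 0 for _ in range(8)] for y in range(8)]
query = [[1 if x < 4 else 0 for x in range(8)] for _ in range(8)]  # matches vert

scores = {"vert": cosine(orient_hist(query), orient_hist(vert)),
          "horz": cosine(orient_hist(query), orient_hist(horz))}
best = max(scores, key=scores.get)
```

Because the query and the vertical-edge gallery image produce identical orientation histograms, retrieval ranks it first; this is the sense in which a cleaner, sketch-like gallery image "matches better" under a gradient descriptor.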

The contributions of this paper can be summarized as follows:

(1) We apply perceptual grouping as a means of automated sketch generation and propose a learning-to-rank strategy to learn the relative importance between two Gestalt principles.

(2) A novel evaluation strategy is devised to quantitatively evaluate the human likeness of sketches.

(3) We demonstrate the effectiveness of sketch generation in SBIR and show a performance boost when compared with state-of-the-art alternatives.

(4) A new dataset containing 96 object categories and 7680 sketches is released, in which each sketch is segmented into semantic parts by humans.

Section snippets

Perceptual grouping

Perceptual grouping is one particular kind of organizational phenomenon. Historically, grouping refers to the fact that observers perceive some elements of the visual field as “going together” more strongly than others [7]. Wertheimer first laid out the problem of perceptual grouping [8] by asking what stimulus factors influence the perceived grouping of discrete elements. Much work has since investigated the problem of perceptual grouping. To date, many Gestalt principles,

A multi-label graph-cuts model for grouping

In this section, a multi-label graph-cuts [18] based model for perceptual grouping is described that conjoins different Gestalt principles. In particular, two Gestalt principles, continuity and proximity, are utilized, though the framework can be extended to work with more principles. Specifically, we formulate grouping as a min-cut/max-flow optimization problem. We later show in Section 4 how Gestalt confliction can be learned and easily embedded into this framework to deal
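As a rough illustration of the kind of labeling energy being minimized, the toy below assigns group labels to four stroke-segment midpoints under a proximity-only pairwise cost, optimized by iterated conditional modes (ICM) as a simple greedy stand-in for the multi-label graph-cuts solver of [18]. The coordinates, cost functions, and thresholds are all invented; the paper's model also includes a continuity term built from endpoint positions and tangent directions.

```python
import itertools, math

# Toy stroke segments, represented only by their midpoints.
segments = [(0.0, 0.0), (0.1, 0.1), (5.0, 5.0), (5.1, 5.2)]
labels = list(range(len(segments)))      # start: every segment is its own group
K = len(segments)                        # candidate group labels

def pair_cost(i, j, same):
    # Proximity-only Gestalt term (illustrative): cutting nearby segments
    # apart is costly; grouping distant segments together is costly too.
    d = math.dist(segments[i], segments[j])
    return max(0.0, d - 1.0) if same else math.exp(-d)

def energy(lab):
    # Total pairwise energy of a labeling over all segment pairs.
    return sum(pair_cost(i, j, lab[i] == lab[j])
               for i, j in itertools.combinations(range(len(segments)), 2))

# ICM: greedily relabel one segment at a time until no move lowers the energy.
changed = True
while changed:
    changed = False
    for i in range(len(segments)):
        best = min(range(K), key=lambda l: energy(labels[:i] + [l] + labels[i + 1:]))
        if best != labels[i]:
            labels[i], changed = best, True
```

On this toy input the two nearby pairs end up in two separate groups; a real implementation would replace ICM with alpha-expansion graph cuts, which offers approximation guarantees ICM lacks.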

Learning Gestalt confliction

Given the grouping framework detailed above, in this section we introduce a human-annotated sketch dataset containing 7680 sketches in 96 categories and show how these annotations can be used to learn Gestalt confliction with a learning-to-rank strategy. We further demonstrate how the learned confliction information can be embedded into the general grouping framework laid out in the previous section.
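The learning-to-rank idea can be sketched as follows: RankSVM [20] reduces ranking to classifying pairwise feature differences, which the toy below approximates with hinge-loss SGD (no regularization) on invented two-dimensional features, one proximity score and one continuity score per candidate grouping. The training pairs and learning rate are assumptions for illustration only.

```python
import random

# Each pair says grouping A (first) should rank above grouping B (second).
# Features per grouping (invented): [proximity score, continuity score].
pairs = [([0.9, 0.2], [0.4, 0.3]),
         ([0.8, 0.5], [0.2, 0.6]),
         ([0.7, 0.1], [0.3, 0.4])]

w = [0.0, 0.0]                       # learned weights = relative rule importance
random.seed(0)
for _ in range(500):                 # hinge-loss SGD on difference vectors
    a, b = random.choice(pairs)
    diff = [ai - bi for ai, bi in zip(a, b)]
    if sum(wi * di for wi, di in zip(w, diff)) < 1.0:   # margin violated
        w = [wi + 0.1 * di for wi, di in zip(w, diff)]  # push score(a) up

# Check that the learned linear scorer reproduces every preference.
ranks_correctly = all(
    sum(wi * ai for wi, ai in zip(w, a)) > sum(wi * bi for wi, bi in zip(w, b))
    for a, b in pairs)
```

In this toy the preferences reward proximity, so the learned weight on the proximity feature comes out larger; in the paper the analogous weights encode the relative importance of proximity versus continuity.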

Sketch generation

In this section, we describe how sketches can be generated from real images using the proposed perceptual grouping framework. There are three main stages (Fig. 1): (i) extracting a boundary map to produce curve segments as grouping primitives, (ii) grouping boundary segments with the proposed grouper while accounting for Gestalt confliction, and (iii) filtering away redundancy by coarseness analysis of the boundary groups. Details of each stage follow.
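The three stages above can be sketched as a pipeline of stand-in functions. The segment lengths, the long/short grouping criterion, and the filtering threshold below are invented placeholders for the actual boundary detector, Gestalt grouper, and coarseness analysis described in the paper.

```python
def extract_segments(image):
    # Stage (i) stand-in: a real boundary detector would return curve
    # segments from the image; here we just return toy lengths in pixels.
    return [40, 35, 4, 3, 38]

def group_segments(lengths):
    # Stage (ii) stand-in for the Gestalt grouper: segments are grouped
    # crudely by length alone.
    groups = {}
    for s in lengths:
        groups.setdefault("coarse" if s >= 10 else "fine", []).append(s)
    return groups

def filter_groups(groups, min_total=20):
    # Stage (iii) stand-in for coarseness analysis: drop groups whose total
    # length is too small to carry object structure.
    return {k: v for k, v in groups.items() if sum(v) >= min_total}

sketch_groups = filter_groups(group_segments(extract_segments(None)))
```

The point of the skeleton is the data flow: primitives in, groups out, with redundant groups filtered before rendering the final sketch.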

Experiments and analysis

We first conduct an experiment to evaluate the effectiveness of the proposed grouping framework, in particular with and without the learned Gestalt confliction. We then demonstrate how the human-drawing likeness of machine-generated sketches can be measured using a novel sketch recognition experiment.

Sketch-based image retrieval

In this section, we present a novel application to sketch-based image retrieval (SBIR), which aims to retrieve natural images with a human-drawn sketch query. It is a challenging task because images that contain the same objects but come from different domains (i.e., sketch and real image) produce distinct representations of those objects. We therefore deal with this problem by converting real images into sketch-like images, which makes sketch-based image retrieval possible (see Fig. 2). More specifically,

Conclusion and future work

In this paper, we presented a novel approach for automatic sketch generation from a single natural image. We cast sketch extraction as a perceptual contour grouping and filtering problem, and by exploiting two commonly used perceptual grouping principles, i.e., continuity and proximity, we developed an effective automated sketching algorithm to simulate how humans draw objects. Furthermore, we showed that grouping performance could be improved by investigating the

Acknowledgments

This work was partially supported by National Natural Science Foundation of China under Grant nos. 61273217, 61175011, 61171193, 61402047, and the 111 project under Grant no. B08004.

Yonggang Qi is currently a Ph.D. candidate at the School of Information and Communication Engineering, Beijing University of Posts and Telecommunications. His research interests are in computer vision, with a particular focus on perceptual grouping and object recognition. He was a visiting Ph.D. student in the Department of Electronic Systems at Aalborg University, Aalborg, Denmark, in 2013.

References (45)

  • A. Amir et al., A generic grouping algorithm and its quantitative analysis, IEEE Trans. Pattern Anal. Mach. Intell. (1998)
  • X. Ren, J. Malik, Learning a classification model for segmentation, in: ICCV...
  • J.S. Stahl et al., Edge grouping combining boundary and region information, IEEE Trans. Image Process. (2007)
  • G. Papari et al., Adaptive pseudo dilation for Gestalt edge grouping and contour detection, IEEE Trans. Image Process. (2008)
  • N. Adluru, L.J. Latecki, R. Lakämper, T. Young, X. Bai, A.D. Gross, Contour grouping based on local symmetry, in:...
  • Y. Song et al., In search of perceptually salient groupings, IEEE Trans. Image Process. (2011)
  • J. Wagemans et al., A century of Gestalt psychology in visual perception: II. Conceptual and theoretical foundations, Psychol. Bull. (2012)
  • M. Kubovy et al., The whole is equal to the sum of its parts: a probabilistic model of grouping by proximity and similarity in regular patterns, Psychol. Rev. (2008)
  • Y. Boykov et al., An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision, IEEE Trans. Pattern Anal. Mach. Intell. (2004)
  • Y. Boykov et al., Fast approximate energy minimization via graph cuts, IEEE Trans. Pattern Anal. Mach. Intell. (2001)
  • A. Delong et al., Fast approximate energy minimization with label costs, Int. J. Comput. Vis. (2012)
  • O. Chapelle et al., Efficient algorithms for ranking with SVMs, Inf. Retr. (2010)


    Jun Guo received his Ph.D. from Tohoku Gakuin University in 1993. He is currently the Vice-President of BUPT, a Distinguished Professor at Beijing University of Posts and Telecommunications, and the Dean of the School of Information and Communication Engineering. He is mainly engaged in research on pattern recognition, web search, and network management. He has more than 200 publications in top international journals and conferences, including Science, IEEE Trans. on PAMI, IEICE Trans., ICPR, ICCV, and SIGIR. He has received numerous international and national awards, including three IEEE international awards, the Second Prize for Beijing Scientific and Technological Progress, and the Second Prize for Scientific and Technological Progress of the Ministry of Posts and Telecommunications.

    Yi-Zhe Song is a Lecturer (Assistant Professor) at School of Electronic Engineering and Computer Science, Queen Mary, University of London. He researches into computer vision, computer graphics and their convergence, particularly perceptual grouping, image segmentation (description), cross-domain image analysis, non-photorealistic rendering, with a recent emphasis on human sketch representation, recognition and retrieval. He received both the B.Sc. (first class) and Ph.D. degrees in Computer Science from the Department of Computer Science, University of Bath, UK, in 2003 and 2008, respectively; prior to his doctoral studies, he obtained a Diploma (M.Sc.) degree in Computer Science from the Computer Laboratory, University of Cambridge, UK, in 2004. Prior to 2011, he worked at University of Bath as a Research and Teaching Fellow. He is an Associate Editor of Neurocomputing and member of IEEE and BMVA.

    Tao Xiang received the Ph.D. degree in electrical and computer engineering from the National University of Singapore, in 2002. He is currently a Reader (Associate Professor) in the School of Electronic Engineering and Computer Science, Queen Mary University of London. His research interests include computer vision, machine learning, and data mining. He has published over 100 papers in international journals and conferences and co-authored a book, Visual Analysis of Behaviour: From Pixels to Semantics.

    Honggang Zhang received the B.S. degree from the Department of Electrical Engineering, Shandong University, in 1996, and the Master and Ph.D. degrees from the School of Information Engineering, Beijing University of Posts and Telecommunications (BUPT), in 1999 and 2003, respectively. He worked as a visiting scholar in the School of Computer Science, Carnegie Mellon University (CMU) from 2007 to 2008. He is currently an Associate Professor and Director of the web search center at BUPT. His research interests include image retrieval, computer vision and pattern recognition. He has published more than 30 papers in TPAMI, Science, Machine Vision and Applications, AAAI, ICPR, and ICIP. He is a Senior Member of IEEE.

    Zheng-Hua Tan received the B.Sc. and M.Sc. degrees in Electrical Engineering from Hunan University, Changsha, China, in 1990 and 1996, respectively, and the Ph.D. degree in electronic engineering from Shanghai Jiao Tong University, Shanghai, China, in 1999. He is an Associate Professor in the Department of Electronic Systems at Aalborg University, Aalborg, Denmark, which he joined in May 2001. He was a Visiting Scientist at the Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, USA, an Associate Professor in the Department of Electronic Engineering at Shanghai Jiao Tong University, and a Postdoctoral Fellow in the Department of Computer Science at Korea Advanced Institute of Science and Technology, Daejeon, Korea. His research interests include speech and speaker recognition, noise-robust speech processing, multimedia signal and information processing, human–robot interaction, and machine learning. He has published extensively in these areas in refereed journals and conference proceedings. He has served as an Editorial Board Member/Associate Editor for Elsevier Computer Speech and Language, Elsevier Digital Signal Processing and Elsevier Computers and Electrical Engineering. He was a Lead Guest Editor for the IEEE Journal of Selected Topics in Signal Processing. He has served/serves as a Program Co-chair, Area and Session Chair, Tutorial Speaker and Committee Member in many major international conferences.
