Im2Sketch: Sketch generation by unconflicted perceptual grouping
Introduction
There is plenty of prior work on sketches in computer vision, from sketch recognition [1], [2] to sketch-based image retrieval (SBIR) [3], [4]. Recently, sketch research has gained much momentum due to the proliferation of touch-sensitive devices. Nonetheless, how to make machines automatically produce sketches as humans do remains an open problem [1], [5], [6]. Solving it would open the door to many vision applications, especially SBIR, since better sketch conversion essentially closes the domain gap between query sketches and gallery photographs.
This paper sets out to tackle the problem of sketch generation by learning from established theories in human visual cognition. The human visual system is so powerful that we can easily make sense of chaos. One of the most critical problems in neuroscience is understanding how human brains perceive visual objects. In particular, perceptual grouping, a concept introduced by the Gestalt school of psychologists [7], holds that humans perceive certain elements of the visual world as going together more strongly than others. Max Wertheimer, a pioneer of the Gestalt school, pointed out the significance of perceptual grouping and listed a series of rules, such as proximity, similarity and continuation [8]. His work has triggered a great deal of research aimed at understanding human visual systems [9], [10].
We treat sketch generation as a perceptual grouping and filtering task. The underlying hypothesis is that perceptual grouping is able to find sense in chaos, leaving only the signals that correspond to sketches of human resemblance. More specifically, our choice of an elementary grouping process for sketch generation is justified as follows: (i) sketches lack visual cues (black-and-white drawings versus textured image regions), making conventional vision algorithms sub-optimal; (ii) sketches, despite being abstract (e.g., a stickman human figure), are highly iconic – it is often the structural variations of strokes that capture object appearance; and, last but not least, (iii) sketches are the simplest form of depiction of human visual impressions that can be rendered by hand, and therefore act as an ideal basis for applying and testing theories of human visual cognition [3].
Traditionally, applying perceptual grouping in vision applications raises two critical design considerations: (i) how to combine multiple Gestalt rules into a single globally optimized framework, and (ii) how to encode the relative importance of each rule. For the former, although individual Gestalt principles have proven useful for contour grouping when used alone [11], [12], [13], very few works [14] investigate how they can be exploited jointly in a single framework. The latter problem, often referred to as Gestalt confliction, remains unaddressed to date [7], [15]. Despite being a subject of investigation in psychology, little is known about how Gestalt confliction works in the human visual system [16], shedding little light on how to design a computer vision system accordingly.
In this paper, we first propose a unified grouping framework that works with multiple Gestalt principles simultaneously. We then show how Gestalt confliction can be accounted for by learning from a dataset of pre-segmented human sketches. In particular, a multi-label graph-cuts [17], [18], [19] perceptual grouping framework is developed to group stroke segments while utilizing the learned importance of different Gestalt principles. It follows that, when generating a sketch from a photograph, the same learned perceptual grouping framework can be used to form groups of image boundary segments, which are further filtered to produce human-drawing-like sketches. More specifically, a learning to rank strategy based on RankSVM [20] is proposed to learn the relative importance of two Gestalt principles, namely proximity and continuity. We learn from a subset of a large-scale human-drawn sketch dataset [1], in which each sketch is pre-segmented into semantic groups. The entire process of the proposed sketch generation framework is shown in Fig. 1.
To evaluate the quality of automatically generated sketches, we present a novel approach that recognizes them with sketch classifiers trained on human data. Prior works [5], [21] evaluate sketching performance by comparing computer-generated sketches with tracings produced by humans. Importantly, this evaluation strategy does not account for likeness to human-made sketches because (i) sketches are abstract depictions that are fundamentally different from tracings of image boundaries, (ii) humans often sketch without reference to real photographs of objects, and (iii) sketches exhibit much more intra-class variability, owing to different levels of drawing skill and individual visual impressions. By measuring how well a human sketch classifier [1] recognizes machine-generated sketches, we essentially examine how closely they resemble human-made ones. Our results show that the sketches generated by our method outperform a number of state-of-the-art alternatives.
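The evaluation idea can be sketched in miniature. The following is a toy illustration, not the paper's actual classifier [1]: a nearest-centroid classifier (a simple stand-in) is trained on hypothetical 2-D features of human sketches, and its recognition accuracy on machine-generated sketches serves as a human-likeness proxy. All feature vectors and category names are invented for illustration.

```python
import math

def nearest_centroid_accuracy(train, test):
    """Train a nearest-centroid classifier on human-sketch features, then
    measure how often machine-generated sketches are assigned their true
    category -- a toy version of recognition-based evaluation."""
    centroids = {}
    for label, feats in train.items():
        n = len(feats)
        centroids[label] = [sum(f[d] for f in feats) / n for d in range(len(feats[0]))]
    correct = 0
    for true_label, feat in test:
        pred = min(centroids, key=lambda c: math.dist(centroids[c], feat))
        correct += pred == true_label
    return correct / len(test)

# Hypothetical 2-D features for two categories of human sketches.
train = {"cat": [(0.0, 1.0), (0.2, 0.9), (0.1, 1.1)],
         "car": [(1.0, 0.0), (0.9, 0.2), (1.1, 0.1)]}
# Generated sketches that resemble human ones should land near the right centroid.
generated = [("cat", (0.1, 1.0)), ("car", (1.0, 0.1)), ("cat", (0.15, 0.95))]
acc = nearest_centroid_accuracy(train, generated)
```

The higher the accuracy, the more closely the generated sketches resemble the human training data in feature space.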
To further demonstrate the quality of our sketches, we demonstrate their effectiveness for SBIR. Experimental results confirm a positive performance boost on the largest SBIR dataset to date [4] compared with state-of-the-art alternatives. Importantly, this shows that our sketch generation algorithm is able to bridge the domain gap between sketches and natural images. As can be seen from Fig. 2, the proposed sketch converter yields cleaner sketches that match the query better when encoded using a common descriptor such as the Histogram of Oriented Gradients (HOG).
The contributions of this paper can be summarized as follows:
(1) We apply perceptual grouping as a means for automated sketch generation and propose a learning to rank strategy to learn the relative importance between two Gestalt principles.
(2) A novel evaluation strategy is devised to quantitatively evaluate the human likeness of sketches.
(3) We demonstrate the effectiveness of sketch generation in SBIR and show a performance boost when compared with state-of-the-art alternatives.
(4) A new dataset containing 96 object categories and 7680 sketches is released, in which each sketch is segmented into semantic parts by humans.
Perceptual grouping
Perceptual grouping is one particular kind of organization phenomenon. Historically, grouping refers to the fact that observers perceive some elements of the visual field as “going together” more strongly than others [7]. Wertheimer first laid out the problem of perceptual grouping [8] by asking what stimulus factors influence the perceived grouping of discrete elements. Much subsequent work has investigated the problem of perceptual grouping. To date, many Gestalt principles,
A multi-label graph-cuts model for grouping
In this section, a multi-label graph-cuts [18] based model for perceptual grouping is described, which conjoins different Gestalt principles. In particular, two Gestalt principles, continuity and proximity, are utilized; however, the framework can be extended to work with more principles. Specifically, we formulate grouping as a min-cut/max-flow optimization problem. We later show in Section 4 how Gestalt confliction can be learned and easily embedded into this framework to deal
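The flavour of such a multi-label energy can be illustrated at toy scale. The sketch below is an assumption-laden illustration, not the paper's model: segments are hypothetical (x, y, orientation) primitives, the proximity/continuity affinities are invented exponential forms, and exhaustive search stands in for the min-cut/max-flow solver, which is only feasible for a handful of segments.

```python
from itertools import product
import math

# Toy stroke segments as (x, y, orientation) primitives -- hypothetical inputs;
# a real system would extract these from an image boundary map.
segments = [(0.0, 0.0, 0.00), (0.1, 0.0, 0.05), (2.0, 2.0, 1.50), (2.1, 2.0, 1.40)]

def proximity(a, b):
    return math.exp(-math.dist(a[:2], b[:2]))   # nearby segments -> high affinity

def continuity(a, b):
    return math.exp(-abs(a[2] - b[2]))          # similar orientation -> high affinity

def energy(labels, w_prox=1.0, w_cont=1.0, theta=1.0):
    """Pairwise grouping energy: separating strongly bound segments is costly,
    and so is merging weakly bound ones (threshold theta)."""
    e = 0.0
    for i in range(len(segments)):
        for j in range(i + 1, len(segments)):
            aff = w_prox * proximity(segments[i], segments[j]) \
                + w_cont * continuity(segments[i], segments[j])
            e += aff if labels[i] != labels[j] else max(0.0, theta - aff)
    return e

def best_grouping():
    """Exhaustive minimization over all labelings -- a stand-in for the
    min-cut/max-flow optimization, feasible only at toy scale."""
    n = len(segments)
    return min(product(range(n), repeat=n), key=energy)

labels = best_grouping()
```

On this toy input the minimizer separates the two spatial/orientation clusters; the weights `w_prox` and `w_cont` are exactly where the learned Gestalt confliction would enter.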
Learning Gestalt confliction
Given the grouping framework detailed above, in this section, we aim to introduce a human-annotated sketch dataset containing 7680 sketches in 96 categories, and show how these annotations can be utilized to learn Gestalt confliction using a learning to rank strategy. We further demonstrate how the learned confliction information can be embedded into the general grouping framework laid out in the previous section.
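The learning to rank idea can be sketched as follows. This is a minimal illustration under stated assumptions: the four annotated preference pairs are invented, each cue vector (proximity score, continuity score) describes one candidate grouping, and a perceptron-style pairwise update stands in for the actual RankSVM [20] training.

```python
def learn_rule_weights(pairs, epochs=1000, lr=0.1):
    """Learn the relative importance of two Gestalt cues from preference pairs.

    Each pair (preferred, rejected) holds 2-D cue vectors
    (proximity score, continuity score) for two candidate groupings of the
    same strokes; annotators judged the first grouping to be better.
    A perceptron-style pairwise update stands in for RankSVM training.
    """
    w = [0.0, 0.0]
    for _ in range(epochs):
        for pos, neg in pairs:
            d = [p - n for p, n in zip(pos, neg)]   # pairwise difference vector
            if w[0] * d[0] + w[1] * d[1] <= 0:      # ranking constraint violated
                w = [wi + lr * di for wi, di in zip(w, d)]
    return w

# Hypothetical annotated preferences: groupings with better continuity tend to
# win even at some cost in proximity, so continuity should get a larger weight.
pairs = [
    ((0.4, 0.9), (0.9, 0.3)),   # continuity outweighs a proximity deficit
    ((0.3, 0.8), (0.7, 0.5)),
    ((0.9, 0.6), (0.4, 0.5)),   # proximity still matters when continuity is close
    ((0.8, 0.2), (0.2, 0.5)),
]
w_prox, w_cont = learn_rule_weights(pairs)
```

The learned ratio between the two weights is what encodes Gestalt confliction, and it can be plugged directly into the pairwise terms of the grouping energy.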
Sketch generation
In this section, we introduce how sketches can be generated from real images using the proposed perceptual grouping framework. There are three main stages (Fig. 1) in automatic sketch generation: (i) extracting a boundary map to produce curve segments as grouping primitives, (ii) grouping boundary segments with the proposed grouper while accounting for Gestalt confliction, and (iii) filtering away redundancy via coarseness analysis of the boundary groups. Details of each stage are presented as follows.
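Stages (ii) and (iii) can be caricatured in a few lines. This is a drastic simplification, not the paper's pipeline: segments are reduced to 2-D midpoints, grouping uses only a proximity threshold with union-find, and "coarseness analysis" is replaced by a minimum group size; all thresholds and data are illustrative.

```python
import math

def group_and_filter(segments, dist_thresh=0.5, min_group_size=2):
    """Stages (ii)-(iii) in miniature: group nearby boundary segments with
    union-find (proximity only), then filter out small, noisy groups."""
    parent = list(range(len(segments)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path compression
            i = parent[i]
        return i

    def union(i, j):
        parent[find(i)] = find(j)

    for i in range(len(segments)):
        for j in range(i + 1, len(segments)):
            if math.dist(segments[i], segments[j]) < dist_thresh:
                union(i, j)

    groups = {}
    for i in range(len(segments)):
        groups.setdefault(find(i), []).append(i)
    return [g for g in groups.values() if len(g) >= min_group_size]

# Two tight clusters of segment midpoints plus one isolated noise segment.
segs = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 5.1), (9.0, 0.0)]
kept = group_and_filter(segs)
```

The isolated segment is discarded by the size filter, which is the role the coarseness analysis plays at full scale.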
Experiments and analysis
We first conduct an experiment to evaluate the effectiveness of the proposed grouping framework, in particular with and without the learned Gestalt confliction. We then demonstrate how the human-drawing likeness of machine-generated sketches can be measured using a novel sketch recognition experiment.
Sketch-based image retrieval
In this section, we present a novel application to sketch-based image retrieval (SBIR), which aims to retrieve natural images using a human-drawn sketch query. It is a challenging task because images containing the same objects but coming from different domains (i.e., sketch and real image) produce distinct representations of those objects. We therefore address this problem by converting real images into sketch-like images, which makes sketch-based image retrieval possible (see Fig. 2). More specifically,
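The retrieval step itself can be sketched minimally. The following is a toy stand-in, not the paper's SBIR system: a global, normalized orientation histogram replaces a full HOG descriptor (no cells or blocks), and the gallery entries and their edge orientations are invented; in the real pipeline the gallery angles would come from generated sketches rather than raw edges.

```python
import math

def orientation_histogram(angles, bins=8):
    """A drastically simplified stand-in for HOG: a normalized histogram of
    edge orientations (radians) over the whole image."""
    h = [0.0] * bins
    for a in angles:
        h[int((a % math.pi) / math.pi * bins) % bins] += 1.0
    n = math.sqrt(sum(v * v for v in h)) or 1.0
    return [v / n for v in h]

def retrieve(query_angles, gallery):
    """Rank gallery items (name, angles) by histogram distance to the query."""
    q = orientation_histogram(query_angles)

    def dist(item):
        g = orientation_histogram(item[1])
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, g)))

    return sorted(gallery, key=dist)

# Hypothetical data: the query sketch is mostly horizontal strokes.
query = [0.0, 0.05, 0.1, 3.1]
gallery = [
    ("vertical_object", [1.5, 1.6, 1.55, 1.45]),
    ("horizontal_object", [0.02, 0.08, 0.12, 3.05]),
    ("diagonal_object", [0.7, 0.8, 0.75, 0.85]),
]
ranking = retrieve(query, gallery)
```

Because both query and gallery live in the same sketch-like representation, a plain descriptor distance suffices for ranking; this is exactly the domain-gap argument made above.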
Conclusion and future work
In this paper, we presented a novel approach for automatic sketch generation from a single natural image. We cast sketch extraction as a perceptual contour grouping and filtering problem, and by exploiting two commonly used perceptual grouping principles, i.e., continuity and proximity, we were able to develop an effective automated sketching algorithm that simulates how humans draw objects. Furthermore, we were able to show that grouping performance could be improved by investigating the
Acknowledgments
This work was partially supported by National Natural Science Foundation of China under Grant nos. 61273217, 61175011, 61171193, 61402047, and the 111 project under Grant no. B08004.
Yonggang Qi is currently a Ph.D. candidate at the School of Information and Communication Engineering, Beijing University of Posts and Telecommunications. His research interests are in computer vision, with a particular focus on perceptual grouping and object recognition. He was a visiting Ph.D. student at the Department of Electronic Systems, Aalborg University, Aalborg, Denmark, in 2013.
References (45)
- et al., A performance evaluation of gradient field HOG descriptor for sketch based image retrieval, Computer Vision and Image Understanding, 2013.
- et al., Primal sketch: Integrating structure and texture, Computer Vision and Image Understanding, 2007.
- et al., Bayesian estimation of Dirichlet mixture model with variational inference, Pattern Recognition, 2014.
- et al., An evaluation of descriptors for large-scale image retrieval from sketched feature lines, Computers & Graphics, 2010.
- et al., How do humans sketch objects?, ACM Transactions on Graphics, 2012.
- Y. Li, Y.-Z. Song, S. Gong, Sketch recognition by ensemble matching of structured features, in: BMVC...
- et al., Sketch-based image retrieval: Benchmark and bag-of-features descriptors, IEEE Transactions on Visualization and Computer Graphics, 2011.
- S. Marvaniya, S. Bhattacharjee, V. Manickavasagam, A. Mittal, Drawing an automatic sketch of deformable objects using...
- et al., A century of Gestalt psychology in visual perception: I. Perceptual grouping and figure-ground organization, Psychological Bulletin, 2012.
- Laws of organization in perceptual forms, 1938.
- A generic grouping algorithm and its quantitative analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence.
- Edge grouping combining boundary and region information, IEEE Transactions on Image Processing.
- Adaptive pseudo dilation for Gestalt edge grouping and contour detection, IEEE Transactions on Image Processing.
- In search of perceptually salient groupings, IEEE Transactions on Image Processing.
- A century of Gestalt psychology in visual perception: II. Conceptual and theoretical foundations, Psychological Bulletin.
- The whole is equal to the sum of its parts: A probabilistic model of grouping by proximity and similarity in regular patterns, Psychological Review.
- An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision, IEEE Transactions on Pattern Analysis and Machine Intelligence.
- Fast approximate energy minimization via graph cuts, IEEE Transactions on Pattern Analysis and Machine Intelligence.
- Fast approximate energy minimization with label costs, International Journal of Computer Vision.
- Efficient algorithms for ranking with SVMs, Information Retrieval.
Jun Guo received his Ph.D. from Tohoku Gakuin University in 1993. He is currently the Vice-President of BUPT, a Distinguished Professor at Beijing University of Posts and Telecommunications, and the Dean of the School of Information and Communication Engineering. He is mainly engaged in research on pattern recognition, web searching, and network management. He has more than 200 publications in top international journals and conferences, including SCIENCE, IEEE Trans. on PAMI, IEICE Trans., ICPR, ICCV, and SIGIR. He has received numerous international and national awards, including 3 IEEE International Awards, the second prize for Beijing scientific and technological progress, and the second prize for scientific and technological progress of the Ministry of Posts and Telecommunications.
Yi-Zhe Song is a Lecturer (Assistant Professor) at School of Electronic Engineering and Computer Science, Queen Mary, University of London. He researches into computer vision, computer graphics and their convergence, particularly perceptual grouping, image segmentation (description), cross-domain image analysis, non-photorealistic rendering, with a recent emphasis on human sketch representation, recognition and retrieval. He received both the B.Sc. (first class) and Ph.D. degrees in Computer Science from the Department of Computer Science, University of Bath, UK, in 2003 and 2008, respectively; prior to his doctoral studies, he obtained a Diploma (M.Sc.) degree in Computer Science from the Computer Laboratory, University of Cambridge, UK, in 2004. Prior to 2011, he worked at University of Bath as a Research and Teaching Fellow. He is an Associate Editor of Neurocomputing and member of IEEE and BMVA.
Tao Xiang received the Ph.D. degree in electrical and computer engineering from the National University of Singapore, in 2002. He is currently a Reader (Associate Professor) in the School of Electronic Engineering and Computer Science, Queen Mary University of London. His research interests include computer vision, machine learning, and data mining. He has published over 100 papers in international journals and conferences and co-authored a book, Visual Analysis of Behaviour: From Pixels to Semantics.
Honggang Zhang received the B.S. degree from the Department of Electrical Engineering, Shandong University, in 1996, the Master and Ph.D. degrees from the School of Information Engineering, Beijing University of Posts and Telecommunications (BUPT), in 1999 and 2003, respectively. He worked as a visiting scholar in School of Computer Science, Carnegie Mellon University (CMU) from 2007 to 2008. He is currently an Associate Professor and Director of web search center at BUPT. His research interests include image retrieval, computer vision and pattern recognition. He published more than 30 papers on TPAMI, SCIENCE, Machine Vision and Applications, AAAI, ICPR, ICIP. He is a Senior Member of IEEE.
Zheng-Hua Tan received the B.Sc. and M.Sc. degrees in Electrical Engineering from Hunan University, Changsha, China, in 1990 and 1996, respectively, and the Ph.D. degree in electronic engineering from Shanghai Jiao Tong University, Shanghai, China, in 1999. He is an Associate Professor in the Department of Electronic Systems at Aalborg University, Aalborg, Denmark, which he joined in May 2001. He was a Visiting Scientist at the Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, USA, an Associate Professor in the Department of Electronic Engineering at Shanghai Jiao Tong University, and a Postdoctoral Fellow in the Department of Computer Science at Korea Advanced Institute of Science and Technology, Daejeon, Korea. His research interests include speech and speaker recognition, noise-robust speech processing, multimedia signal and information processing, human–robot interaction, and machine learning. He has published extensively in these areas in refereed journals and conference proceedings. He has served as an Editorial Board Member/Associate Editor for Elsevier Computer Speech and Language, Elsevier Digital Signal Processing and Elsevier Computers and Electrical Engineering. He was a Lead Guest Editor for the IEEE Journal of Selected Topics in Signal Processing. He has served/serves as a Program Co-chair, Area and Session Chair, Tutorial Speaker and Committee Member in many major international conferences.