Published in: Neural Computing and Applications 21/2023

8 April 2023 | Original Article

A sketch semantic segmentation method using novel local feature aggregation and segment-level self-attention

Authors: Lei Wang, Shihui Zhang, Wei Wang, Weibo Zhao

Abstract

Sketch semantic segmentation is challenging because sketches have simpler appearances and more levels of abstraction than natural images. To address these challenges, we propose a sketch semantic segmentation method. Concretely, we treat a sketch as a 2D point set and exploit the structure of strokes and the spatial relationships among 2D points to develop a novel local feature aggregation module, which encodes informative local features that are useful for semantic analysis. We also define a “stroke distance” to balance the two-dimensional spatial distribution of a sketch against the internal structure of its strokes. In addition, we design a segment-level self-attention module that establishes and strengthens the relationships between segments by encoding the contents and positions of segment features. Based on these two modules, we construct an encoder–decoder-like structure with two sub-branches, which retains the features of significant points and integrates features from several intermediate stages through a global multi-scale mechanism. Finally, the outputs of the two sub-branches are fused to obtain the final sketch semantic segmentation result. Extensive experiments on SPG and SketchSeg-150K show that our method achieves state-of-the-art results.
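
To make the idea of attention between segments more concrete, the following is a minimal sketch of generic scaled dot-product self-attention applied to one token per segment. It is not the authors' module: the abstract gives no formulas, so the function name `segment_self_attention`, the choice to sum content and position vectors, the residual connection, and the omission of learned projections are all assumptions made for illustration.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x k) and b (k x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: Vector) -> Vector:
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def segment_self_attention(content: Matrix, position: Matrix) -> Tuple[Matrix, Matrix]:
    """Generic self-attention over segment tokens (illustrative only).

    content[i]  : feature vector of segment i (assumed to be pooled from its points)
    position[i] : position encoding of segment i (assumed, same length as content[i])
    Queries, keys and values all equal content + position; a trained model
    would normally apply learned linear projections, which are left out here.
    Returns the updated segment features and the attention weights.
    """
    # Combine what a segment contains with where it lies (hypothetical choice: sum).
    tokens = [[c + p for c, p in zip(cv, pv)] for cv, pv in zip(content, position)]
    dim = len(tokens[0])
    # Pairwise similarity between segments, scaled by sqrt(feature dimension).
    keys_t = [list(col) for col in zip(*tokens)]
    scores = matmul(tokens, keys_t)
    weights = [softmax([s / math.sqrt(dim) for s in row]) for row in scores]
    # Each segment gathers features from all segments, weighted by similarity.
    attended = matmul(weights, tokens)
    # Residual connection keeps the original segment token in the output.
    updated = [[t + a for t, a in zip(tv, av)] for tv, av in zip(tokens, attended)]
    return updated, weights
```

Calling `segment_self_attention(content, position)` with one row per segment returns updated features of the same shape plus a square weight matrix whose rows each sum to one, so every segment's new feature is a convex mix of all segment tokens added to its own.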

Metadata
Title
A sketch semantic segmentation method using novel local feature aggregation and segment-level self-attention
Authors
Lei Wang
Shihui Zhang
Wei Wang
Weibo Zhao
Publication date
8 April 2023
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 21/2023
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-023-08504-1