Image Classification Model Using Visual Bag of Semantic Words

REPRESENTATION, PROCESSING, ANALYSIS, AND UNDERSTANDING OF IMAGES

Published in: Pattern Recognition and Image Analysis, Issue 3/2019 | 01-07-2019

Abstract

In the image classification field, the visual bag-of-words (BoW) model has two drawbacks. One is low classification accuracy, because a visual BoW is typically extracted from local low-level visual feature vectors at key points, without considering the high-level semantics of an image. The other is excessive time consumption, because the vocabulary becomes very large, especially for images with explicit backgrounds and object content. To address these two problems, we propose a novel image classification model based on a visual bag of semantic words (BoSW), which includes an automatic segmentation algorithm based on graph cuts to extract the major semantic regions and a semantic annotation algorithm based on a support vector machine to label the regions with words from a visual semantic vocabulary. The proposed BoSW model refines image semantics by introducing user-defined concepts to extract the semantic vocabulary and reduce its size. Experimental results demonstrate the superiority of the proposed algorithm through comparisons with state-of-the-art methods on benchmark datasets.
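
The pipeline described in the abstract can be summarized as: segment an image into a few semantic regions, label each region with a word from a small semantic vocabulary using an SVM, and represent the image as a histogram over that vocabulary. The sketch below illustrates this flow only and is not the authors' implementation: the segment_image and region_features helpers are hypothetical placeholders for the graph-cut segmentation and the low-level region descriptors mentioned in the abstract, and scikit-learn's SVC stands in for the SVM annotator.

import numpy as np
from sklearn.svm import SVC


def train_region_annotator(region_feats, region_words):
    # Fit the SVM that maps a low-level region descriptor to a semantic word.
    # region_words are assumed to be integer ids in 0..vocab_size-1.
    annotator = SVC(kernel="rbf", gamma="scale")
    annotator.fit(region_feats, region_words)
    return annotator


def bosw_histogram(image, annotator, vocab_size, segment_image, region_features):
    # Segment the image, label each region with a semantic word via the SVM,
    # and return the normalized word histogram (the BoSW vector).
    regions = segment_image(image)                 # placeholder for graph-cut segmentation
    feats = np.stack([region_features(r) for r in regions])
    words = annotator.predict(feats)               # one semantic word id per region
    hist = np.bincount(words, minlength=vocab_size).astype(float)
    return hist / max(hist.sum(), 1.0)             # normalize so region count does not matter


# Usage sketch: build BoSW vectors for a training set, then fit an image-level classifier.
# X = np.stack([bosw_histogram(img, annotator, K, segment_image, region_features)
#               for img in train_images])
# image_clf = SVC(kernel="linear").fit(X, train_image_labels)

Because each histogram bin corresponds to a semantic word rather than a clustered low-level descriptor, the vocabulary stays small and the image representation is directly interpretable.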


Metadata

Title: Image Classification Model Using Visual Bag of Semantic Words
Publication date: 01-07-2019
Published in: Pattern Recognition and Image Analysis, Issue 3/2019
Print ISSN: 1054-6618
Electronic ISSN: 1555-6212
DOI: https://doi.org/10.1134/S1054661819030222
