Published in: Soft Computing 18/2019

22 August 2018 | Methodologies and Application

Deep sparse representation-based mid-level visual elements discovery in fine-grained classification

Authors: Le Lv, Dongbin Zhao, Kun Shao


Abstract

In this paper, we propose a new method for discovering mid-level visual elements and apply it to fine-grained classification. We present the duality between image patches and the features extracted by a convolutional winner-take-all autoencoder (CONV-WTA-AE). The sparsity constraints used by CONV-WTA-AE make a group of objects share the same feature components. Hence, image patches can be clustered by the feature components they share, and feature components can be clustered by their co-occurrence in image patches. We formulate mid-level visual element mining as a bipartite graph partitioning problem and employ a spectral partitioning algorithm to co-cluster image patches and feature components. CONV-WTA-AE is an unsupervised feature learning method, so it avoids expensive annotations. Our experiments demonstrate that the spectral partitioning method is very efficient, but only the confident instances in a cluster are well discriminated. The similarity metric used by this algorithm is not accurate enough. Hence, we propose training a group of linear support vector machines (SVMs) to refine the clustering results. These SVMs are trained on the initial confident instances and provide a more discriminative similarity, which we then use to re-assign instances to the clusters. To avoid overfitting, this process is iterated on many data subsets. We conduct a series of experiments on the MNIST dataset to verify our algorithm. The experimental results show that our method can discover meaningful image patch clusters. In the fine-grained classification task, the visual elements are fed into an ensemble of convolutional neural networks. Experiments on the CompCars dataset show that our method achieves state-of-the-art performance.
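
To make the co-clustering step concrete, the following is a minimal sketch of two-way bipartite spectral partitioning, not the authors' implementation. It assumes a hypothetical nonnegative matrix `A` whose rows are image patches and whose columns are feature components, with larger entries where a component is active on a patch. The function name `bipartite_cocluster`, the toy matrix, and the sign-based split (used here in place of a k-means step on the embedding) are all assumptions made for illustration.

```python
import numpy as np


def bipartite_cocluster(A, eps=1e-12):
    """Split rows (patches) and columns (features) of A into two co-clusters.

    A is a nonnegative patch-by-feature matrix (hypothetical input).
    Returns (row_labels, col_labels), each an array of 0/1 cluster ids.
    """
    A = np.asarray(A, dtype=float)
    d1 = A.sum(axis=1) + eps  # degree of each patch node
    d2 = A.sum(axis=0) + eps  # degree of each feature node
    # Degree-normalized matrix D1^{-1/2} A D2^{-1/2}.
    An = A / np.sqrt(d1)[:, None] / np.sqrt(d2)[None, :]
    U, _, Vt = np.linalg.svd(An, full_matrices=False)
    # The second left/right singular vectors carry the two-way partition;
    # rescale them by the degrees to get the node embedding.
    z_rows = U[:, 1] / np.sqrt(d1)
    z_cols = Vt[1, :] / np.sqrt(d2)
    # Simplification: split at zero instead of clustering the embedding.
    return (z_rows > 0).astype(int), (z_cols > 0).astype(int)


# Toy data: patches 0-3 mostly use features 0-2, patches 4-7 use features 3-5.
A = np.full((8, 6), 0.05)
A[:4, :3] = 1.0
A[4:, 3:] = 1.0

rows, cols = bipartite_cocluster(A)
print("patch clusters:  ", rows.tolist())
print("feature clusters:", cols.tolist())
```

Patches and feature components that receive the same label form one co-cluster. The label ids themselves are arbitrary, because the sign of a singular vector is not fixed. The weak off-block entries in the toy matrix keep the graph connected; for a graph with fully disconnected blocks the leading singular value is repeated and a sign split on the second vector is no longer reliable.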

Metadata
Title
Deep sparse representation-based mid-level visual elements discovery in fine-grained classification
Authors
Le Lv
Dongbin Zhao
Kun Shao
Publication date
22 August 2018
Publisher
Springer Berlin Heidelberg
Published in
Soft Computing / Issue 18/2019
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI
https://doi.org/10.1007/s00500-018-3468-3
