
2016 | OriginalPaper | Chapter

SPLeaP: Soft Pooling of Learned Parts for Image Classification

Authors : Praveen Kulkarni, Frédéric Jurie, Joaquin Zepeda, Patrick Pérez, Louis Chevallier

Published in: Computer Vision – ECCV 2016

Publisher: Springer International Publishing


Abstract

The aggregation of image statistics (the so-called pooling step of image classification algorithms) and the construction of part-based models are two distinct, well-studied topics in the literature. The former aims to leverage the whole set of local descriptors an image contains (through spatial pyramids or Fisher vectors, for instance), while the latter argues that only a few of the regions in an image are actually useful for classifying it. This paper bridges the two by proposing a new pooling framework based on discovering the useful parts involved in pooling local representations. The key contribution is a model that integrates a boosted non-linear part classifier and a parametric soft-max pooling component, both trained jointly with the image classifier. Experiments show that the proposed model not only consistently surpasses standard pooling approaches but also improves over state-of-the-art part-based models on several different and challenging classification tasks.
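The abstract names a "parametric soft-max pooling component" but does not give its formula. As a hedged illustration only, the sketch below implements one common form of parametric soft-max pooling over per-region part scores: each score is weighted by exp(beta * s_i) normalized over all regions, so `beta = 0` reduces to average pooling and a large positive `beta` approaches max pooling. The function names `softmax_pool` and `pool_parts`, the per-part `beta`, and this exact weighting are assumptions made for illustration, not the paper's definition; the boosted part classifier that would produce the region scores is not shown.

```python
import math
from typing import List, Sequence


def softmax_pool(scores: Sequence[float], beta: float) -> float:
    """Soft-max pooling of per-region scores (illustrative sketch, not SPLeaP's exact formula).

    Weights score s_i by exp(beta * s_i) / sum_j exp(beta * s_j):
    beta = 0 gives the average, large positive beta approaches the max.
    """
    if not scores:
        raise ValueError("need at least one region score")
    logits = [beta * s for s in scores]
    shift = max(logits)  # subtract the largest logit so exp() cannot overflow
    weights = [math.exp(z - shift) for z in logits]
    total = sum(weights)
    return sum(w * s for w, s in zip(weights, scores)) / total


def pool_parts(part_scores: Sequence[Sequence[float]], betas: Sequence[float]) -> List[float]:
    """Pool each (hypothetical) part's region scores with its own beta.

    part_scores[k] holds the scores of part k on every region of one image;
    the result is one pooled value per part, usable as an image-level feature.
    """
    if len(part_scores) != len(betas):
        raise ValueError("one beta per part is required")
    return [softmax_pool(s, b) for s, b in zip(part_scores, betas)]
```

For example, with region scores `[0.1, 0.2, 3.0]`, `softmax_pool(scores, 0.0)` returns their mean (about 1.1), while `softmax_pool(scores, 50.0)` is essentially the maximum, 3.0. Making `beta` a learnable parameter is what lets such a pooling step sit between "use every region" and "use only the best region", which is the trade-off the abstract describes.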


Footnotes
1
Our own implementation of this method achieves results below those reported in [33].
 
Literature
1. Fischler, M.A., Elschlager, R.A.: The representation and matching of pictorial structures. IEEE Trans. Comput. 22(1), 67–92 (1973)
2. Weber, M., Welling, M., Perona, P.: Towards automatic discovery of object categories. In: IEEE International Conference on Computer Vision and Pattern Recognition (2000)
3. Ullman, S., Sali, E., Vidal-Naquet, M.: A fragment-based approach to object representation and classification. In: Arcelli, C., Cordella, L.P., Sanniti di Baja, G. (eds.) IWVF 2001. LNCS, vol. 2059, pp. 85–100. Springer, Heidelberg (2001). doi:10.1007/3-540-45129-3_7
4. Felzenszwalb, P.F., Girshick, R.B., McAllester, D., Ramanan, D.: Object detection with discriminatively trained part-based models. IEEE Trans. Pattern Anal. Mach. Intell. 32(9), 1627–1645 (2010)
5. Doersch, C., Gupta, A., Efros, A.A.: Mid-level visual element discovery as discriminative mode seeking. In: Proceedings on Neural Information Processing Systems (2013)
6. Singh, S., Gupta, A., Efros, A.: Unsupervised discovery of mid-level discriminative patches. In: European Conference on Computer Vision, pp. 73–86 (2012)
7. Juneja, M., Vedaldi, A., Jawahar, C., Zisserman, A.: Blocks that shout: distinctive parts for scene classification. In: IEEE International Conference on Computer Vision and Pattern Recognition (2013)
8. Doersch, C., Singh, S., Gupta, A., Sivic, J., Efros, A.: What makes Paris look like Paris? ACM Trans. Graph. 31(4) (2012)
9. Parizi, S.N., Vedaldi, A., Zisserman, A., Felzenszwalb, P.: Automatic discovery and optimization of parts for image classification. In: International Conference on Learning Representations (2015)
10. Lobel, H., Vidal, R., Soto, A.: Hierarchical joint max-margin learning of mid and top level representations for visual recognition. In: IEEE International Conference on Computer Vision (2013)
11. Hoai, M., Zisserman, A.: Improving human action recognition using score distribution and ranking. In: Asian Conference on Computer Vision (2014)
12. Mason, L., Baxter, J., Bartlett, P., Frean, M.: Boosting algorithms as gradient descent in function space. In: NIPS (1999)
13. Friedman, J., Hastie, T., Tibshirani, R., et al.: Additive logistic regression: a statistical view of boosting (with discussion and a rejoinder by the authors). Ann. Stat. 28(2), 337–407 (2000)
14. Oquab, M., Bottou, L., Laptev, I., Sivic, J.: Learning and transferring mid-level image representations using convolutional neural networks. In: IEEE International Conference on Computer Vision and Pattern Recognition (2014)
15. Chatfield, K., Simonyan, K., Vedaldi, A., Zisserman, A.: Return of the devil in the details: delving deep into convolutional nets. In: British Machine Vision Conference (2014)
16. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Proceedings on Neural Information Processing Systems (2012)
17. Razavian, A.S., Azizpour, H., Sullivan, J., Carlsson, S.: CNN features off-the-shelf: an astounding baseline for recognition. In: Computer Vision and Pattern Recognition Workshops (2014)
18. Everingham, M., Eslami, S.M.A., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A.: The pascal visual object classes challenge: a retrospective. Int. J. Comput. Vis. 111(1), 98–136 (2015)
19. Gong, Y., Wang, L., Guo, R., Lazebnik, S.: Multi-scale orderless pooling of deep convolutional activation features. In: European Conference on Computer Vision (2014)
20. Kulkarni, P., Zepeda, J., Jurie, F., Perez, P., Chevallier, L.: Hybrid multi-layer deep cnn/aggregator feature for image classification. In: IEEE International Conference on Acoustics, Speech, and Signal Processing (2015)
21. Cimpoi, M., Maji, S., Vedaldi, A.: Deep filter banks for texture recognition and segmentation. In: IEEE International Conference on Computer Vision and Pattern Recognition (2015)
22. Jégou, H., Douze, M., Schmid, C., Pérez, P.: Aggregating local descriptors into a compact image representation. In: IEEE International Conference on Computer Vision and Pattern Recognition (2010)
23. Li, Y., Liu, L., Shen, C., van den Hengel, A.: Mid-level deep pattern mining. In: IEEE International Conference on Computer Vision and Pattern Recognition (2015)
24. Sicre, R., Jurie, F.: Discovering and aligning discriminative mid-level features for image classification. In: International Conference on Pattern Recognition (2014)
25. Gulcehre, C., Cho, K., Pascanu, R., Bengio, Y.: Learned-norm pooling for deep feedforward and recurrent neural networks. In: Calders, T., Esposito, F., Hüllermeier, E., Meo, R. (eds.) ECML PKDD 2014. LNCS (LNAI), vol. 8724, pp. 530–546. Springer, Heidelberg (2014). doi:10.1007/978-3-662-44848-9_34
26. Goodfellow, I.J., Warde-Farley, D., Mirza, M., Courville, A.C., Bengio, Y.: Maxout networks. ICML 28(3), 1319–1327 (2013)
27. Lee, C.Y., Gallagher, P.W., Tu, Z.: Generalizing pooling functions in convolutional neural networks: mixed, gated, and tree. In: International Conference on Artificial Intelligence and Statistics (2016)
28. Quattoni, A., Torralba, A.: Recognizing indoor scenes. In: IEEE International Conference on Computer Vision and Pattern Recognition (2009)
29. Delaitre, V., Laptev, I., Sivic, J.: Recognizing human actions in still images: a study of bag-of-features and part-based representations. In: British Machine Vision Conference (2010)
30. van de Sande, K.E.A., Uijlings, J.R.R., Gevers, T., Smeulders, A.W.M.: Segmentation as selective search for object recognition. In: IEEE International Conference on Computer Vision (2011)
31.
32. Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., Darrell, T.: Caffe: convolutional architecture for fast feature embedding. In: Proceedings of the 22nd ACM International Conference on Multimedia (2014)
33. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
34. Zhou, B., Lapedriza, A., Xiao, J., Torralba, A., Oliva, A.: Learning deep features for scene recognition using places database. In: Proceedings on Neural Information Processing Systems (2014)
35. Kulkarni, P., Zepeda, J., Jurie, F., Perez, P., Chevallier, L.: Max-margin, single-layer adaptation of transferred image features. In: BigVision Workshop, Computer Vision and Pattern Recognition (2015)
36. He, K., Zhang, X., Ren, S., Sun, J.: Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 37(9), 1904–1916 (2015)
37. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: IEEE International Conference on Computer Vision and Pattern Recognition (2015)
38. Khan, F.S., Anwer, R.M., van de Weijer, J., Bagdanov, A.D., Lopez, A.M., Felsberg, M.: Coloring action recognition in still images. Int. J. Comput. Vis. 105(3), 205–221 (2013)
39. Sharma, G., Jurie, F., Schmid, C.: Discriminative spatial saliency for image classification. In: IEEE International Conference on Computer Vision and Pattern Recognition (2012)
40. Sharma, G., Jurie, F., Schmid, C.: Expanded parts model for human attribute and action recognition in still images. In: IEEE International Conference on Computer Vision and Pattern Recognition (2013)
Metadata
Title: SPLeaP: Soft Pooling of Learned Parts for Image Classification
Authors: Praveen Kulkarni, Frédéric Jurie, Joaquin Zepeda, Patrick Pérez, Louis Chevallier
Copyright Year: 2016
DOI: https://doi.org/10.1007/978-3-319-46484-8_20
