Skip to main content
Top
Published in: International Journal of Computer Vision 3/2017

29-08-2016

Mining Mid-level Visual Patterns with Deep CNN Activations

Authors: Yao Li, Lingqiao Liu, Chunhua Shen, Anton van den Hengel

Published in: International Journal of Computer Vision | Issue 3/2017

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

The purpose of mid-level visual element discovery is to find clusters of image patches that are representative of, and which discriminate between, the contents of the relevant images. Here we propose a pattern-mining approach to the problem of identifying mid-level elements within images, motivated by the observation that such techniques have been very effective, and efficient, in achieving similar goals when applied to other data types. We show that Convolutional Neural Network (CNN) activations extracted from image patches typical possess two appealing properties that enable seamless integration with pattern mining techniques. The marriage between CNN activations and a pattern mining technique leads to fast and effective discovery of representative and discriminative patterns from a huge number of image patches, from which mid-level elements are retrieved. Given the patterns and retrieved mid-level visual elements, we propose two methods to generate image feature representations. The first encoding method uses the patterns as codewords in a dictionary in a manner similar to the Bag-of-Visual-Words model. We thus label this a Bag-of-Patterns representation. The second relies on mid-level visual elements to construct a Bag-of-Elements representation. We evaluate the two encoding methods on object and scene classification tasks, and demonstrate that our approach outperforms or matches the performance of the state-of-the-arts on these tasks.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Footnotes
1
Answer key: 1. aeroplane, 2. train, 3. cow, 4. motorbike, 5. bike, 6. sofa.
 
Literature
go back to reference Agarwal, A., & Triggs, B. (2008). Multilevel image coding with hyperfeatures. International Journal of Computer Vision, 78(1), 15–27.CrossRef Agarwal, A., & Triggs, B. (2008). Multilevel image coding with hyperfeatures. International Journal of Computer Vision, 78(1), 15–27.CrossRef
go back to reference Agrawal, P., Girshick, R., & Malik, J. (2014). Analyzing the performance of multilayer neural networks for object recognition. In Proceedings European Conference on Computer Vision, (pp. 329–344). Agrawal, P., Girshick, R., & Malik, J. (2014). Analyzing the performance of multilayer neural networks for object recognition. In Proceedings European Conference on Computer Vision, (pp. 329–344).
go back to reference Agrawal, R., & Srikant, R. (1994). Fast algorithms for mining association rules in large databases. In Proceedings International Conference Very Large Databases, (pp. 487–499). Agrawal, R., & Srikant, R. (1994). Fast algorithms for mining association rules in large databases. In Proceedings International Conference Very Large Databases, (pp. 487–499).
go back to reference Aubry, M., Maturana, D., Efros, A. A., Russell, B. C., Sivic, J. (2014a) Seeing 3d chairs: exemplar part-based 2d-3d alignment using a large dataset of cad models. In Proceedings of IEEE Conference on Computer Vision Pattern Recognition, (pp. 3762–3769). Aubry, M., Maturana, D., Efros, A. A., Russell, B. C., Sivic, J. (2014a) Seeing 3d chairs: exemplar part-based 2d-3d alignment using a large dataset of cad models. In Proceedings of IEEE Conference on Computer Vision Pattern Recognition, (pp. 3762–3769).
go back to reference Aubry, M., Russell, B. C., & Sivic, J. (2014b). Painting-to-3d model alignment via discriminative visual elements. In Proceedings Annual ACM SIGIR Conference, 33(2), p. 14. Aubry, M., Russell, B. C., & Sivic, J. (2014b). Painting-to-3d model alignment via discriminative visual elements. In Proceedings Annual ACM SIGIR Conference, 33(2), p. 14.
go back to reference Azizpour, H., Razavian, A. S., Sullivan, J., Maki, A., & Carlsson, S. (2016). Factors of transferability for a generic convnet representation. IEEE Transactions Pattern Analysis and Machine Intelligence, 38(9),1790–1802. Azizpour, H., Razavian, A. S., Sullivan, J., Maki, A., & Carlsson, S. (2016). Factors of transferability for a generic convnet representation. IEEE Transactions Pattern Analysis and Machine Intelligence, 38(9),1790–1802.
go back to reference Borgelt, C. (2012). Frequent item set mining. Wiley Interdisc Review: Data Mining and Knowledge Discovery, 2(6), 437–456. Borgelt, C. (2012). Frequent item set mining. Wiley Interdisc Review: Data Mining and Knowledge Discovery, 2(6), 437–456.
go back to reference Bossard, L., Guillaumin, M., & Gool, L. V. (2014). Food-101 mining discriminative components with random forests. In Proceedings European Conference on Computer Vision, (pp. 446–461). Bossard, L., Guillaumin, M., & Gool, L. V. (2014). Food-101 mining discriminative components with random forests. In Proceedings European Conference on Computer Vision, (pp. 446–461).
go back to reference Bourdev, L. D., & Malik, J. (2009). Poselets: Body part detectors trained using 3d human pose annotations. In Proceedings IEEE International Conference on Computer Vision, (pp. 1365–1372). Bourdev, L. D., & Malik, J. (2009). Poselets: Body part detectors trained using 3d human pose annotations. In Proceedings IEEE International Conference on Computer Vision, (pp. 1365–1372).
go back to reference Bourdev, L. D., Maji, S., Brox, T., & Malik, J. (2010). Detecting people using mutually consistent poselet activations. In Proceeding European Conference on Computer Vision, (pp. 168–181). Bourdev, L. D., Maji, S., Brox, T., & Malik, J. (2010). Detecting people using mutually consistent poselet activations. In Proceeding European Conference on Computer Vision, (pp. 168–181).
go back to reference Bourdev, L. D., Maji, S., & Malik, J. (2011). Describing people: A poselet-based approach to attribute classification. In Proceedings IEEE International Conference on Computer Vision, (pp. 1543–1550). Bourdev, L. D., Maji, S., & Malik, J. (2011). Describing people: A poselet-based approach to attribute classification. In Proceedings IEEE International Conference on Computer Vision, (pp. 1543–1550).
go back to reference Boureau, Y., Bach, F. R., LeCun, Y., & Ponce, J. (2010). Learning mid-level features for recognition. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (pp. 2559–2566). Boureau, Y., Bach, F. R., LeCun, Y., & Ponce, J. (2010). Learning mid-level features for recognition. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (pp. 2559–2566).
go back to reference Chatfield, K., Simonyan, K., Vedaldi, A., & Zisserman, A. (2014). Return of the devil in the details: Delving deep into convolutional nets. In Proceedings British Machine Vision Conference. Chatfield, K., Simonyan, K., Vedaldi, A., & Zisserman, A. (2014). Return of the devil in the details: Delving deep into convolutional nets. In Proceedings British Machine Vision Conference.
go back to reference Cheng, H., Yan, X., Han, J., & Yu, P. S. (2008). Direct discriminative pattern mining for effective classification. In Proceedings IEEE International Conference on Data Engineering, (pp. 169–178). Cheng, H., Yan, X., Han, J., & Yu, P. S. (2008). Direct discriminative pattern mining for effective classification. In Proceedings IEEE International Conference on Data Engineering, (pp. 169–178).
go back to reference Choi, M. J., Torralba, A., & Willsky, A. S. (2012). A tree-based context model for object recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(2), 240–252.CrossRef Choi, M. J., Torralba, A., & Willsky, A. S. (2012). A tree-based context model for object recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(2), 240–252.CrossRef
go back to reference Cimpoi, M., Maji, S., & Vedaldi, A. (2015). Deep filter banks for texture recognition and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 3828–3836). Cimpoi, M., Maji, S., & Vedaldi, A. (2015). Deep filter banks for texture recognition and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 3828–3836).
go back to reference Cimpoi, M., Maji, S., Kokkinos, I., & Vedaldi, A. (2016). Deep filter banks for texture recognition, description, and segmentation. International Journal of Computer Vision, 118(1), 65–94.MathSciNetCrossRef Cimpoi, M., Maji, S., Kokkinos, I., & Vedaldi, A. (2016). Deep filter banks for texture recognition, description, and segmentation. International Journal of Computer Vision, 118(1), 65–94.MathSciNetCrossRef
go back to reference Courbariaux, M., & Bengio, Y. (2016). Binarynet: Training deep neural networks with weights and activations constrained to +1 or -1. arXiv preprint arXiv:1602.02830 Courbariaux, M., & Bengio, Y. (2016). Binarynet: Training deep neural networks with weights and activations constrained to +1 or -1. arXiv preprint arXiv:​1602.​02830
go back to reference Crowley, E., & Zisserman, A. (2014). The state of the art: Object retrieval in paintings using discriminative regions. In Proceedings British Machine Vision Conference. Crowley, E., & Zisserman, A. (2014). The state of the art: Object retrieval in paintings using discriminative regions. In Proceedings British Machine Vision Conference.
go back to reference Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., & Li, F. F. (2009). Imagenet: A large-scale hierarchical image database. In Proceedings IEEE Conference on Computer Vision and Pattern Recognition, (pp. 248–255). Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., & Li, F. F. (2009). Imagenet: A large-scale hierarchical image database. In Proceedings IEEE Conference on Computer Vision and Pattern Recognition, (pp. 248–255).
go back to reference Diba, A., Pazandeh, A. M., Pirsiavash, H., & Gool, L. V. (2016). Deepcamp: Deep convolutional action & attribute mid-level patterns. In Proceedings IEEE Conference on Computer Vision and Pattern Recognition. Diba, A., Pazandeh, A. M., Pirsiavash, H., & Gool, L. V. (2016). Deepcamp: Deep convolutional action & attribute mid-level patterns. In Proceedings IEEE Conference on Computer Vision and Pattern Recognition.
go back to reference Divvala, S. K., Hoiem, D., Hays, J., Efros, A. A., Hebert, M. (2009). An empirical study of context in object detection. In Proceedings IEEE Conference on Computer Vision and Pattern Recognition, (pp. 1271–1278). Divvala, S. K., Hoiem, D., Hays, J., Efros, A. A., Hebert, M. (2009). An empirical study of context in object detection. In Proceedings IEEE Conference on Computer Vision and Pattern Recognition, (pp. 1271–1278).
go back to reference Doersch, C., Singh, S., Gupta, A., Sivic, J., & Efros, A. A. (2012). What makes paris look like paris? In Proceedings Annual International ACM SIGIR Conference, 31(4), p. 101. Doersch, C., Singh, S., Gupta, A., Sivic, J., & Efros, A. A. (2012). What makes paris look like paris? In Proceedings Annual International ACM SIGIR Conference, 31(4), p. 101.
go back to reference Doersch, C., Gupta, A., & Efros, A. A. (2013). Mid-level visual element discovery as discriminative mode seeking. In Proceedings Advances in Neural Information Processing Systems, (pp. 494–502). Doersch, C., Gupta, A., & Efros, A. A. (2013). Mid-level visual element discovery as discriminative mode seeking. In Proceedings Advances in Neural Information Processing Systems, (pp. 494–502).
go back to reference Dosovitskiy, A., & Brox, T. (2016). Inverting visual representations with convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Dosovitskiy, A., & Brox, T. (2016). Inverting visual representations with convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
go back to reference Endres, I., Shih, K. J., Jiaa, J., & Hoiem, D. (2013). Learning collections of part models for object recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 939–946). Endres, I., Shih, K. J., Jiaa, J., & Hoiem, D. (2013). Learning collections of part models for object recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 939–946).
go back to reference Everingham, M., Gool, L. J. V., Williams, C. K. I., Winn, J. M., & Zisserman, A. (2010). The pascal visual object classes (VOC) challenge. International Journal of Computer Vision, 88(2), 303–338.CrossRef Everingham, M., Gool, L. J. V., Williams, C. K. I., Winn, J. M., & Zisserman, A. (2010). The pascal visual object classes (VOC) challenge. International Journal of Computer Vision, 88(2), 303–338.CrossRef
go back to reference Everingham, M., Eslami, S. M. A., Gool, L. V., Williams, C. K. I., Winn, J. M., & Zisserman, A. (2015). The pascal visual object classes challenge: A retrospective. International Journal of Computer Vision, 111(1), 98–136.CrossRef Everingham, M., Eslami, S. M. A., Gool, L. V., Williams, C. K. I., Winn, J. M., & Zisserman, A. (2015). The pascal visual object classes challenge: A retrospective. International Journal of Computer Vision, 111(1), 98–136.CrossRef
go back to reference Fan, R. E., Chang, K. W., Hsieh, C. J., Wang, X. R., & Lin, C. J. (2008). Liblinear: A library for large linear classification. Journal of Machine Learning Research, 9, 1871–1874.MATH Fan, R. E., Chang, K. W., Hsieh, C. J., Wang, X. R., & Lin, C. J. (2008). Liblinear: A library for large linear classification. Journal of Machine Learning Research, 9, 1871–1874.MATH
go back to reference Felzenszwalb, P. F., Girshick, R. B., McAllester, D. A., & Ramanan, D. (2010). Object detection with discriminatively trained part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9), 1627–1645.CrossRef Felzenszwalb, P. F., Girshick, R. B., McAllester, D. A., & Ramanan, D. (2010). Object detection with discriminatively trained part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9), 1627–1645.CrossRef
go back to reference Fernando, B., & Tuytelaars, T. (2013). Mining multiple queries for image retrieval: On-the-fly learning of an object-specific mid-level representation. In Proceedings of IEEE International Conference on Computer Vision, (pp. 2544–2551). Fernando, B., & Tuytelaars, T. (2013). Mining multiple queries for image retrieval: On-the-fly learning of an object-specific mid-level representation. In Proceedings of IEEE International Conference on Computer Vision, (pp. 2544–2551).
go back to reference Fernando, B., Fromont, É., & Tuytelaars, T. (2012). Effective use of frequent itemset mining for image classification. In Proceedings of European Conference on Computer Vision, (pp. 214–227). Fernando, B., Fromont, É., & Tuytelaars, T. (2012). Effective use of frequent itemset mining for image classification. In Proceedings of European Conference on Computer Vision, (pp. 214–227).
go back to reference Fernando, B., Fromont, É., & Tuytelaars, T. (2014). Mining mid-level features for image classification. International Journal of Computer Vision, 108(3), 186–203.MathSciNetCrossRef Fernando, B., Fromont, É., & Tuytelaars, T. (2014). Mining mid-level features for image classification. International Journal of Computer Vision, 108(3), 186–203.MathSciNetCrossRef
go back to reference Fouhey, D. F., Gupta, A., & Hebert, M. (2013). Data-driven 3d primitives for single image understanding. In Proceedings of IEEE International Conference on Computer Vision, (pp. 3392–3399). Fouhey, D. F., Gupta, A., & Hebert, M. (2013). Data-driven 3d primitives for single image understanding. In Proceedings of IEEE International Conference on Computer Vision, (pp. 3392–3399).
go back to reference Fouhey, D. F., Hussain, W., Gupta, A., & Hebert, M. (2015). Single image 3d without a single 3d image. In Proceedings of IEEE International Conference on Computer Vision, (pp. 1053–1061). Fouhey, D. F., Hussain, W., Gupta, A., & Hebert, M. (2015). Single image 3d without a single 3d image. In Proceedings of IEEE International Conference on Computer Vision, (pp. 1053–1061).
go back to reference Gao, Y., Beijbom, O., Zhang, N., & Darrell, T. (2010). Compact bilinear pooling. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 317–326). Gao, Y., Beijbom, O., Zhang, N., & Darrell, T. (2010). Compact bilinear pooling. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 317–326).
go back to reference Gilbert, A., & Bowden, R. (2014). Data mining for action recognition. In Proceedings of Asian Conference on Computer Vision, (pp. 290–303). Gilbert, A., & Bowden, R. (2014). Data mining for action recognition. In Proceedings of Asian Conference on Computer Vision, (pp. 290–303).
go back to reference Gilbert, A., Illingworth, J., & Bowden, R. (2011). Action recognition using mined hierarchical compound features. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(5), 883–897.CrossRef Gilbert, A., Illingworth, J., & Bowden, R. (2011). Action recognition using mined hierarchical compound features. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(5), 883–897.CrossRef
go back to reference Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 580–587). Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 580–587).
go back to reference Girshick, R. B., Donahue, J., Darrell, T., & Malik, J. (2016). Region-based convolutional networks for accurate object detection and segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(1), 142–158.CrossRef Girshick, R. B., Donahue, J., Darrell, T., & Malik, J. (2016). Region-based convolutional networks for accurate object detection and segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(1), 142–158.CrossRef
go back to reference Gong, Y., Wang, L., Guo, R., & Lazebnik, S. (2014). Multi-scale orderless pooling of deep convolutional activation features. In Proceedings of European Conference on Computer Vision, (pp. 392–407). Gong, Y., Wang, L., Guo, R., & Lazebnik, S. (2014). Multi-scale orderless pooling of deep convolutional activation features. In Proceedings of European Conference on Computer Vision, (pp. 392–407).
go back to reference Grahne, G., & Zhu, J. (2005). Fast algorithms for frequent itemset mining using fp-trees. IEEE Transactions on Knowledge and Data Engineering, 17(10), 1347–1362.CrossRef Grahne, G., & Zhu, J. (2005). Fast algorithms for frequent itemset mining using fp-trees. IEEE Transactions on Knowledge and Data Engineering, 17(10), 1347–1362.CrossRef
go back to reference Hariharan, B., Malik, J., & Ramanan, D. (2012). Discriminative decorrelation for clustering and classification. In Proceedings of European Conference on Computer Vision, (pp. 459–472). Hariharan, B., Malik, J., & Ramanan, D. (2012). Discriminative decorrelation for clustering and classification. In Proceedings of European Conference on Computer Vision, (pp. 459–472).
go back to reference He, K., Zhang, X., Ren, S., & Sun, J. (2015). Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(9), 1904–1916.CrossRef He, K., Zhang, X., Ren, S., & Sun, J. (2015). Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(9), 1904–1916.CrossRef
go back to reference Hoiem, D., Efros, A. A., & Hebert, M. (2008). Putting objects in perspective. International Journal of Computer Vision, 80(1), 3–15.CrossRef Hoiem, D., Efros, A. A., & Hebert, M. (2008). Putting objects in perspective. International Journal of Computer Vision, 80(1), 3–15.CrossRef
go back to reference Jain, A., Gupta, A., Rodriguez, M., & Davis, L. S. (2013). Representing videos using mid-level discriminative patches. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 2571–2578). Jain, A., Gupta, A., Rodriguez, M., & Davis, L. S. (2013). Representing videos using mid-level discriminative patches. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 2571–2578).
go back to reference Jegou, H., Douze, M., Schmid, C., & Pérez, P. (2010). Aggregating local descriptors into a compact image representation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 3304–3311). Jegou, H., Douze, M., Schmid, C., & Pérez, P. (2010). Aggregating local descriptors into a compact image representation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 3304–3311).
go back to reference Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., & Darrell, T. (2014). Caffe: Convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093 Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., & Darrell, T. (2014). Caffe: Convolutional architecture for fast feature embedding. arXiv preprint arXiv:​1408.​5093
go back to reference Juneja, M., Vedaldi, A., Jawahar, C. V., & Zisserman, A. (2013). Blocks that shout: Distinctive parts for scene classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 923–930). Juneja, M., Vedaldi, A., Jawahar, C. V., & Zisserman, A. (2013). Blocks that shout: Distinctive parts for scene classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 923–930).
go back to reference Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. In Proceedings of Advances Neural Information Processing Systems, (pp. 1106–1114). Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. In Proceedings of Advances Neural Information Processing Systems, (pp. 1106–1114).
go back to reference Lazebnik, S., Schmid, C., & Ponce, J. (2006). Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 2169–2178). Lazebnik, S., Schmid, C., & Ponce, J. (2006). Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 2169–2178).
go back to reference Lee, Y. J., Efros, A. A., & Hebert, M. (2013). Style-aware mid-level representation for discovering visual connections in space and time. In Proceedings of IEEE International Conference on Computer Vision, (pp. 1857–1864). Lee, Y. J., Efros, A. A., & Hebert, M. (2013). Style-aware mid-level representation for discovering visual connections in space and time. In Proceedings of IEEE International Conference on Computer Vision, (pp. 1857–1864).
go back to reference Li, Q., Wu, J., & Tu, Z. (2013). Harvesting mid-level visual concepts from large-scale internet images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 851–858). Li, Q., Wu, J., & Tu, Z. (2013). Harvesting mid-level visual concepts from large-scale internet images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 851–858).
go back to reference Li, Y., Liu, L., Shen, C., & van den Hengel, A. (2015). Mid-level deep pattern mining. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 971–980). Li, Y., Liu, L., Shen, C., & van den Hengel, A. (2015). Mid-level deep pattern mining. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 971–980).
go back to reference Lin, T., RoyChowdhury, A., & Maji, S. (2015). Bilinear CNN models for fine-grained visual recognition. In Proceedings of European Conference on Computer Vision, (pp. 1449–1457). Lin, T., RoyChowdhury, A., & Maji, S. (2015). Bilinear CNN models for fine-grained visual recognition. In Proceedings of European Conference on Computer Vision, (pp. 1449–1457).
go back to reference Liu, L., & Wang, L. (2012). What has my classifier learned? visualizing the classification rules of bag-of-feature model by support region detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 3586–3593). Liu, L., & Wang, L. (2012). What has my classifier learned? visualizing the classification rules of bag-of-feature model by support region detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 3586–3593).
go back to reference Liu, L., Shen, C., Wang, L., van den Hengel, A., & Wang, C. (2014). Encoding high dimensional local features by sparse coding based fisher vectors. In Proceedings of Advances Neural Information Processing Systems, (pp. 1143–1151). Liu, L., Shen, C., Wang, L., van den Hengel, A., & Wang, C. (2014). Encoding high dimensional local features by sparse coding based fisher vectors. In Proceedings of Advances Neural Information Processing Systems, (pp. 1143–1151).
go back to reference Liu, L., Shen, C., & van den Hengel, A. (2015). The treasure beneath convolutional layers: Cross convolutional layer pooling for image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 4749–4757). Liu, L., Shen, C., & van den Hengel, A. (2015). The treasure beneath convolutional layers: Cross convolutional layer pooling for image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 4749–4757).
go back to reference Malisiewicz, T., & Efros, A. A. (2009). Beyond categories: The visual memex model for reasoning about object relationships. In Proceedings of Advances Neural Information Processing Systems, (pp. 1222–1230). Malisiewicz, T., & Efros, A. A. (2009). Beyond categories: The visual memex model for reasoning about object relationships. In Proceedings of Advances Neural Information Processing Systems, (pp. 1222–1230).
go back to reference Malisiewicz, T., Gupta, A., & Efros, A. A. (2011). Ensemble of exemplar-svms for object detection and beyond. In Proceedings of IEEE International Conference on Computer Vision, (pp. 89–96). Malisiewicz, T., Gupta, A., & Efros, A. A. (2011). Ensemble of exemplar-svms for object detection and beyond. In Proceedings of IEEE International Conference on Computer Vision, (pp. 89–96).
go back to reference Matzen, K., & Snavely, N. (2015). Bubblenet: Foveated imaging for visual discovery. In Proceedings of IEEE International Conference on Computer Vision, (pp. 1931–1939). Matzen, K., & Snavely, N. (2015). Bubblenet: Foveated imaging for visual discovery. In Proceedings of IEEE International Conference on Computer Vision, (pp. 1931–1939).
go back to reference Mettes, P., van Gemert, J. C., & Snoek, C. G. M. (2016). No spare parts: Sharing part detectors for image categorization. Computer Vision Image Understanding Mettes, P., van Gemert, J. C., & Snoek, C. G. M. (2016). No spare parts: Sharing part detectors for image categorization. Computer Vision Image Understanding
go back to reference Oquab, M., Bottou, L., Laptev, I., & Sivic, J. (2014). Learning and transferring mid-level image representations using convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 1717–1724). Oquab, M., Bottou, L., Laptev, I., & Sivic, J. (2014). Learning and transferring mid-level image representations using convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 1717–1724).
go back to reference Oramas, J., & Tuytelaars, T. (2016). Modeling visual compatibility through hierarchical mid-level elements. arXiv preprint arXiv:1604.00036 Oramas, J., & Tuytelaars, T. (2016). Modeling visual compatibility through hierarchical mid-level elements. arXiv preprint arXiv:​1604.​00036
go back to reference Owens, A., Xiao, J., Torralba, A., & Freeman, W. T. (2013). Shape anchors for data-driven multi-view reconstruction. In Proceedings of IEEE International Conference on Computer Vision, (pp. 33–40). Owens, A., Xiao, J., Torralba, A., & Freeman, W. T. (2013). Shape anchors for data-driven multi-view reconstruction. In Proceedings of IEEE International Conference on Computer Vision, (pp. 33–40).
go back to reference Parizi, S. N., Vedaldi, A., Zisserman, A., & Felzenszwalb, P. (2015). Automatic discovery and optimization of parts for image classification. In Proceedings International Conference on Learning Representations. Parizi, S. N., Vedaldi, A., Zisserman, A., & Felzenszwalb, P. (2015). Automatic discovery and optimization of parts for image classification. In Proceedings International Conference on Learning Representations.
go back to reference Perronnin, F., Liu, Y., Sánchez, J., Poirier, H. (2010a) Large-scale image retrieval with compressed fisher vectors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 3384–3391). Perronnin, F., Liu, Y., Sánchez, J., Poirier, H. (2010a) Large-scale image retrieval with compressed fisher vectors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 3384–3391).
go back to reference Perronnin, F., Sánchez, J., Mensink, T. (2010b) Improving the fisher kernel for large-scale image classification. In Proceedings of European Conference on Computer Vision, (pp. 143–156). Perronnin, F., Sánchez, J., Mensink, T. (2010b) Improving the fisher kernel for large-scale image classification. In Proceedings of European Conference on Computer Vision, (pp. 143–156).
go back to reference Quack, T., Ferrari, V., Leibe, B., & Gool, L. J. V. (2007). Efficient mining of frequent and distinctive feature configurations. In Proceedings of IEEE International Conference on Computer Vision, (pp. 1–8). Quack, T., Ferrari, V., Leibe, B., & Gool, L. J. V. (2007). Efficient mining of frequent and distinctive feature configurations. In Proceedings of IEEE International Conference on Computer Vision, (pp. 1–8).
go back to reference Quattoni, A., & Torralba, A. (2009). Recognizing indoor scenes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 413–420). Quattoni, A., & Torralba, A. (2009). Recognizing indoor scenes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 413–420).
go back to reference Rastegari, M., Ordonez, V., Redmon, J., & Farhadi, A. (2016). In Proceedings of European Conference on Computer Vision. Rastegari, M., Ordonez, V., Redmon, J., & Farhadi, A. (2016). In Proceedings of European Conference on Computer Vision.
go back to reference Razavian, A. S., Azizpour, H., Sullivan, J., & Carlsson, S. (2014). Cnn features off-the-shelf: An astounding baseline for recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, (pp. 512–519). Razavian, A. S., Azizpour, H., Sullivan, J., & Carlsson, S. (2014). Cnn features off-the-shelf: An astounding baseline for recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, (pp. 512–519).
go back to reference Rematas, K., Fernando, B., Dellaert, F., & Tuytelaars, T. (2015). Dataset fingerprints: Exploring image collections through data mining. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 4867–4875). Rematas, K., Fernando, B., Dellaert, F., & Tuytelaars, T. (2015). Dataset fingerprints: Exploring image collections through data mining. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 4867–4875).
go back to reference Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., et al. (2015). Imagenet large scale visual recognition challenge. International Journal of Computer Vision, 115(3), 211–252.MathSciNetCrossRef Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., et al. (2015). Imagenet large scale visual recognition challenge. International Journal of Computer Vision, 115(3), 211–252.MathSciNetCrossRef
go back to reference Shih, K. J., Endres, I., & Hoiem, D. (2015). Learning discriminative collections of part detectors for object recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(8), 1571–1584.CrossRef Shih, K. J., Endres, I., & Hoiem, D. (2015). Learning discriminative collections of part detectors for object recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(8), 1571–1584.CrossRef
go back to reference Shrivastava, A., Malisiewicz, T., Gupta, A., & Efros, A. A. (2011). Data-driven visual similarity for cross-domain image matching. Proceedings of Annual ACM SIGIR Conference, 30(6), p. 154. Shrivastava, A., Malisiewicz, T., Gupta, A., & Efros, A. A. (2011). Data-driven visual similarity for cross-domain image matching. Proceedings of Annual ACM SIGIR Conference, 30(6), p. 154.
go back to reference Simonyan, K., & Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition. In Proceedings International Conference on Learning Representations. Simonyan, K., & Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition. In Proceedings International Conference on Learning Representations.
go back to reference Simonyan, K., Vedaldi, A., & Zisserman, A. (2013). Deep fisher networks for large-scale image classification. In Proceedings of Advances Neural Information Processing Systems, (pp. 163–171). Simonyan, K., Vedaldi, A., & Zisserman, A. (2013). Deep fisher networks for large-scale image classification. In Proceedings of Advances Neural Information Processing Systems, (pp. 163–171).
go back to reference Singh, S., Gupta, A., & Efros, A. A. (2012). Unsupervised discovery of mid-level discriminative patches. In Proceedings of European Conference on Computer Vision, (pp. 73–86). Singh, S., Gupta, A., & Efros, A. A. (2012). Unsupervised discovery of mid-level discriminative patches. In Proceedings of European Conference on Computer Vision, (pp. 73–86).
go back to reference Sivic, J., & Zisserman, A. (2003). Video google: A text retrieval approach to object matching in videos. In Proceedings of IEEE International Conference on Computer Vision, (pp. 1470–1477). Sivic, J., & Zisserman, A. (2003). Video google: A text retrieval approach to object matching in videos. In Proceedings of IEEE International Conference on Computer Vision, (pp. 1470–1477).
go back to reference Song, H. O., Lee, Y. J., Jegelka, S., & Darrell, T. (2014). Weakly-supervised discovery of visual pattern configurations. In Proceedings of Advances Neural Information Processing Systems, (pp. 1637–1645). Song, H. O., Lee, Y. J., Jegelka, S., & Darrell, T. (2014). Weakly-supervised discovery of visual pattern configurations. In Proceedings of Advances Neural Information Processing Systems, (pp. 1637–1645).
go back to reference Sun, J., & Ponce, J. (2013). Learning discriminative part detectors for image classification and cosegmentation. In Proceedings of IEEE International Conference on Computer Vision, (pp. 3400–3407). Sun, J., & Ponce, J. (2013). Learning discriminative part detectors for image classification and cosegmentation. In Proceedings of IEEE International Conference on Computer Vision, (pp. 3400–3407).
go back to reference Sun, J., & Ponce, J. (2016). Learning dictionary of discriminative part detectors for image categorization and cosegmentation. International Journal of Computer Vision, 2, 1–23.MathSciNet Sun, J., & Ponce, J. (2016). Learning dictionary of discriminative part detectors for image categorization and cosegmentation. International Journal of Computer Vision, 2, 1–23.MathSciNet
go back to reference Torralba, A. (2003). Contextual priming for object detection. International Journal of Computer Vision, 53(2), 169–191.MathSciNetCrossRef Torralba, A. (2003). Contextual priming for object detection. International Journal of Computer Vision, 53(2), 169–191.MathSciNetCrossRef
go back to reference Uno, T., Asai, T., Uchida, Y., & Arimura, H. (2003). LCM: An efficient algorithm for enumerating frequent closed item sets. In Proceedings of the Workshop on Frequent Itemset Mining Implementations, International Conference on Data Mining. Uno, T., Asai, T., Uchida, Y., & Arimura, H. (2003). LCM: An efficient algorithm for enumerating frequent closed item sets. In Proceedings of the Workshop on Frequent Itemset Mining Implementations, International Conference on Data Mining.
go back to reference Voravuthikunchai, W., Crémilleux, B., & Jurie, F. (2014). Histograms of pattern sets for image classification and object recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 224–231). Voravuthikunchai, W., Crémilleux, B., & Jurie, F. (2014). Histograms of pattern sets for image classification and object recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 224–231).
go back to reference Vreeken, J., van Leeuwen, M., & Siebes, A. (2011). Krimp: mining itemsets that compress. Data Mining and Knowledge Discovery, 23(1), 169–214.MathSciNetCrossRefMATH Vreeken, J., van Leeuwen, M., & Siebes, A. (2011). Krimp: mining itemsets that compress. Data Mining and Knowledge Discovery, 23(1), 169–214.MathSciNetCrossRefMATH
go back to reference Wang, J., Liu, Z., Wu, Y., & Yuan, J. (2014). Learning actionlet ensemble for 3d human action recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(5), 914–927.CrossRef Wang, J., Liu, Z., Wu, Y., & Yuan, J. (2014). Learning actionlet ensemble for 3d human action recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(5), 914–927.CrossRef
go back to reference Wang, J., Yang, Y., Mao, J., Huang, Z., & Xu, C. H. W. (2016a). Cnn-rnn: A unified framework for multi-label image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Wang, J., Yang, Y., Mao, J., Huang, Z., & Xu, C. H. W. (2016a). Cnn-rnn: A unified framework for multi-label image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
go back to reference Wang, L., Qiao, Y., Tang, X. (2013a) Motionlets: Mid-level 3d parts for human motion recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 2674–2681). Wang, L., Qiao, Y., Tang, X. (2013a) Motionlets: Mid-level 3d parts for human motion recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 2674–2681).
go back to reference Wang, X., Wang, B., Bai, X., Liu, W., Tu, Z. (2013b) Max-margin multiple-instance dictionary learning. In Proceedings International Conference on Machine Learning, (pp. 846–854). Wang, X., Wang, B., Bai, X., Liu, W., Tu, Z. (2013b) Max-margin multiple-instance dictionary learning. In Proceedings International Conference on Machine Learning, (pp. 846–854).
go back to reference Wang, Y., Choi, J., Morariu, V. I., & Davis, L. S. (2016b). Mining discriminative triplets of patches for fine-grained classification. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (pp. 1163–1172). Wang, Y., Choi, J., Morariu, V. I., & Davis, L. S. (2016b). Mining discriminative triplets of patches for fine-grained classification. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (pp. 1163–1172).
go back to reference Yao, B., & Fei-Fei, L. (2010). Grouplet: A structured image representation for recognizing human and object interactions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 9–16). Yao, B., & Fei-Fei, L. (2010). Grouplet: A structured image representation for recognizing human and object interactions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 9–16).
go back to reference Yoo, D., Park, S., Lee, J. Y., & Kweon, I. S. (2015). Multi-scale pyramid pooling for deep convolutional representation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, (pp. 71–80). Yoo, D., Park, S., Lee, J. Y., & Kweon, I. S. (2015). Multi-scale pyramid pooling for deep convolutional representation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, (pp. 71–80).
go back to reference Yuan, J., Wu, Y., & Yang, M. (2007). Discovery of collocation patterns: from visual words to visual phrases. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Yuan, J., Wu, Y., & Yang, M. (2007). Discovery of collocation patterns: from visual words to visual phrases. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
go back to reference Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In Proceedings of European Conference on Computer Vision, (pp. 818–833). Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In Proceedings of European Conference on Computer Vision, (pp. 818–833).
go back to reference Zhao, R., Ouyang, W., & Wang, X. (2014). Learning mid-level filters for person re-identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 144–151). Zhao, R., Ouyang, W., & Wang, X. (2014). Learning mid-level filters for person re-identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 144–151).
go back to reference Zhou, B., Lapedriza À, Xiao, J., Torralba, A., & Oliva, A. (2014). Learning deep features for scene recognition using places database. In Proceedings of Advances Neural Information Processing Systems, (pp. 487–495). Zhou, B., Lapedriza À, Xiao, J., Torralba, A., & Oliva, A. (2014). Learning deep features for scene recognition using places database. In Proceedings of Advances Neural Information Processing Systems, (pp. 487–495).
Metadata
Title
Mining Mid-level Visual Patterns with Deep CNN Activations
Authors
Yao Li
Lingqiao Liu
Chunhua Shen
Anton van den Hengel
Publication date
29-08-2016
Publisher
Springer US
Published in
International Journal of Computer Vision / Issue 3/2017
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-016-0945-y

Other articles of this Issue 3/2017

International Journal of Computer Vision 3/2017 Go to the issue

Premium Partner