Published in: International Journal of Computer Vision 3/2017

29 August 2016

Mining Mid-level Visual Patterns with Deep CNN Activations

Written by: Yao Li, Lingqiao Liu, Chunhua Shen, Anton van den Hengel



Abstract

The purpose of mid-level visual element discovery is to find clusters of image patches that are representative of, and which discriminate between, the contents of the relevant images. Here we propose a pattern-mining approach to the problem of identifying mid-level elements within images, motivated by the observation that such techniques have been very effective, and efficient, in achieving similar goals when applied to other data types. We show that Convolutional Neural Network (CNN) activations extracted from image patches typically possess two appealing properties that enable seamless integration with pattern mining techniques. The marriage between CNN activations and a pattern mining technique leads to fast and effective discovery of representative and discriminative patterns from a huge number of image patches, from which mid-level elements are retrieved. Given the patterns and retrieved mid-level visual elements, we propose two methods to generate image feature representations. The first encoding method uses the patterns as codewords in a dictionary in a manner similar to the Bag-of-Visual-Words model. We thus label this a Bag-of-Patterns representation. The second relies on mid-level visual elements to construct a Bag-of-Elements representation. We evaluate the two encoding methods on object and scene classification tasks, and demonstrate that our approach outperforms or matches the state of the art on these tasks.
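To make the idea concrete, the following is a minimal sketch of how patch activations might be turned into transactions, mined for discriminative patterns, and encoded as a Bag-of-Patterns histogram. It is not the authors' implementation. The abstract does not specify the mining algorithm or the encoding details, so everything here is an assumption made for illustration: the function names (`to_transaction`, `mine_patterns`, `bag_of_patterns`), the reduction of each activation vector to the indices of its `k` largest values, the brute-force enumeration of fixed-size itemsets, the support and confidence thresholds, and the toy activation values.

```python
from collections import Counter
from itertools import combinations


def to_transaction(activation, k=2):
    """Assumed encoding: keep the indices of the k largest activations as items."""
    ranked = sorted(range(len(activation)), key=lambda i: activation[i], reverse=True)
    return frozenset(ranked[:k])


def mine_patterns(transactions, labels, target, size=2,
                  min_support=0.3, min_confidence=0.8):
    """Brute-force search for itemsets that are frequent and specific to `target`.

    support    = fraction of all transactions that contain the itemset
                 and carry the target label
    confidence = fraction of the transactions containing the itemset
                 that carry the target label
    """
    total = Counter()  # occurrences of each itemset over all transactions
    hits = Counter()   # occurrences within transactions labelled `target`
    for items, label in zip(transactions, labels):
        for pattern in combinations(sorted(items), size):
            total[pattern] += 1
            if label == target:
                hits[pattern] += 1
    n = len(transactions)
    return sorted(
        p for p in total
        if hits[p] / n >= min_support and hits[p] / total[p] >= min_confidence
    )


def bag_of_patterns(image_transactions, patterns):
    """Count, per pattern, how many patches of one image contain it."""
    return [sum(1 for t in image_transactions if set(p) <= t) for p in patterns]


# Toy 5-dimensional "activations" for six patches (made-up values).
patches = [
    ([0.9, 0.8, 0.1, 0.0, 0.2], "cat"),
    ([0.7, 0.9, 0.0, 0.3, 0.1], "cat"),
    ([0.8, 0.6, 0.2, 0.1, 0.0], "cat"),
    ([0.1, 0.0, 0.9, 0.8, 0.2], "dog"),
    ([0.0, 0.2, 0.7, 0.9, 0.1], "dog"),
    ([0.9, 0.1, 0.0, 0.8, 0.3], "dog"),
]
transactions = [to_transaction(a) for a, _ in patches]
labels = [label for _, label in patches]

cat_patterns = mine_patterns(transactions, labels, "cat")
dog_patterns = mine_patterns(transactions, labels, "dog")

# Encode a hypothetical image made of the first, second and fourth patch.
image = [transactions[0], transactions[1], transactions[3]]
histogram = bag_of_patterns(image, cat_patterns + dog_patterns)
print(cat_patterns, dog_patterns, histogram)
```

In this toy data the itemset `(0, 1)` occurs only in "cat" patches and `(2, 3)` only in "dog" patches, while `(0, 3)` is too rare to pass the support threshold. A practical system would replace the exhaustive enumeration with an efficient frequent-itemset miner, since the number of candidate itemsets grows quickly with activation dimensionality.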


Footnotes
1. Answer key: 1. aeroplane, 2. train, 3. cow, 4. motorbike, 5. bike, 6. sofa.
 
Literatur
Zurück zum Zitat Agarwal, A., & Triggs, B. (2008). Multilevel image coding with hyperfeatures. International Journal of Computer Vision, 78(1), 15–27.CrossRef Agarwal, A., & Triggs, B. (2008). Multilevel image coding with hyperfeatures. International Journal of Computer Vision, 78(1), 15–27.CrossRef
Zurück zum Zitat Agrawal, P., Girshick, R., & Malik, J. (2014). Analyzing the performance of multilayer neural networks for object recognition. In Proceedings European Conference on Computer Vision, (pp. 329–344). Agrawal, P., Girshick, R., & Malik, J. (2014). Analyzing the performance of multilayer neural networks for object recognition. In Proceedings European Conference on Computer Vision, (pp. 329–344).
Zurück zum Zitat Agrawal, R., & Srikant, R. (1994). Fast algorithms for mining association rules in large databases. In Proceedings International Conference Very Large Databases, (pp. 487–499). Agrawal, R., & Srikant, R. (1994). Fast algorithms for mining association rules in large databases. In Proceedings International Conference Very Large Databases, (pp. 487–499).
Zurück zum Zitat Aubry, M., Maturana, D., Efros, A. A., Russell, B. C., Sivic, J. (2014a) Seeing 3d chairs: exemplar part-based 2d-3d alignment using a large dataset of cad models. In Proceedings of IEEE Conference on Computer Vision Pattern Recognition, (pp. 3762–3769). Aubry, M., Maturana, D., Efros, A. A., Russell, B. C., Sivic, J. (2014a) Seeing 3d chairs: exemplar part-based 2d-3d alignment using a large dataset of cad models. In Proceedings of IEEE Conference on Computer Vision Pattern Recognition, (pp. 3762–3769).
Zurück zum Zitat Aubry, M., Russell, B. C., & Sivic, J. (2014b). Painting-to-3d model alignment via discriminative visual elements. In Proceedings Annual ACM SIGIR Conference, 33(2), p. 14. Aubry, M., Russell, B. C., & Sivic, J. (2014b). Painting-to-3d model alignment via discriminative visual elements. In Proceedings Annual ACM SIGIR Conference, 33(2), p. 14.
Zurück zum Zitat Azizpour, H., Razavian, A. S., Sullivan, J., Maki, A., & Carlsson, S. (2016). Factors of transferability for a generic convnet representation. IEEE Transactions Pattern Analysis and Machine Intelligence, 38(9),1790–1802. Azizpour, H., Razavian, A. S., Sullivan, J., Maki, A., & Carlsson, S. (2016). Factors of transferability for a generic convnet representation. IEEE Transactions Pattern Analysis and Machine Intelligence, 38(9),1790–1802.
Zurück zum Zitat Bansal, A., Shrivastava, A., Doersch, C., & Gupta, A. (2015). Mid-level elements for object detection. arXiv preprint arXiv:1504.07284 Bansal, A., Shrivastava, A., Doersch, C., & Gupta, A. (2015). Mid-level elements for object detection. arXiv preprint arXiv:​1504.​07284
Zurück zum Zitat Borgelt, C. (2012). Frequent item set mining. Wiley Interdisc Review: Data Mining and Knowledge Discovery, 2(6), 437–456. Borgelt, C. (2012). Frequent item set mining. Wiley Interdisc Review: Data Mining and Knowledge Discovery, 2(6), 437–456.
Zurück zum Zitat Bossard, L., Guillaumin, M., & Gool, L. V. (2014). Food-101 mining discriminative components with random forests. In Proceedings European Conference on Computer Vision, (pp. 446–461). Bossard, L., Guillaumin, M., & Gool, L. V. (2014). Food-101 mining discriminative components with random forests. In Proceedings European Conference on Computer Vision, (pp. 446–461).
Zurück zum Zitat Bourdev, L. D., & Malik, J. (2009). Poselets: Body part detectors trained using 3d human pose annotations. In Proceedings IEEE International Conference on Computer Vision, (pp. 1365–1372). Bourdev, L. D., & Malik, J. (2009). Poselets: Body part detectors trained using 3d human pose annotations. In Proceedings IEEE International Conference on Computer Vision, (pp. 1365–1372).
Zurück zum Zitat Bourdev, L. D., Maji, S., Brox, T., & Malik, J. (2010). Detecting people using mutually consistent poselet activations. In Proceeding European Conference on Computer Vision, (pp. 168–181). Bourdev, L. D., Maji, S., Brox, T., & Malik, J. (2010). Detecting people using mutually consistent poselet activations. In Proceeding European Conference on Computer Vision, (pp. 168–181).
Zurück zum Zitat Bourdev, L. D., Maji, S., & Malik, J. (2011). Describing people: A poselet-based approach to attribute classification. In Proceedings IEEE International Conference on Computer Vision, (pp. 1543–1550). Bourdev, L. D., Maji, S., & Malik, J. (2011). Describing people: A poselet-based approach to attribute classification. In Proceedings IEEE International Conference on Computer Vision, (pp. 1543–1550).
Zurück zum Zitat Boureau, Y., Bach, F. R., LeCun, Y., & Ponce, J. (2010). Learning mid-level features for recognition. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (pp. 2559–2566). Boureau, Y., Bach, F. R., LeCun, Y., & Ponce, J. (2010). Learning mid-level features for recognition. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (pp. 2559–2566).
Zurück zum Zitat Chatfield, K., Simonyan, K., Vedaldi, A., & Zisserman, A. (2014). Return of the devil in the details: Delving deep into convolutional nets. In Proceedings British Machine Vision Conference. Chatfield, K., Simonyan, K., Vedaldi, A., & Zisserman, A. (2014). Return of the devil in the details: Delving deep into convolutional nets. In Proceedings British Machine Vision Conference.
Zurück zum Zitat Cheng, H., Yan, X., Han, J., & Yu, P. S. (2008). Direct discriminative pattern mining for effective classification. In Proceedings IEEE International Conference on Data Engineering, (pp. 169–178). Cheng, H., Yan, X., Han, J., & Yu, P. S. (2008). Direct discriminative pattern mining for effective classification. In Proceedings IEEE International Conference on Data Engineering, (pp. 169–178).
Zurück zum Zitat Choi, M. J., Torralba, A., & Willsky, A. S. (2012). A tree-based context model for object recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(2), 240–252.CrossRef Choi, M. J., Torralba, A., & Willsky, A. S. (2012). A tree-based context model for object recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(2), 240–252.CrossRef
Zurück zum Zitat Cimpoi, M., Maji, S., & Vedaldi, A. (2015). Deep filter banks for texture recognition and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 3828–3836). Cimpoi, M., Maji, S., & Vedaldi, A. (2015). Deep filter banks for texture recognition and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 3828–3836).
Zurück zum Zitat Cimpoi, M., Maji, S., Kokkinos, I., & Vedaldi, A. (2016). Deep filter banks for texture recognition, description, and segmentation. International Journal of Computer Vision, 118(1), 65–94.MathSciNetCrossRef Cimpoi, M., Maji, S., Kokkinos, I., & Vedaldi, A. (2016). Deep filter banks for texture recognition, description, and segmentation. International Journal of Computer Vision, 118(1), 65–94.MathSciNetCrossRef
Zurück zum Zitat Courbariaux, M., & Bengio, Y. (2016). Binarynet: Training deep neural networks with weights and activations constrained to +1 or -1. arXiv preprint arXiv:1602.02830 Courbariaux, M., & Bengio, Y. (2016). Binarynet: Training deep neural networks with weights and activations constrained to +1 or -1. arXiv preprint arXiv:​1602.​02830
Zurück zum Zitat Crowley, E., & Zisserman, A. (2014). The state of the art: Object retrieval in paintings using discriminative regions. In Proceedings British Machine Vision Conference. Crowley, E., & Zisserman, A. (2014). The state of the art: Object retrieval in paintings using discriminative regions. In Proceedings British Machine Vision Conference.
Zurück zum Zitat Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., & Li, F. F. (2009). Imagenet: A large-scale hierarchical image database. In Proceedings IEEE Conference on Computer Vision and Pattern Recognition, (pp. 248–255). Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., & Li, F. F. (2009). Imagenet: A large-scale hierarchical image database. In Proceedings IEEE Conference on Computer Vision and Pattern Recognition, (pp. 248–255).
Zurück zum Zitat Diba, A., Pazandeh, A. M., Pirsiavash, H., & Gool, L. V. (2016). Deepcamp: Deep convolutional action & attribute mid-level patterns. In Proceedings IEEE Conference on Computer Vision and Pattern Recognition. Diba, A., Pazandeh, A. M., Pirsiavash, H., & Gool, L. V. (2016). Deepcamp: Deep convolutional action & attribute mid-level patterns. In Proceedings IEEE Conference on Computer Vision and Pattern Recognition.
Zurück zum Zitat Divvala, S. K., Hoiem, D., Hays, J., Efros, A. A., Hebert, M. (2009). An empirical study of context in object detection. In Proceedings IEEE Conference on Computer Vision and Pattern Recognition, (pp. 1271–1278). Divvala, S. K., Hoiem, D., Hays, J., Efros, A. A., Hebert, M. (2009). An empirical study of context in object detection. In Proceedings IEEE Conference on Computer Vision and Pattern Recognition, (pp. 1271–1278).
Zurück zum Zitat Doersch, C., Singh, S., Gupta, A., Sivic, J., & Efros, A. A. (2012). What makes paris look like paris? In Proceedings Annual International ACM SIGIR Conference, 31(4), p. 101. Doersch, C., Singh, S., Gupta, A., Sivic, J., & Efros, A. A. (2012). What makes paris look like paris? In Proceedings Annual International ACM SIGIR Conference, 31(4), p. 101.
Zurück zum Zitat Doersch, C., Gupta, A., & Efros, A. A. (2013). Mid-level visual element discovery as discriminative mode seeking. In Proceedings Advances in Neural Information Processing Systems, (pp. 494–502). Doersch, C., Gupta, A., & Efros, A. A. (2013). Mid-level visual element discovery as discriminative mode seeking. In Proceedings Advances in Neural Information Processing Systems, (pp. 494–502).
Zurück zum Zitat Dosovitskiy, A., & Brox, T. (2016). Inverting visual representations with convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Dosovitskiy, A., & Brox, T. (2016). Inverting visual representations with convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
Zurück zum Zitat Endres, I., Shih, K. J., Jiaa, J., & Hoiem, D. (2013). Learning collections of part models for object recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 939–946). Endres, I., Shih, K. J., Jiaa, J., & Hoiem, D. (2013). Learning collections of part models for object recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 939–946).
Zurück zum Zitat Everingham, M., Gool, L. J. V., Williams, C. K. I., Winn, J. M., & Zisserman, A. (2010). The pascal visual object classes (VOC) challenge. International Journal of Computer Vision, 88(2), 303–338.CrossRef Everingham, M., Gool, L. J. V., Williams, C. K. I., Winn, J. M., & Zisserman, A. (2010). The pascal visual object classes (VOC) challenge. International Journal of Computer Vision, 88(2), 303–338.CrossRef
Zurück zum Zitat Everingham, M., Eslami, S. M. A., Gool, L. V., Williams, C. K. I., Winn, J. M., & Zisserman, A. (2015). The pascal visual object classes challenge: A retrospective. International Journal of Computer Vision, 111(1), 98–136.CrossRef Everingham, M., Eslami, S. M. A., Gool, L. V., Williams, C. K. I., Winn, J. M., & Zisserman, A. (2015). The pascal visual object classes challenge: A retrospective. International Journal of Computer Vision, 111(1), 98–136.CrossRef
Zurück zum Zitat Fan, R. E., Chang, K. W., Hsieh, C. J., Wang, X. R., & Lin, C. J. (2008). Liblinear: A library for large linear classification. Journal of Machine Learning Research, 9, 1871–1874.MATH Fan, R. E., Chang, K. W., Hsieh, C. J., Wang, X. R., & Lin, C. J. (2008). Liblinear: A library for large linear classification. Journal of Machine Learning Research, 9, 1871–1874.MATH
Zurück zum Zitat Felzenszwalb, P. F., Girshick, R. B., McAllester, D. A., & Ramanan, D. (2010). Object detection with discriminatively trained part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9), 1627–1645.CrossRef Felzenszwalb, P. F., Girshick, R. B., McAllester, D. A., & Ramanan, D. (2010). Object detection with discriminatively trained part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9), 1627–1645.CrossRef
Zurück zum Zitat Fernando, B., & Tuytelaars, T. (2013). Mining multiple queries for image retrieval: On-the-fly learning of an object-specific mid-level representation. In Proceedings of IEEE International Conference on Computer Vision, (pp. 2544–2551). Fernando, B., & Tuytelaars, T. (2013). Mining multiple queries for image retrieval: On-the-fly learning of an object-specific mid-level representation. In Proceedings of IEEE International Conference on Computer Vision, (pp. 2544–2551).
Zurück zum Zitat Fernando, B., Fromont, É., & Tuytelaars, T. (2012). Effective use of frequent itemset mining for image classification. In Proceedings of European Conference on Computer Vision, (pp. 214–227). Fernando, B., Fromont, É., & Tuytelaars, T. (2012). Effective use of frequent itemset mining for image classification. In Proceedings of European Conference on Computer Vision, (pp. 214–227).
Zurück zum Zitat Fernando, B., Fromont, É., & Tuytelaars, T. (2014). Mining mid-level features for image classification. International Journal of Computer Vision, 108(3), 186–203.MathSciNetCrossRef Fernando, B., Fromont, É., & Tuytelaars, T. (2014). Mining mid-level features for image classification. International Journal of Computer Vision, 108(3), 186–203.MathSciNetCrossRef
Zurück zum Zitat Fouhey, D. F., Gupta, A., & Hebert, M. (2013). Data-driven 3d primitives for single image understanding. In Proceedings of IEEE International Conference on Computer Vision, (pp. 3392–3399). Fouhey, D. F., Gupta, A., & Hebert, M. (2013). Data-driven 3d primitives for single image understanding. In Proceedings of IEEE International Conference on Computer Vision, (pp. 3392–3399).
Zurück zum Zitat Fouhey, D. F., Hussain, W., Gupta, A., & Hebert, M. (2015). Single image 3d without a single 3d image. In Proceedings of IEEE International Conference on Computer Vision, (pp. 1053–1061). Fouhey, D. F., Hussain, W., Gupta, A., & Hebert, M. (2015). Single image 3d without a single 3d image. In Proceedings of IEEE International Conference on Computer Vision, (pp. 1053–1061).
Zurück zum Zitat Gao, Y., Beijbom, O., Zhang, N., & Darrell, T. (2010). Compact bilinear pooling. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 317–326). Gao, Y., Beijbom, O., Zhang, N., & Darrell, T. (2010). Compact bilinear pooling. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 317–326).
Zurück zum Zitat Gilbert, A., & Bowden, R. (2014). Data mining for action recognition. In Proceedings of Asian Conference on Computer Vision, (pp. 290–303). Gilbert, A., & Bowden, R. (2014). Data mining for action recognition. In Proceedings of Asian Conference on Computer Vision, (pp. 290–303).
Zurück zum Zitat Gilbert, A., Illingworth, J., & Bowden, R. (2011). Action recognition using mined hierarchical compound features. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(5), 883–897.CrossRef Gilbert, A., Illingworth, J., & Bowden, R. (2011). Action recognition using mined hierarchical compound features. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(5), 883–897.CrossRef
Zurück zum Zitat Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 580–587). Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 580–587).
Zurück zum Zitat Girshick, R. B., Donahue, J., Darrell, T., & Malik, J. (2016). Region-based convolutional networks for accurate object detection and segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(1), 142–158.CrossRef Girshick, R. B., Donahue, J., Darrell, T., & Malik, J. (2016). Region-based convolutional networks for accurate object detection and segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(1), 142–158.CrossRef
Zurück zum Zitat Gong, Y., Wang, L., Guo, R., & Lazebnik, S. (2014). Multi-scale orderless pooling of deep convolutional activation features. In Proceedings of European Conference on Computer Vision, (pp. 392–407). Gong, Y., Wang, L., Guo, R., & Lazebnik, S. (2014). Multi-scale orderless pooling of deep convolutional activation features. In Proceedings of European Conference on Computer Vision, (pp. 392–407).
Zurück zum Zitat Grahne, G., & Zhu, J. (2005). Fast algorithms for frequent itemset mining using fp-trees. IEEE Transactions on Knowledge and Data Engineering, 17(10), 1347–1362.CrossRef Grahne, G., & Zhu, J. (2005). Fast algorithms for frequent itemset mining using fp-trees. IEEE Transactions on Knowledge and Data Engineering, 17(10), 1347–1362.CrossRef
Zurück zum Zitat Hariharan, B., Malik, J., & Ramanan, D. (2012). Discriminative decorrelation for clustering and classification. In Proceedings of European Conference on Computer Vision, (pp. 459–472). Hariharan, B., Malik, J., & Ramanan, D. (2012). Discriminative decorrelation for clustering and classification. In Proceedings of European Conference on Computer Vision, (pp. 459–472).
Zurück zum Zitat He, K., Zhang, X., Ren, S., & Sun, J. (2015). Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(9), 1904–1916.CrossRef He, K., Zhang, X., Ren, S., & Sun, J. (2015). Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(9), 1904–1916.CrossRef
Zurück zum Zitat Hoiem, D., Efros, A. A., & Hebert, M. (2008). Putting objects in perspective. International Journal of Computer Vision, 80(1), 3–15.CrossRef Hoiem, D., Efros, A. A., & Hebert, M. (2008). Putting objects in perspective. International Journal of Computer Vision, 80(1), 3–15.CrossRef
Zurück zum Zitat Jain, A., Gupta, A., Rodriguez, M., & Davis, L. S. (2013). Representing videos using mid-level discriminative patches. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 2571–2578). Jain, A., Gupta, A., Rodriguez, M., & Davis, L. S. (2013). Representing videos using mid-level discriminative patches. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 2571–2578).
Zurück zum Zitat Jegou, H., Douze, M., Schmid, C., & Pérez, P. (2010). Aggregating local descriptors into a compact image representation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 3304–3311). Jegou, H., Douze, M., Schmid, C., & Pérez, P. (2010). Aggregating local descriptors into a compact image representation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 3304–3311).
Zurück zum Zitat Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., & Darrell, T. (2014). Caffe: Convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093 Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., & Darrell, T. (2014). Caffe: Convolutional architecture for fast feature embedding. arXiv preprint arXiv:​1408.​5093
Zurück zum Zitat Juneja, M., Vedaldi, A., Jawahar, C. V., & Zisserman, A. (2013). Blocks that shout: Distinctive parts for scene classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 923–930). Juneja, M., Vedaldi, A., Jawahar, C. V., & Zisserman, A. (2013). Blocks that shout: Distinctive parts for scene classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 923–930).
Zurück zum Zitat Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. In Proceedings of Advances Neural Information Processing Systems, (pp. 1106–1114). Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. In Proceedings of Advances Neural Information Processing Systems, (pp. 1106–1114).
Zurück zum Zitat Lazebnik, S., Schmid, C., & Ponce, J. (2006). Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 2169–2178). Lazebnik, S., Schmid, C., & Ponce, J. (2006). Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 2169–2178).
Zurück zum Zitat Lee, Y. J., Efros, A. A., & Hebert, M. (2013). Style-aware mid-level representation for discovering visual connections in space and time. In Proceedings of IEEE International Conference on Computer Vision, (pp. 1857–1864). Lee, Y. J., Efros, A. A., & Hebert, M. (2013). Style-aware mid-level representation for discovering visual connections in space and time. In Proceedings of IEEE International Conference on Computer Vision, (pp. 1857–1864).
Zurück zum Zitat Li, Q., Wu, J., & Tu, Z. (2013). Harvesting mid-level visual concepts from large-scale internet images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 851–858). Li, Q., Wu, J., & Tu, Z. (2013). Harvesting mid-level visual concepts from large-scale internet images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 851–858).
Zurück zum Zitat Li, Y., Liu, L., Shen, C., & van den Hengel, A. (2015). Mid-level deep pattern mining. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 971–980). Li, Y., Liu, L., Shen, C., & van den Hengel, A. (2015). Mid-level deep pattern mining. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 971–980).
Zurück zum Zitat Lin, T., RoyChowdhury, A., & Maji, S. (2015). Bilinear CNN models for fine-grained visual recognition. In Proceedings of European Conference on Computer Vision, (pp. 1449–1457). Lin, T., RoyChowdhury, A., & Maji, S. (2015). Bilinear CNN models for fine-grained visual recognition. In Proceedings of European Conference on Computer Vision, (pp. 1449–1457).
Zurück zum Zitat Liu, L., & Wang, L. (2012). What has my classifier learned? visualizing the classification rules of bag-of-feature model by support region detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 3586–3593). Liu, L., & Wang, L. (2012). What has my classifier learned? visualizing the classification rules of bag-of-feature model by support region detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 3586–3593).
Zurück zum Zitat Liu, L., Shen, C., Wang, L., van den Hengel, A., & Wang, C. (2014). Encoding high dimensional local features by sparse coding based fisher vectors. In Proceedings of Advances Neural Information Processing Systems, (pp. 1143–1151). Liu, L., Shen, C., Wang, L., van den Hengel, A., & Wang, C. (2014). Encoding high dimensional local features by sparse coding based fisher vectors. In Proceedings of Advances Neural Information Processing Systems, (pp. 1143–1151).
Zurück zum Zitat Liu, L., Shen, C., & van den Hengel, A. (2015). The treasure beneath convolutional layers: Cross convolutional layer pooling for image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 4749–4757). Liu, L., Shen, C., & van den Hengel, A. (2015). The treasure beneath convolutional layers: Cross convolutional layer pooling for image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 4749–4757).
Zurück zum Zitat Malisiewicz, T., & Efros, A. A. (2009). Beyond categories: The visual memex model for reasoning about object relationships. In Proceedings of Advances Neural Information Processing Systems, (pp. 1222–1230). Malisiewicz, T., & Efros, A. A. (2009). Beyond categories: The visual memex model for reasoning about object relationships. In Proceedings of Advances Neural Information Processing Systems, (pp. 1222–1230).
Zurück zum Zitat Malisiewicz, T., Gupta, A., & Efros, A. A. (2011). Ensemble of exemplar-svms for object detection and beyond. In Proceedings of IEEE International Conference on Computer Vision, (pp. 89–96). Malisiewicz, T., Gupta, A., & Efros, A. A. (2011). Ensemble of exemplar-svms for object detection and beyond. In Proceedings of IEEE International Conference on Computer Vision, (pp. 89–96).
Zurück zum Zitat Matzen, K., & Snavely, N. (2015). Bubblenet: Foveated imaging for visual discovery. In Proceedings of IEEE International Conference on Computer Vision, (pp. 1931–1939). Matzen, K., & Snavely, N. (2015). Bubblenet: Foveated imaging for visual discovery. In Proceedings of IEEE International Conference on Computer Vision, (pp. 1931–1939).
Zurück zum Zitat Mettes, P., van Gemert, J. C., & Snoek, C. G. M. (2016). No spare parts: Sharing part detectors for image categorization. Computer Vision Image Understanding Mettes, P., van Gemert, J. C., & Snoek, C. G. M. (2016). No spare parts: Sharing part detectors for image categorization. Computer Vision Image Understanding
Zurück zum Zitat Oquab, M., Bottou, L., Laptev, I., & Sivic, J. (2014). Learning and transferring mid-level image representations using convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 1717–1724). Oquab, M., Bottou, L., Laptev, I., & Sivic, J. (2014). Learning and transferring mid-level image representations using convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 1717–1724).
Zurück zum Zitat Oramas, J., & Tuytelaars, T. (2016). Modeling visual compatibility through hierarchical mid-level elements. arXiv preprint arXiv:1604.00036 Oramas, J., & Tuytelaars, T. (2016). Modeling visual compatibility through hierarchical mid-level elements. arXiv preprint arXiv:​1604.​00036
Zurück zum Zitat Owens, A., Xiao, J., Torralba, A., & Freeman, W. T. (2013). Shape anchors for data-driven multi-view reconstruction. In Proceedings of IEEE International Conference on Computer Vision, (pp. 33–40). Owens, A., Xiao, J., Torralba, A., & Freeman, W. T. (2013). Shape anchors for data-driven multi-view reconstruction. In Proceedings of IEEE International Conference on Computer Vision, (pp. 33–40).
Zurück zum Zitat Parizi, S. N., Vedaldi, A., Zisserman, A., & Felzenszwalb, P. (2015). Automatic discovery and optimization of parts for image classification. In Proceedings International Conference on Learning Representations. Parizi, S. N., Vedaldi, A., Zisserman, A., & Felzenszwalb, P. (2015). Automatic discovery and optimization of parts for image classification. In Proceedings International Conference on Learning Representations.
Zurück zum Zitat Perronnin, F., Liu, Y., Sánchez, J., Poirier, H. (2010a) Large-scale image retrieval with compressed fisher vectors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 3384–3391). Perronnin, F., Liu, Y., Sánchez, J., Poirier, H. (2010a) Large-scale image retrieval with compressed fisher vectors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 3384–3391).
Zurück zum Zitat Perronnin, F., Sánchez, J., Mensink, T. (2010b) Improving the fisher kernel for large-scale image classification. In Proceedings of European Conference on Computer Vision, (pp. 143–156). Perronnin, F., Sánchez, J., Mensink, T. (2010b) Improving the fisher kernel for large-scale image classification. In Proceedings of European Conference on Computer Vision, (pp. 143–156).
Zurück zum Zitat Quack, T., Ferrari, V., Leibe, B., & Gool, L. J. V. (2007). Efficient mining of frequent and distinctive feature configurations. In Proceedings of IEEE International Conference on Computer Vision, (pp. 1–8). Quack, T., Ferrari, V., Leibe, B., & Gool, L. J. V. (2007). Efficient mining of frequent and distinctive feature configurations. In Proceedings of IEEE International Conference on Computer Vision, (pp. 1–8).
Quattoni, A., & Torralba, A. (2009). Recognizing indoor scenes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 413–420).
Rastegari, M., Ordonez, V., Redmon, J., & Farhadi, A. (2016). In Proceedings of European Conference on Computer Vision.
Razavian, A. S., Azizpour, H., Sullivan, J., & Carlsson, S. (2014). Cnn features off-the-shelf: An astounding baseline for recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, (pp. 512–519).
Rematas, K., Fernando, B., Dellaert, F., & Tuytelaars, T. (2015). Dataset fingerprints: Exploring image collections through data mining. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 4867–4875).
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., et al. (2015). Imagenet large scale visual recognition challenge. International Journal of Computer Vision, 115(3), 211–252.
Shih, K. J., Endres, I., & Hoiem, D. (2015). Learning discriminative collections of part detectors for object recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(8), 1571–1584.
Shrivastava, A., Malisiewicz, T., Gupta, A., & Efros, A. A. (2011). Data-driven visual similarity for cross-domain image matching. ACM Transactions on Graphics, 30(6), 154.
Simonyan, K., & Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition. In Proceedings International Conference on Learning Representations.
Simonyan, K., Vedaldi, A., & Zisserman, A. (2013). Deep fisher networks for large-scale image classification. In Proceedings of Advances Neural Information Processing Systems, (pp. 163–171).
Singh, S., Gupta, A., & Efros, A. A. (2012). Unsupervised discovery of mid-level discriminative patches. In Proceedings of European Conference on Computer Vision, (pp. 73–86).
Sivic, J., & Zisserman, A. (2003). Video google: A text retrieval approach to object matching in videos. In Proceedings of IEEE International Conference on Computer Vision, (pp. 1470–1477).
Song, H. O., Lee, Y. J., Jegelka, S., & Darrell, T. (2014). Weakly-supervised discovery of visual pattern configurations. In Proceedings of Advances Neural Information Processing Systems, (pp. 1637–1645).
Sun, J., & Ponce, J. (2013). Learning discriminative part detectors for image classification and cosegmentation. In Proceedings of IEEE International Conference on Computer Vision, (pp. 3400–3407).
Sun, J., & Ponce, J. (2016). Learning dictionary of discriminative part detectors for image categorization and cosegmentation. International Journal of Computer Vision, 2, 1–23.
Torralba, A. (2003). Contextual priming for object detection. International Journal of Computer Vision, 53(2), 169–191.
Uno, T., Asai, T., Uchida, Y., & Arimura, H. (2003). LCM: An efficient algorithm for enumerating frequent closed item sets. In Proceedings of the Workshop on Frequent Itemset Mining Implementations, International Conference on Data Mining.
Voravuthikunchai, W., Crémilleux, B., & Jurie, F. (2014). Histograms of pattern sets for image classification and object recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 224–231).
Vreeken, J., van Leeuwen, M., & Siebes, A. (2011). Krimp: mining itemsets that compress. Data Mining and Knowledge Discovery, 23(1), 169–214.
Wang, J., Liu, Z., Wu, Y., & Yuan, J. (2014). Learning actionlet ensemble for 3d human action recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(5), 914–927.
Wang, J., Yang, Y., Mao, J., Huang, Z., & Xu, C. H. W. (2016a). Cnn-rnn: A unified framework for multi-label image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
Wang, L., Qiao, Y., & Tang, X. (2013a). Motionlets: Mid-level 3d parts for human motion recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 2674–2681).
Wang, X., Wang, B., Bai, X., Liu, W., & Tu, Z. (2013b). Max-margin multiple-instance dictionary learning. In Proceedings International Conference on Machine Learning, (pp. 846–854).
Wang, Y., Choi, J., Morariu, V. I., & Davis, L. S. (2016b). Mining discriminative triplets of patches for fine-grained classification. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (pp. 1163–1172).
Wei, Y., Xia, W., Huang, J., Ni, B., Dong, J., Zhao, Y., & Yan, S. (2014). CNN: single-label to multi-label. CoRR arXiv:1406.5726
Yao, B., & Fei-Fei, L. (2010). Grouplet: A structured image representation for recognizing human and object interactions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 9–16).
Yoo, D., Park, S., Lee, J. Y., & Kweon, I. S. (2015). Multi-scale pyramid pooling for deep convolutional representation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, (pp. 71–80).
Yuan, J., Wu, Y., & Yang, M. (2007). Discovery of collocation patterns: from visual words to visual phrases. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In Proceedings of European Conference on Computer Vision, (pp. 818–833).
Zhao, R., Ouyang, W., & Wang, X. (2014). Learning mid-level filters for person re-identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 144–151).
Zhou, B., Lapedriza, À., Xiao, J., Torralba, A., & Oliva, A. (2014). Learning deep features for scene recognition using places database. In Proceedings of Advances Neural Information Processing Systems, (pp. 487–495).
Metadata
Title
Mining Mid-level Visual Patterns with Deep CNN Activations
Authors
Yao Li
Lingqiao Liu
Chunhua Shen
Anton van den Hengel
Publication date
29 August 2016
Publisher
Springer US
Published in
International Journal of Computer Vision, Issue 3/2017
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-016-0945-y
