
2018 | OriginalPaper | Book chapter

Hierarchy of Alternating Specialists for Scene Recognition

Authors: Hyo Jin Kim, Jan-Michael Frahm

Published in: Computer Vision – ECCV 2018

Publisher: Springer International Publishing


Abstract

We introduce a method for improving convolutional neural networks (CNNs) for scene classification. We present a hierarchy of specialist networks that disentangles intra-class variation and inter-class similarity in a coarse-to-fine manner. Our key insight is that different subsets within a class are often associated with different types of inter-class similarity. This suggests that existing network-of-experts approaches, which organize classes into coarse categories, are suboptimal. Instead, we group images based on high-level appearance features rather than their class membership and dedicate a specialist model to each group. In addition, we propose an alternating architecture with a global ordered representation and a global orderless representation to account for both the coarse layout of the scene and transient objects. We demonstrate that this alternating design performs better than either type of representation alone or the fused features. We also introduce a mini-batch soft k-means that allows end-to-end fine-tuning, as well as a novel routing function for assigning images to specialists. Experimental results show that the proposed approach achieves a significant improvement over baselines, including existing tree-structured CNNs with class-based grouping.
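The abstract names a mini-batch soft k-means for grouping images by appearance but gives no formula. The following is a minimal sketch, assuming the common soft k-means form (responsibilities are a softmax over negative squared distances, scaled by a hypothetical stiffness parameter `beta`) combined with the running-average center update familiar from mini-batch k-means. The class `MiniBatchSoftKMeans`, its `partial_fit` and `route` methods, and the argmax routing are illustrative assumptions, not the authors' code or their proposed routing function. In the paper the inputs would be high-level CNN appearance features; here they are toy 2-D points.

```python
import math
import random
from typing import List, Sequence


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def soft_assign(x: Sequence[float], centers: Sequence[Sequence[float]], beta: float) -> List[float]:
    """Responsibilities r_k = softmax_k(-beta * ||x - mu_k||^2)."""
    logits = [-beta * sq_dist(x, c) for c in centers]
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


class MiniBatchSoftKMeans:
    """Hypothetical helper: soft k-means whose centers are updated per mini-batch."""

    def __init__(self, centers: Sequence[Sequence[float]], beta: float = 1.0):
        self.centers = [list(c) for c in centers]
        self.beta = beta
        # Total soft mass each center has absorbed over all batches so far.
        self.counts = [0.0] * len(self.centers)

    def partial_fit(self, batch: Sequence[Sequence[float]]) -> None:
        k, dim = len(self.centers), len(self.centers[0])
        mass = [0.0] * k
        weighted_sum = [[0.0] * dim for _ in range(k)]
        # Accumulate soft-assignment mass and responsibility-weighted sums.
        for x in batch:
            r = soft_assign(x, self.centers, self.beta)
            for j in range(k):
                mass[j] += r[j]
                for d in range(dim):
                    weighted_sum[j][d] += r[j] * x[d]
        for j in range(k):
            if mass[j] == 0.0:
                continue  # center received no mass in this batch
            self.counts[j] += mass[j]
            # Step size = this batch's share of all mass seen so far, so each
            # center stays the running weighted mean of its soft members.
            eta = mass[j] / self.counts[j]
            for d in range(dim):
                batch_mean = weighted_sum[j][d] / mass[j]
                self.centers[j][d] += eta * (batch_mean - self.centers[j][d])

    def route(self, x: Sequence[float]) -> int:
        """Hard routing: index of the center with the highest responsibility."""
        r = soft_assign(x, self.centers, self.beta)
        return max(range(len(r)), key=r.__getitem__)


if __name__ == "__main__":
    random.seed(0)
    model = MiniBatchSoftKMeans(centers=[[1.0, 1.0], [4.0, 4.0]], beta=1.0)
    for _ in range(20):
        # Each toy batch mixes points from two blobs near (0, 0) and (5, 5).
        batch = [[random.gauss(0, 0.3), random.gauss(0, 0.3)] for _ in range(8)]
        batch += [[random.gauss(5, 0.3), random.gauss(5, 0.3)] for _ in range(8)]
        model.partial_fit(batch)
    print(model.route([0.2, -0.1]), model.route([5.1, 4.9]))  # prints: 0 1
```

Unlike hard k-means labels, the soft responsibilities vary smoothly with the input features, which is one plausible reason a soft formulation suits end-to-end fine-tuning; this sketch does not compute gradients, and a real implementation would express the same updates with tensor operations in a deep learning framework.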


Metadata
Title
Hierarchy of Alternating Specialists for Scene Recognition
Authors
Hyo Jin Kim
Jan-Michael Frahm
Copyright year
2018
DOI
https://doi.org/10.1007/978-3-030-01252-6_28
