2020 | OriginalPaper | Book chapter

GATCluster: Self-supervised Gaussian-Attention Network for Image Clustering

Authors: Chuang Niu, Jun Zhang, Ge Wang, Jimin Liang

Published in: Computer Vision – ECCV 2020

Publisher: Springer International Publishing

Abstract

We propose a self-supervised Gaussian ATtention network for image Clustering (GATCluster). Rather than extracting intermediate features first and then applying a traditional clustering algorithm, GATCluster directly outputs semantic cluster labels without further post-processing. We give a Label Feature Theorem to guarantee that the learned features are one-hot encoded vectors and that trivial solutions are avoided. Based on this theorem, we design four self-learning tasks with the constraints of transformation invariance, separability maximization, entropy analysis, and attention mapping. Specifically, the transformation invariance and separability maximization tasks learn the relations between samples. The entropy analysis task aims to avoid trivial solutions. To capture object-oriented semantics, we design a self-supervised attention mechanism that includes a Gaussian attention module and a soft-attention loss. Moreover, we design a two-step learning algorithm that is memory-efficient for clustering large-size images. Extensive experiments on image clustering benchmarks demonstrate the superiority of our proposed method over state-of-the-art approaches.
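
The abstract does not give the loss definitions, so the following is only a minimal sketch of two ideas it names, under stated assumptions. It assumes the entropy analysis can be illustrated by the entropy of the batch-averaged cluster distribution (a common way to detect the trivial solution where every image lands in one cluster), and that a Gaussian attention map can be illustrated by an isotropic 2D Gaussian weighting over spatial positions. The function names `marginal_entropy` and `gaussian_attention_map`, and the parameters `center` and `sigma`, are hypothetical and not taken from the paper.

```python
import math


def marginal_entropy(probs):
    """Entropy of the batch-averaged cluster distribution.

    probs: list of per-image cluster probability vectors (each sums to 1).
    A value near 0 means all images collapse into one cluster; the maximum,
    log(K), is reached when the K clusters are used equally often.
    Assumption: this is an illustrative stand-in, not the paper's exact loss.
    """
    n = len(probs)
    k = len(probs[0])
    mean = [sum(row[j] for row in probs) / n for j in range(k)]
    return -sum(p * math.log(p) for p in mean if p > 0.0)


def gaussian_attention_map(height, width, center, sigma):
    """Isotropic 2D Gaussian weights over a height x width grid.

    center: (row, col) of the peak; sigma: spread in grid cells.
    Assumption: a hypothetical parameterization chosen for illustration.
    """
    cy, cx = center
    return [
        [
            math.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2.0 * sigma ** 2))
            for x in range(width)
        ]
        for y in range(height)
    ]


# Six one-hot predictions that all pick cluster 0 (the trivial solution).
collapsed = [[1.0, 0.0, 0.0]] * 6
# Six one-hot predictions spread evenly over three clusters.
balanced = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]] * 2

collapsed_entropy = marginal_entropy(collapsed)
balanced_entropy = marginal_entropy(balanced)

# A 5x5 attention map peaked at the grid center.
attention = gaussian_attention_map(5, 5, center=(2, 2), sigma=1.0)
```

In this sketch both sets of predictions are one-hot, yet only the balanced set has high marginal entropy, which is why an entropy term can separate a useful labeling from a collapsed one. The attention map weights positions near the assumed object center most heavily and decays smoothly toward the borders.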

Metadata
Title
GATCluster: Self-supervised Gaussian-Attention Network for Image Clustering
Authors
Chuang Niu
Jun Zhang
Ge Wang
Jimin Liang
Copyright year
2020
DOI
https://doi.org/10.1007/978-3-030-58595-2_44
