Published in: Arabian Journal for Science and Engineering 2/2022

24-06-2021 | Research Article-Computer Engineering and Computer Science

Large-Scale Data Clustering Using Manifold-Regularized Ensemble of Posterior in GAN

Authors: Haleh Homayouni, Eghbal Mansoori

Abstract

Data clustering is an unsupervised learning method and a pivotal technique for statistical data analysis. Grouping data samples is a challenging machine learning task, especially in large databases. Deep neural networks scale to large data sets and can learn the structure of data by modeling its nonlinearity. One well-known latent generative model in this area is the generative adversarial network (GAN). Clustering with a latent generative model requires the posterior of that model, which in turn must be approximated variationally. This approximation can be obtained by maximizing mutual information or by minimizing the KL-divergence. In this paper, to reach a more generalized inference in clustering, an ensemble approach is employed to approximate the posterior. To implement this ensemble with deep networks, we propose a convex lower bound for the variational approximation of the posteriors. To amend the generator's behavior, we inject the geometrical structure of the data into the objective function as a manifold regularization term, with the aim of more accurate statistical inference. The efficacy of the proposed method is evaluated on four benchmark data sets. The experimental results confirm the superiority of our model over standard clustering algorithms and some recently developed deep methods.
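
The two objectives named in the abstract, minimizing the KL-divergence and maximizing mutual information, can be illustrated on small discrete distributions. The sketch below is a generic toy example and not the authors' formulation: the function names, the uniform averaging of ensemble members, and the sample numbers are all assumptions made for illustration. It shows that the KL-divergence of an averaged ensemble of approximate posteriors is no larger than the average KL-divergence of its members (convexity of KL in its first argument), and how mutual information between samples and cluster assignments can be estimated from soft assignments.

```python
import math


def kl_divergence(q, p):
    """KL(q || p) for discrete distributions given as lists; assumes p[i] > 0."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)


def ensemble_average(posteriors):
    """Uniform mixture of several distributions over the same clusters.

    Uniform weighting is an assumption of this sketch.
    """
    m = len(posteriors)
    k = len(posteriors[0])
    return [sum(q[j] for q in posteriors) / m for j in range(k)]


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def mutual_information(assignments):
    """Empirical I(X; C) = H(mean_x q(c|x)) - mean_x H(q(c|x)).

    `assignments` holds one soft cluster assignment q(c|x) per sample.
    """
    marginal = ensemble_average(assignments)
    conditional = sum(entropy(q) for q in assignments) / len(assignments)
    return entropy(marginal) - conditional


# Hypothetical target posterior and three approximate posteriors.
target = [0.7, 0.2, 0.1]
members = [[0.6, 0.3, 0.1], [0.8, 0.1, 0.1], [0.5, 0.25, 0.25]]

mixture = ensemble_average(members)
kl_of_mixture = kl_divergence(mixture, target)
mean_member_kl = sum(kl_divergence(q, target) for q in members) / len(members)

# Confident, balanced assignments carry the most information about the cluster.
mi_confident = mutual_information([[1.0, 0.0], [0.0, 1.0]])
mi_uninformative = mutual_information([[0.5, 0.5], [0.5, 0.5]])
```

In this toy setting the averaged posterior is at least as close to the target, in KL terms, as the members are on average, which is one common motivation for ensembling. Mutual information is highest when each sample is assigned confidently and the clusters are used evenly.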

Metadata
Title
Large-Scale Data Clustering Using Manifold-Regularized Ensemble of Posterior in GAN
Authors
Haleh Homayouni
Eghbal Mansoori
Publication date
24-06-2021
Publisher
Springer Berlin Heidelberg
Published in
Arabian Journal for Science and Engineering / Issue 2/2022
Print ISSN: 2193-567X
Electronic ISSN: 2191-4281
DOI
https://doi.org/10.1007/s13369-021-05809-y
