
2019 | OriginalPaper | Chapter

Subspace Clustering—A Survey

Authors: Bhagyashri A. Kelkar, Sunil F. Rodd

Published in: Data Management, Analytics and Innovation

Publisher: Springer Singapore

Abstract

High-dimensional data clustering has gained attention in recent years due to its widespread applications in domains such as social networking and biology. As a result of advances in data gathering and storage technologies, a single data object is often described by a large number of attributes. Although more data may provide new insights, it may also hinder the knowledge discovery process by cluttering the interesting relations with redundant information. The traditional notion of similarity becomes meaningless in high-dimensional data, so clustering methods based on similarity between objects fail to cope with increased dimensionality. A dataset of high dimensionality can often be described better in its subspaces than as a whole. Subspace clustering algorithms identify clusters that exist in multiple, possibly overlapping subspaces. Subspace clustering methods are classified as top-down or bottom-up, depending on the strategy used to identify subspaces. Top-down algorithms form an initial clustering over the full set of dimensions and then iterate, removing irrelevant dimensions to find the subset of dimensions that better represents each cluster. Bottom-up algorithms start from dense regions in low-dimensional (typically one-dimensional) spaces and merge them into higher-dimensional subspace clusters using Apriori-style, level-wise search. The performance and the quality of results of a subspace clustering algorithm are highly dependent on the parameter values supplied to it. This paper gives an overview of work done in the field of subspace clustering.
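
To make the bottom-up strategy concrete, the following is a minimal CLIQUE-style sketch in Python. It is illustrative only and not an algorithm defined in this survey: each dimension is discretized into `xi` equal-width intervals, one-dimensional units covering at least a `tau` fraction of the points are kept, and dense units are joined Apriori-style into higher-dimensional candidates whose density is re-checked. The function name `dense_units` and the parameters `xi` (grid resolution) and `tau` (density threshold) are assumptions of this sketch, chosen to mirror the notation commonly used in the grid-based literature.

```python
from itertools import combinations

import numpy as np


def dense_units(X, xi=10, tau=0.05):
    """Minimal CLIQUE-style sketch: find dense grid units in subspaces
    via Apriori-style (level-wise) candidate generation."""
    n, d = X.shape

    # Discretize every dimension into xi equal-width intervals.
    span = X.max(axis=0) - X.min(axis=0) + 1e-12
    bins = np.clip(((X - X.min(axis=0)) / span * xi).astype(int), 0, xi - 1)

    # 1-dimensional dense units: a (dim, interval) cell is dense if it
    # covers at least tau * n points.
    dense = {}
    for dim in range(d):
        for interval in range(xi):
            members = frozenset(np.where(bins[:, dim] == interval)[0])
            if len(members) >= tau * n:
                dense[((dim, interval),)] = members

    result = dict(dense)
    # Apriori-style join: a k-dimensional unit can only be dense if all of
    # its (k-1)-dimensional projections are dense, so candidates are built
    # from pairs of dense units sharing k-1 cells and re-checked against tau.
    while dense:
        candidates = {}
        for a, b in combinations(dense, 2):
            cells = tuple(sorted(set(a) | set(b)))
            dims = [dim for dim, _ in cells]
            if len(cells) == len(a) + 1 and len(set(dims)) == len(dims):
                members = dense[a] & dense[b]
                if len(members) >= tau * n:
                    candidates[cells] = members
        result.update(candidates)
        dense = candidates
    return result  # maps a tuple of (dimension, interval) cells to point ids
```

A full bottom-up algorithm would additionally connect adjacent dense units within the same subspace to form clusters; the sketch stops at the dense-unit stage, which is where the Apriori-style pruning and the sensitivity to parameter values such as `xi` and `tau` are most visible.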

Metadata
Title
Subspace Clustering—A Survey
Authors
Bhagyashri A. Kelkar
Sunil F. Rodd
Copyright Year
2019
Publisher
Springer Singapore
DOI
https://doi.org/10.1007/978-981-13-1402-5_16