
Clustering high-dimensional data: A survey on subspace clustering, pattern-based clustering, and correlation clustering

Authors:
Hans-Peter Kriegel, Ludwig-Maximilians-Universität München, Munich, Germany
Peer Kröger, Ludwig-Maximilians-Universität München, Munich, Germany
Arthur Zimek, Ludwig-Maximilians-Universität München, Munich, Germany

ACM Transactions on Knowledge Discovery from Data, Volume 3, Issue 1, Article No. 1, pp. 1–58
https://doi.org/10.1145/1497577.1497578

Published: 23 March 2009


Abstract

As a prolific research area in data mining, subspace clustering and related problems induced a vast quantity of proposed solutions. However, many publications compare a new proposition—if at all—with one or two competitors, or even with a so-called “naïve” ad hoc solution, but fail to clarify the exact problem definition. As a consequence, even if two solutions are thoroughly compared experimentally, it will often remain unclear whether both solutions tackle the same problem or, if they do, whether they agree in certain tacit assumptions and how such assumptions may influence the outcome of an algorithm. In this survey, we try to clarify: (i) the different problem definitions related to subspace clustering in general; (ii) the specific difficulties encountered in this field of research; (iii) the varying assumptions, heuristics, and intuitions forming the basis of different approaches; and (iv) how several prominent solutions tackle different problems.
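The abstract stresses that subspace clustering lacks a single agreed-upon problem definition. As a purely illustrative sketch (not taken from the survey), the Python snippet that follows adopts one deliberately simple, hypothetical definition: a group of points counts as a cluster in an attribute subset if its diameter in that projection is at most `eps`. The toy data `POINTS`, the threshold `eps`, and the helper names `diameter` and `compact_subspaces` are assumptions made for this example only.

```python
from itertools import combinations

# Hypothetical toy data: four points that agree closely on attributes 0 and 1
# but are spread out on attribute 2.
POINTS = [
    (1.0, 2.0, 0.0),
    (1.1, 2.1, 5.0),
    (0.9, 1.9, 9.0),
    (1.0, 2.2, 3.0),
]


def diameter(points, dims):
    """Largest pairwise Euclidean distance after projecting onto `dims`."""
    best = 0.0
    for p, q in combinations(points, 2):
        d = sum((p[i] - q[i]) ** 2 for i in dims) ** 0.5
        best = max(best, d)
    return best


def compact_subspaces(points, n_dims, eps):
    """Return every non-empty attribute subset in which `points` has diameter <= eps.

    Hypothetical problem definition for illustration. The search is exhaustive:
    it visits all 2**n_dims - 1 subsets, which becomes infeasible as the
    dimensionality grows.
    """
    result = []
    for k in range(1, n_dims + 1):
        for dims in combinations(range(n_dims), k):
            if diameter(points, dims) <= eps:
                result.append(dims)
    return result


print(compact_subspaces(POINTS, 3, eps=0.5))  # prints [(0,), (1,), (0, 1)]
```

On this toy data the group is compact in attributes 0 and 1 (together and individually) but not once attribute 2 is included, so a distance computed over all attributes would hide it. Because dropping attributes can only shrink a Euclidean distance, every subset of a compact subspace is also compact under this definition; bottom-up subspace clustering methods exploit similar monotonicity properties to prune the otherwise exponential search.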


Published in
ACM Transactions on Knowledge Discovery from Data, Volume 3, Issue 1, March 2009, 251 pages
ISSN: 1556-4681
EISSN: 1556-472X
DOI: 10.1145/1497577

Copyright © 2009 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 23 March 2009
- Revised: 1 October 2008
- Accepted: 1 October 2008
- Received: 1 May 2008

Author Tags
Survey
clustering
high-dimensional data
Qualifiers
- research-article
- Research
- Refereed
Article Metrics
- 798 total citations
- 17,292 total downloads
- 401 downloads (last 12 months)
- 62 downloads (last 6 weeks)
