Abstract
As a prolific research area in data mining, subspace clustering and related problems have induced a vast quantity of proposed solutions. However, many publications compare a new proposal, if at all, with only one or two competitors, or even with a so-called “naïve” ad hoc solution, and fail to clarify the exact problem definition. As a consequence, even if two solutions are thoroughly compared experimentally, it often remains unclear whether both solutions tackle the same problem or, if they do, whether they agree on certain tacit assumptions and how such assumptions may influence the outcome of an algorithm. In this survey, we try to clarify: (i) the different problem definitions related to subspace clustering in general; (ii) the specific difficulties encountered in this field of research; (iii) the varying assumptions, heuristics, and intuitions forming the basis of different approaches; and (iv) how several prominent solutions tackle different problems.
Index Terms
- Clustering high-dimensional data: A survey on subspace clustering, pattern-based clustering, and correlation clustering