research-article

Clustering high-dimensional data: A survey on subspace clustering, pattern-based clustering, and correlation clustering

Published: 23 March 2009

Abstract

As a prolific research area in data mining, subspace clustering and related problems induced a vast quantity of proposed solutions. However, many publications compare a new proposition—if at all—with one or two competitors, or even with a so-called “naïve” ad hoc solution, but fail to clarify the exact problem definition. As a consequence, even if two solutions are thoroughly compared experimentally, it will often remain unclear whether both solutions tackle the same problem or, if they do, whether they agree in certain tacit assumptions and how such assumptions may influence the outcome of an algorithm. In this survey, we try to clarify: (i) the different problem definitions related to subspace clustering in general; (ii) the specific difficulties encountered in this field of research; (iii) the varying assumptions, heuristics, and intuitions forming the basis of different approaches; and (iv) how several prominent solutions tackle different problems.


          • Published in

          ACM Transactions on Knowledge Discovery from Data, Volume 3, Issue 1
          March 2009
          251 pages
          ISSN: 1556-4681
          EISSN: 1556-472X
          DOI: 10.1145/1497577

          Copyright © 2009 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 23 March 2009
          • Revised: 1 October 2008
          • Accepted: 1 October 2008
          • Received: 1 May 2008


          Qualifiers

          • research-article
          • Research
          • Refereed
