ABSTRACT
In many applications it is desirable to cluster high dimensional data along various subspaces, which we refer to as projective clustering. We propose a new objective function for projective clustering, taking into account the inherent trade-off between the dimension of a subspace and the induced clustering error. We then present an extension of the k-means clustering algorithm for projective clustering in arbitrary subspaces, and also propose techniques to avoid local minima. Unlike previous algorithms, ours can choose the dimension of each cluster independently and automatically. Furthermore, experimental results show that our algorithm is significantly more accurate than the previous approaches.
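The abstract describes a k-means-style alternation: each cluster is represented by an affine subspace rather than a point, and the algorithm alternates between assigning points to their nearest subspace and refitting each subspace to its assigned points. The following is a minimal sketch of that idea, not the paper's actual algorithm: it fixes one subspace dimension for all clusters (the paper's method chooses each cluster's dimension independently via its objective function, and includes techniques to escape local minima that are omitted here). All function names are illustrative; subspaces are fit with PCA via the SVD.

```python
import numpy as np

def fit_subspace(points, dim):
    """Fit a dim-dimensional affine subspace to points via PCA.
    Returns (mean, basis) where basis has shape (dim, n_features)."""
    mean = points.mean(axis=0)
    centered = points - mean
    # Top `dim` right singular vectors span the best-fit subspace.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:dim]

def residual(points, mean, basis):
    """Squared distance from each point to the affine subspace."""
    centered = points - mean
    proj = centered @ basis.T @ basis  # projection onto the subspace
    return ((centered - proj) ** 2).sum(axis=1)

def projective_kmeans(X, k, dim, n_iter=50, seed=0):
    """Lloyd-style alternation with affine-subspace 'centers'."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, size=len(X))
    models = []
    for _ in range(n_iter):
        models = []
        for j in range(k):
            pts = X[labels == j]
            if len(pts) <= dim:  # degenerate cluster: reseed from random points
                pts = X[rng.choice(len(X), dim + 1, replace=False)]
            models.append(fit_subspace(pts, dim))
        # Assign each point to the subspace with the smallest residual.
        dists = np.stack([residual(X, m, B) for m, B in models], axis=1)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, models
```

As with ordinary k-means, each step can only decrease the total residual cost, so the loop converges to a local minimum; the quality of that minimum depends on initialization, which is one motivation for the local-minima-avoidance techniques the paper proposes.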