ABSTRACT
In many applications it is desirable to cluster high dimensional data along various subspaces, which we refer to as projective clustering. We propose a new objective function for projective clustering, taking into account the inherent trade-off between the dimension of a subspace and the induced clustering error. We then present an extension of the k-means clustering algorithm for projective clustering in arbitrary subspaces, and also propose techniques to avoid local minima. Unlike previous algorithms, ours can choose the dimension of each cluster independently and automatically. Furthermore, experimental results show that our algorithm is significantly more accurate than the previous approaches.
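The abstract describes a k-means-style alternation: each cluster is represented by an affine subspace rather than a point, and the algorithm alternates between assigning points to their nearest subspace and refitting each subspace to its assigned points. The following is a minimal sketch of that idea, not the paper's actual algorithm: it fixes one subspace dimension for all clusters (the paper's method chooses each cluster's dimension independently via its objective function, and includes techniques to escape local minima that are omitted here). All function names are illustrative; subspaces are fit with PCA via the SVD.

```python
import numpy as np

def fit_subspace(points, dim):
    """Fit a dim-dimensional affine subspace to points via PCA.
    Returns (mean, basis) where basis has shape (dim, n_features)."""
    mean = points.mean(axis=0)
    centered = points - mean
    # Top `dim` right singular vectors span the best-fit subspace.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:dim]

def residual(points, mean, basis):
    """Squared distance from each point to the affine subspace."""
    centered = points - mean
    proj = centered @ basis.T @ basis  # projection onto the subspace
    return ((centered - proj) ** 2).sum(axis=1)

def projective_kmeans(X, k, dim, n_iter=50, seed=0):
    """Lloyd-style alternation with affine-subspace 'centers'."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, size=len(X))
    models = []
    for _ in range(n_iter):
        models = []
        for j in range(k):
            pts = X[labels == j]
            if len(pts) <= dim:  # degenerate cluster: reseed from random points
                pts = X[rng.choice(len(X), dim + 1, replace=False)]
            models.append(fit_subspace(pts, dim))
        # Assign each point to the subspace with the smallest residual.
        dists = np.stack([residual(X, m, B) for m, B in models], axis=1)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, models
```

As with ordinary k-means, each step can only decrease the total residual cost, so the loop converges to a local minimum; the quality of that minimum depends on initialization, which is one motivation for the local-minima-avoidance techniques the paper proposes.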