skip to main content
10.1145/1055558.1055581acmconferencesArticle/Chapter ViewAbstractPublication PagespodsConference Proceedingsconference-collections
Article

k-means projective clustering

Published:14 June 2004Publication History

ABSTRACT

In many applications it is desirable to cluster high dimensional data along various subspaces, which we refer to as projective clustering. We propose a new objective function for projective clustering, taking into account the inherent trade-off between the dimension of a subspace and the induced clustering error. We then present an extension of the k-means clustering algorithm for projective clustering in arbitrary subspaces, and also propose techniques to avoid local minima. Unlike previous algorithms, ours can choose the dimension of each cluster independently and automatically. Furthermore, experimental results show that our algorithm is significantly more accurate than the previous approaches.

References

  1. P. Agarwal and S. Har-Peled, Maintaining the approximate extent measures of moving points, Proc. 12th ACM-SIAM Sympos. Discrete Algorithms, 2001, pp. 148--157.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. P. K. Agarwal, S. Har-Peled, N. Mustafa, and Y. Wang, Nearlinear time approximation algorithms for curve simplification in two and three dimensions, Proc. of the 10th European Symposium on Algorithms, 2002, pp. 544--555.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. C. Aggarwal and P. Yu, Finding generalized projected clusters in high dimensional spaces, Proc. ACM-SIGMOD Intl. Conf. Management of Data, 2000, pp. 70--81.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. C. C. Aggarwal, C. M. Procopiuc, J. L. Wolf, P. S. Yu, and J. S. Park, Fast algorithms for projected clustering, Proc. ACM SIGMOD Intl. Conf. on Management of Data, 1999, pp. 61--72.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. R. Agrawal, J. Gehrke, D. Gunopulos, and P. Raghavan, Automatic subspace clustering of high dimensional data for data mining applications, Proc. ACM-SIGMOD Intl. Conf. Management of Data, 1998, pp. 94--105.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. K. Beyer, J. Goldstein, R. Ramakrishnan, and U. Shaft, When is "nearest neighbor" meaningful?, Lecture Notes in Computer Science, 1540 (1999), 217--235.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. K. Chakrabarti and S. Mehrotra, Local dimensionality reduction: A new approach to indexing high dimensional spaces, Proc. of 26th Intl. Conf. on Very Large Data Bases, 2000, pp. 89--100.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Q. Du, V. Faber, and M. Gunzburger, Centroidal voronoi tessellations: Applications and algorithms, SIAM Review, 41 (1999), 637--676.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. S. Guha, R. Rastogi, and K. Shim, CURE: an efficient clustering algorithm for large databases, Proc. ACM-SIGMOD Intl. Conf. Management of Data, 1998, pp. 73--84.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, 2001.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. S. Har-Peled and K. Varadarajan, Approximate shape fitting via linearization, Proc. 42nd Annu. IEEE Sympos. Found. Comput. Sci., 2001, pp. 66--73.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. R. M. Heiberger, Algorithm AS 127: Generation of random orthogonal matrices, Appl. Statist., 27 (1978), 199--206.]]Google ScholarGoogle ScholarCross RefCross Ref
  13. A. Hinneburg, C. C. Aggarwal, and D. A. Keim, What is the nearest neighbor in high dimensional spaces?, Proc. 26th Intl. Conf. Very Large DataBases, 2000, pp. 506--515.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. A. Hinneburg and D. A. Keim, Optimal grid-clustering:towards breaking the curse of dimensionality in high-dimensional clustering, Proc. of 25th Intl. Conf. Very Large DataBases, 1999, pp. 506--517.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. A. K. Jain and R. C. Dubes, Algorithms for Clustering Data, Prentice Hall, Englewood Cliffs, NJ, 1988.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. W. Johnson and J. Lindenstrauss, Extensions of Lipschitz maps into a Hilbert space, Contemp. Math., 26 (1984), 189--206.]]Google ScholarGoogle ScholarCross RefCross Ref
  17. T. T. Jolliffe, Principal component analysis, Springer-Verlag, New York, 2002.]]Google ScholarGoogle Scholar
  18. R. T. Ng and J. Han, Efficient and effective clustering methods for spatial data mining, Intl. Conf. Very Large DataBases, 1994, pp. 144--155.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. M. Procopiuc, M. Jones. P. K. Agarwal, and T. M. Murali, A monte carlo algorithm for fast projective clustering, Proc. ACM-SIGMOD Intl. Conf. Management of Data, 2002, pp. 418--427.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. T. Zhang, R. Ramakrishnan, and M. Livny, BIRCH: an efficient data clustering method for very large databases, Proc. ACM-SIGMOD Intl. Conf. Management of Data, 1996, pp. 103--114.]] Google ScholarGoogle ScholarDigital LibraryDigital Library

Recommendations

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Sign in
  • Published in

    cover image ACM Conferences
    PODS '04: Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
    June 2004
    350 pages
    ISBN:158113858X
    DOI:10.1145/1055558

    Copyright © 2004 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    • Published: 14 June 2004

    Permissions

    Request permissions about this article.

    Request Permissions

    Check for updates

    Qualifiers

    • Article

    Acceptance Rates

    Overall Acceptance Rate642of2,707submissions,24%

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader