Entropy-based subspace clustering for mining numerical data

Authors:
Chun-Hung Cheng, Department of Computer Science and Engineering, The Chinese University of Hong Kong
Ada Waichee Fu, Department of Computer Science and Engineering, The Chinese University of Hong Kong
Yi Zhang, Department of Computer Science and Engineering, The Chinese University of Hong Kong

KDD '99: Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining, August 1999, Pages 84–93. https://doi.org/10.1145/312129.312199

Published: 01 August 1999

Index Terms

Entropy-based subspace clustering for mining numerical data
1. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Clustering and classification
  2. Information systems applications
    1. Data mining
      1. Clustering

Published in
KDD '99: Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
August 1999
439 pages
ISBN: 1581131437
DOI: 10.1145/312129
Chairmen: Usama Fayyad (Microsoft Research), Surajit Chaudhuri (Microsoft Research), David Madigan (AT&T Labs-Research)
Copyright © 1999 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Acceptance Rates
Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%