An efficient enhanced k-means clustering algorithm

Fahim, A. M.; Salem, A. M.; Torkey, F. A.; Ramadan, M. A.

doi:10.1631/jzus.2006.A1626


Published: 01 October 2006

Volume 7, pages 1626–1633, (2006)

Journal of Zhejiang University-SCIENCE A


Abstract

In k-means clustering, we are given a set of n data points in d-dimensional space ℝ^d and an integer k, and the problem is to determine a set of k points in ℝ^d, called centers, that minimizes the mean squared distance from each data point to its nearest center. In this paper, we present a simple and efficient clustering algorithm based on the k-means algorithm, which we call the enhanced k-means algorithm. The algorithm is easy to implement, requiring only a simple data structure that keeps some information from each iteration for use in the next. Our experimental results demonstrate that the scheme improves the computational speed of the k-means algorithm, in terms of both the total number of distance calculations and the overall computation time.
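
The abstract does not say which information is carried from one iteration to the next, so the following is only an illustrative sketch under a stated assumption: each point caches the squared distance to its assigned center, and a full search over all k centers is skipped whenever the updated center of the point's own cluster is no farther away than that cached value. The names `enhanced_kmeans`, `sq_dist`, and `update_centers`, the random initialization, and the stopping rule are all hypothetical choices made for this example and should not be read as the authors' exact procedure.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def update_centers(points, labels, centers):
    """Recompute each center as the mean of its points (empty clusters keep the old center)."""
    k, dim = len(centers), len(centers[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, j in zip(points, labels):
        counts[j] += 1
        for t in range(dim):
            sums[j][t] += p[t]
    return [
        [s / counts[j] for s in sums[j]] if counts[j] else centers[j]
        for j in range(k)
    ]


def enhanced_kmeans(points, k, max_iter=100, seed=0):
    """Sketch of k-means with a per-point distance cache (an assumed design)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # assumed initialization
    labels, cached, calls = [], [], 0

    # First pass: ordinary full assignment, filling the cache.
    for p in points:
        dists = [sq_dist(p, c) for c in centers]
        calls += k
        j = min(range(k), key=dists.__getitem__)
        labels.append(j)
        cached.append(dists[j])

    for _ in range(max_iter):
        centers = update_centers(points, labels, centers)
        changed = False
        for i, p in enumerate(points):
            d_own = sq_dist(p, centers[labels[i]])
            calls += 1
            if d_own <= cached[i]:
                # Own center did not move away: keep the point where it is.
                cached[i] = d_own
                continue
            # Otherwise fall back to a full search over all centers.
            dists = [sq_dist(p, c) for c in centers]
            calls += k
            j = min(range(k), key=dists.__getitem__)
            changed = changed or j != labels[i]
            labels[i] = j
            cached[i] = dists[j]
        if not changed:
            break
    return centers, labels, calls


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels, calls = enhanced_kmeans(points, k=2)
```

Note that the shortcut is a heuristic: a point kept in its cluster by the cached-distance test is not guaranteed to be at its nearest center, so the result can differ from that of standard k-means. The `calls` counter is included only to make the saving in distance calculations observable.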



Author information

Authors and Affiliations

Department of Mathematics, Faculty of Education, Suez Canal University, Suez city, Egypt
Fahim A. M.
Department of Computer Science, Faculty of Computers & Information, Ain Shams University, Cairo city, Egypt
Salem A. M.
Department of Computer Science, Faculty of Computers & Information, Minufiya University, Shbeen El Koom City, Egypt
Torkey F. A.
Department of Mathematics, Faculty of Science, Minufiya University, Shbeen El Koom City, Egypt
Ramadan M. A.


About this article

Cite this article

Fahim, A.M., Salem, A.M., Torkey, F.A. et al. An efficient enhanced k-means clustering algorithm. J. Zhejiang Univ. - Sci. A 7, 1626–1633 (2006). https://doi.org/10.1631/jzus.2006.A1626


Received: 15 March 2006
Accepted: 11 May 2006
Published: 01 October 2006
Issue Date: October 2006
DOI: https://doi.org/10.1631/jzus.2006.A1626

Key words

CLC number

TP301.6
