Published in: Cluster Computing 3/2019

16-02-2018

Attribute weights-based clustering centres algorithm for initialising K-modes clustering

Authors: Liwen Peng, Yongguo Liu



Abstract

The K-modes algorithm, a partitional clustering technique, is a popular and effective clustering method that handles categorical data. However, its performance depends heavily on the initial clustering centres. Selecting the initial centres at random commonly leads to non-repeatable clustering results. Hence, a suitable choice of the initial clustering centres is crucial to achieving high-performance K-modes clustering. The present article develops an initialisation algorithm for K-modes. At initialisation, the distance between two instances is calculated after weighting the attributes of the instances. Many studies have shown that if the centres are selected using only the distances between instances or only their density, the selected centres tend to gather around a single centre or to fall on outliers. Therefore, based on the attribute weights, we combine the distance and density measures to select the clustering centres. In experiments on several benchmark datasets from the UCI machine learning repository, the new initialisation method outperformed the existing K-modes clustering methods.
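The idea summarised in the abstract (weight the attributes, measure distance with those weights, then pick centres that are both dense and far from the centres already chosen) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not give the weight, density, or selection formulas, so the entropy-based weights, the average-similarity density, and the "density times minimum distance" score below are assumptions chosen only for illustration, and all function names are hypothetical.

```python
from collections import Counter
import math


def attribute_weights(data):
    """Assumed weighting: each attribute's Shannon entropy, normalised to sum to 1."""
    n, m = len(data), len(data[0])
    entropies = []
    for j in range(m):
        counts = Counter(row[j] for row in data)
        entropies.append(-sum((c / n) * math.log(c / n) for c in counts.values()))
    total = sum(entropies)
    if total == 0:  # every attribute is constant: fall back to equal weights
        return [1.0 / m] * m
    return [h / total for h in entropies]


def weighted_distance(x, y, w):
    """Weighted simple-matching distance: sum of the weights of mismatched attributes."""
    return sum(wj for xj, yj, wj in zip(x, y, w) if xj != yj)


def density(x, data, w):
    """Assumed density: average weighted similarity (1 - distance) of x to all instances."""
    return sum(1.0 - weighted_distance(x, y, w) for y in data) / len(data)


def select_centres(data, k):
    """Pick k initial centres by combining density and distance (illustrative rule)."""
    w = attribute_weights(data)
    dens = [density(x, data, w) for x in data]
    # First centre: the densest instance.
    first = max(range(len(data)), key=lambda i: dens[i])
    centres = [data[first]]
    # Remaining centres: dense instances that are far from every chosen centre.
    while len(centres) < k:
        def score(i):
            nearest = min(weighted_distance(data[i], c, w) for c in centres)
            return dens[i] * nearest
        best = max(range(len(data)), key=score)
        centres.append(data[best])
    return centres


if __name__ == "__main__":
    toy = [("a", "x"), ("a", "x"), ("a", "y"),
           ("b", "z"), ("b", "z"), ("b", "z")]
    print(select_centres(toy, 2))
```

Because every step is deterministic, repeated runs on the same data return the same centres, which is the repeatability that random initialisation lacks. Multiplying density by the distance to the nearest chosen centre penalises both outliers (low density) and instances close to an existing centre (small distance). The sketch does not guard against `k` exceeding the number of distinct instances.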


Metadata
Title
Attribute weights-based clustering centres algorithm for initialising K-modes clustering
Authors
Liwen Peng
Yongguo Liu
Publication date
16-02-2018
Publisher
Springer US
Published in
Cluster Computing / Special Issue 3/2019
Print ISSN: 1386-7857
Electronic ISSN: 1573-7543
DOI
https://doi.org/10.1007/s10586-018-1889-5
