Skip to main content
Top
Published in: Journal of Intelligent Information Systems 3/2020

22-04-2020

A minimum spanning tree based partitioning and merging technique for clustering heterogeneous data sets

Authors: Gaurav Mishra, Sraban Kumar Mohanty

Published in: Journal of Intelligent Information Systems | Issue 3/2020

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Clustering being an unsupervised learning technique, has been used extensively for knowledge discovery due to its less dependency on domain knowledge. Many clustering techniques were proposed in the literature to recognize the cluster of different characteristics. Most of them become inadequate either due to their dependency on user-defined parameters or when they are applied on multi-scale datasets. Hybrid clustering techniques have been proposed to take the advantage of both Partitional and Hierarchical techniques by first partitioning the dataset into several dense sub-clusters and merging them into actual clusters. However, the universality of the partition and merging criteria are not sufficient to capture many characteristics of the clusters. Minimum spanning tree (MST) has been used extensively for clustering because it preserves the intrinsic nature of the dataset even after the sparsification of the graph. In this paper, we propose a parameter-free, minimum spanning tree based efficient hybrid clustering algorithm to cluster the multi-scale datasets. In the first phase, we construct a MST of the dataset to capture the neighborhood information of data points and employ box-plot, an outlier detection technique on MST edge weights for effectively selecting the inconsistent edges to partition the data points into several dense sub-clusters. In the second phase, we propose a novel merging criterion to find the genuine clusters by iteratively merging only the pairs of adjacent sub-clusters. The merging technique involves both dis-connectivity and intra-similarity using the topology of two adjacent pairs which helps to identify the arbitrary shape and varying density clusters. Experiment results on various synthetic and real world datasets demonstrate the superior performance of the proposed technique over other popular clustering algorithms.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literature
go back to reference Bezdek, J.C., & Pal, N.R. (1998). Some new indexes of cluster validity. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 28(3), 301–315.CrossRef Bezdek, J.C., & Pal, N.R. (1998). Some new indexes of cluster validity. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 28(3), 301–315.CrossRef
go back to reference Chen, X. (2013). Clustering based on a near neighbor graph and a grid cell graph. Journal of Intelligent Information Systems, 40(3), 529–554.CrossRef Chen, X. (2013). Clustering based on a near neighbor graph and a grid cell graph. Journal of Intelligent Information Systems, 40(3), 529–554.CrossRef
go back to reference Cheng, Q., Liu, Z., Huang, J., & Cheng, G. (2016a). Community detection in hypernetwork via density-ordered tree partition. Applied Mathematics and Computation, 276, 384–393.MathSciNetCrossRef Cheng, Q., Liu, Z., Huang, J., & Cheng, G. (2016a). Community detection in hypernetwork via density-ordered tree partition. Applied Mathematics and Computation, 276, 384–393.MathSciNetCrossRef
go back to reference Cheng, Q., Lu, X., Liu, Z., Huang, J., & Cheng, G. (2016b). Spatial clustering with density-ordered tree. Physica A:, Statistical Mechanics and its Applications, 460, 188–200.CrossRef Cheng, Q., Lu, X., Liu, Z., Huang, J., & Cheng, G. (2016b). Spatial clustering with density-ordered tree. Physica A:, Statistical Mechanics and its Applications, 460, 188–200.CrossRef
go back to reference Chung, C.H., & Dai, B.R. (2014). A fragment-based iterative consensus clustering algorithm with a robust similarity. Knowledge and information systems, 41(3), 591–609.CrossRef Chung, C.H., & Dai, B.R. (2014). A fragment-based iterative consensus clustering algorithm with a robust similarity. Knowledge and information systems, 41(3), 591–609.CrossRef
go back to reference Das, A.K., & Sil, J. (2007). Cluster validation using splitting and merging technique, International conference on computational intelligence and multimedia applications (ICCIMA 2007), vol. 2, pp. 56–60. IEEE. Das, A.K., & Sil, J. (2007). Cluster validation using splitting and merging technique, International conference on computational intelligence and multimedia applications (ICCIMA 2007), vol. 2, pp. 56–60. IEEE.
go back to reference Du, M., Ding, S., Xue, Y., & Shi, Z. (2019). A novel density peaks clustering with sensitivity of local density and density-adaptive metric. Knowledge and Information Systems, 59(2), 285–309.CrossRef Du, M., Ding, S., Xue, Y., & Shi, Z. (2019). A novel density peaks clustering with sensitivity of local density and density-adaptive metric. Knowledge and Information Systems, 59(2), 285–309.CrossRef
go back to reference Ester, M., Kriegel, H.P., Sander, J., Xu, X., & et al. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. In Kdd, vol. 96, pp. 226–231. Ester, M., Kriegel, H.P., Sander, J., Xu, X., & et al. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. In Kdd, vol. 96, pp. 226–231.
go back to reference Grygorash, O., Zhou, Y., & Jorgensen, Z. (2006). Minimum spanning tree based clustering algorithms. In 18Th IEEE international conference on tools with artificial intelligence (ICTAI’06), pp. 73–81. IEEE. Grygorash, O., Zhou, Y., & Jorgensen, Z. (2006). Minimum spanning tree based clustering algorithms. In 18Th IEEE international conference on tools with artificial intelligence (ICTAI’06), pp. 73–81. IEEE.
go back to reference Guha, S., Rastogi, R., & Shim, K. (1998). Cure: an efficient clustering algorithm for large databases. ACM Sigmod Record, 27(2), 73–84.CrossRef Guha, S., Rastogi, R., & Shim, K. (1998). Cure: an efficient clustering algorithm for large databases. ACM Sigmod Record, 27(2), 73–84.CrossRef
go back to reference Halkidi, M., Batistakis, Y., & Vazirgiannis, M. (2001). On clustering validation techniques. Journal of intelligent information systems, 17(2-3), 107–145.CrossRef Halkidi, M., Batistakis, Y., & Vazirgiannis, M. (2001). On clustering validation techniques. Journal of intelligent information systems, 17(2-3), 107–145.CrossRef
go back to reference Hartigan, J.A., & Wong, M.A. (1979). Algorithm as 136: a k-means clustering algorithm. Journal of the Royal Statistical Society. Series C (Applied Statistics), 28(1), 100–108.MATH Hartigan, J.A., & Wong, M.A. (1979). Algorithm as 136: a k-means clustering algorithm. Journal of the Royal Statistical Society. Series C (Applied Statistics), 28(1), 100–108.MATH
go back to reference Hu, W., & he Pan, Q. (2015). Data clustering and analyzing techniques using hierarchical clustering method. Multimedia Tools and Applications, 74(19), 8495–8504.CrossRef Hu, W., & he Pan, Q. (2015). Data clustering and analyzing techniques using hierarchical clustering method. Multimedia Tools and Applications, 74(19), 8495–8504.CrossRef
go back to reference Jain, A.K., & Dubes, R.C. (1988). Algorithms for clustering data, Prentice-Hall, Inc. Jain, A.K., & Dubes, R.C. (1988). Algorithms for clustering data, Prentice-Hall, Inc.
go back to reference Jiau, H.C., Su, Y.J., Lin, Y.M., & Tsai, S.R. (2006). Mpm: a hierarchical clustering algorithm using matrix partitioning method for non-numeric data. Journal of Intelligent Information Systems, 26(2), 185–207.CrossRef Jiau, H.C., Su, Y.J., Lin, Y.M., & Tsai, S.R. (2006). Mpm: a hierarchical clustering algorithm using matrix partitioning method for non-numeric data. Journal of Intelligent Information Systems, 26(2), 185–207.CrossRef
go back to reference Jothi, R., Mohanty, S.K., & Ojha, A. (2016). Functional grouping of similar genes using eigenanalysis on minimum spanning tree based neighborhood graph. Computers in biology and medicine, 71, 135–148.CrossRef Jothi, R., Mohanty, S.K., & Ojha, A. (2016). Functional grouping of similar genes using eigenanalysis on minimum spanning tree based neighborhood graph. Computers in biology and medicine, 71, 135–148.CrossRef
go back to reference Jothi, R., Mohanty, S.K., & Ojha, A. (2016). On careful selection of initial centers for k-means algorithm. In Proceedings of 3rd International Conference on Advanced Computing, Networking and Informatics, pp. 435–445. Springer. Jothi, R., Mohanty, S.K., & Ojha, A. (2016). On careful selection of initial centers for k-means algorithm. In Proceedings of 3rd International Conference on Advanced Computing, Networking and Informatics, pp. 435–445. Springer.
go back to reference Jothi, R., Mohanty, S.K., & Ojha, A. (2018). Fast approximate minimum spanning tree based clustering algorithm. Neurocomputing, 272, 542–557.CrossRef Jothi, R., Mohanty, S.K., & Ojha, A. (2018). Fast approximate minimum spanning tree based clustering algorithm. Neurocomputing, 272, 542–557.CrossRef
go back to reference Karypis, G., Han, E.H., & Kumar, V. (1999). Chameleon: Hierarchical clustering using dynamic modeling. Computer, 32(8), 68–75.CrossRef Karypis, G., Han, E.H., & Kumar, V. (1999). Chameleon: Hierarchical clustering using dynamic modeling. Computer, 32(8), 68–75.CrossRef
go back to reference Kavitha, E., & Tamilarasan, R. (2019). Agglo-hi clustering algorithm for gene expression micro array data using proximity measures. Multimedia Tools and Applications, 79, 9003–9017.CrossRef Kavitha, E., & Tamilarasan, R. (2019). Agglo-hi clustering algorithm for gene expression micro array data using proximity measures. Multimedia Tools and Applications, 79, 9003–9017.CrossRef
go back to reference Koga, H., Ishibashi, T., & Watanabe, T. (2007). Fast agglomerative hierarchical clustering algorithm using locality-sensitive hashing. Knowledge and Information Systems, 12(1), 25–53.CrossRef Koga, H., Ishibashi, T., & Watanabe, T. (2007). Fast agglomerative hierarchical clustering algorithm using locality-sensitive hashing. Knowledge and Information Systems, 12(1), 25–53.CrossRef
go back to reference Kriegel, H.P., Kröger, P., Sander, J., & Zimek, A. (2011). Density-based clustering. Wiley Interdisciplinary Reviews:, Data Mining and Knowledge Discovery, 1 (3), 231–240. Kriegel, H.P., Kröger, P., Sander, J., & Zimek, A. (2011). Density-based clustering. Wiley Interdisciplinary Reviews:, Data Mining and Knowledge Discovery, 1 (3), 231–240.
go back to reference Kumar, K.M., & Reddy, A.R.M. (2016). A fast dbscan clustering algorithm by accelerating neighbor searching using groups method. Pattern Recognition, 58, 39–48.CrossRef Kumar, K.M., & Reddy, A.R.M. (2016). A fast dbscan clustering algorithm by accelerating neighbor searching using groups method. Pattern Recognition, 58, 39–48.CrossRef
go back to reference Li, X., Kao, B., Luo, S., & Ester, M. (2018). Rosc: Robust spectral clustering on multi-scale data. In Proceedings of the 2018 World Wide Web Conference, pp. 157–166. Li, X., Kao, B., Luo, S., & Ester, M. (2018). Rosc: Robust spectral clustering on multi-scale data. In Proceedings of the 2018 World Wide Web Conference, pp. 157–166.
go back to reference Limwattanapibool, O., & Arch-int, S. (2017). Determination of the appropriate parameters for k-means clustering using selection of region clusters based on density dbscan (srcd-dbscan). Expert Systems, 34(3), 12204.CrossRef Limwattanapibool, O., & Arch-int, S. (2017). Determination of the appropriate parameters for k-means clustering using selection of region clusters based on density dbscan (srcd-dbscan). Expert Systems, 34(3), 12204.CrossRef
go back to reference Lin, C.R., & Chen, M.S. (2005). Combining partitional and hierarchical algorithms for robust and efficient data clustering with cohesion self-merging. IEEE Transactions on Knowledge and Data Engineering, 17(2), 145–159.CrossRef Lin, C.R., & Chen, M.S. (2005). Combining partitional and hierarchical algorithms for robust and efficient data clustering with cohesion self-merging. IEEE Transactions on Knowledge and Data Engineering, 17(2), 145–159.CrossRef
go back to reference Mishra, G., & Mohanty, S.K. (2019). A fast hybrid clustering technique based on local nearest neighbor using minimum spanning tree. Expert Systems with Applications, 132, 28–43.CrossRef Mishra, G., & Mohanty, S.K. (2019). A fast hybrid clustering technique based on local nearest neighbor using minimum spanning tree. Expert Systems with Applications, 132, 28–43.CrossRef
go back to reference Otoo, E.J., Shoshani, A., & Hwang, S.w. (2001). Clustering high dimensional massive scientific datasets. Journal of Intelligent Information Systems, 17(2-3), 147–168.CrossRef Otoo, E.J., Shoshani, A., & Hwang, S.w. (2001). Clustering high dimensional massive scientific datasets. Journal of Intelligent Information Systems, 17(2-3), 147–168.CrossRef
go back to reference Rand, W.M. (1971). Objective criteria for the evaluation of clustering methods. Journal of the American Statistical association, 66(336), 846–850.CrossRef Rand, W.M. (1971). Objective criteria for the evaluation of clustering methods. Journal of the American Statistical association, 66(336), 846–850.CrossRef
go back to reference Schlitter, N., Falkowski, T., & Lässig, J. (2014). Dengraph-ho: a density-based hierarchical graph clustering algorithm. Expert Systems, 31(5), 469–479.CrossRef Schlitter, N., Falkowski, T., & Lässig, J. (2014). Dengraph-ho: a density-based hierarchical graph clustering algorithm. Expert Systems, 31(5), 469–479.CrossRef
go back to reference Tong, T., Zhu, X., & Du, T. (2019). Connected graph decomposition for spectral clustering. Multimedia Tools and Applications, 78(23), 33247–33259.CrossRef Tong, T., Zhu, X., & Du, T. (2019). Connected graph decomposition for spectral clustering. Multimedia Tools and Applications, 78(23), 33247–33259.CrossRef
go back to reference Wagner, S., & Wagner, D. (2007). Comparing clusterings: an overview. Universität Karlsruhe: Fakultät für Informatik Karlsruhe. Wagner, S., & Wagner, D. (2007). Comparing clusterings: an overview. Universität Karlsruhe: Fakultät für Informatik Karlsruhe.
go back to reference Walker, M., & Chakraborti, S. (2013). An asymmetrically modified boxplot for exploratory data analysis. The University of Alabama: Department of Information Systems Statistics, and Management Science. Walker, M., & Chakraborti, S. (2013). An asymmetrically modified boxplot for exploratory data analysis. The University of Alabama: Department of Information Systems Statistics, and Management Science.
go back to reference Wang, X., Wang, X.L., Chen, C., & Wilkes, D.M. (2013). Enhancing minimum spanning tree-based clustering by removing density-based outliers. Digital Signal Processing, 23(5), 1523–1538.MathSciNetCrossRef Wang, X., Wang, X.L., Chen, C., & Wilkes, D.M. (2013). Enhancing minimum spanning tree-based clustering by removing density-based outliers. Digital Signal Processing, 23(5), 1523–1538.MathSciNetCrossRef
go back to reference Wickham, H., & Stryjewski, L. (2011). 40 years of boxplots. Am Statistician. Wickham, H., & Stryjewski, L. (2011). 40 years of boxplots. Am Statistician.
go back to reference Zahn, C.T. (1971). Graph-theoretical methods for detecting and describing gestalt clusters. IEEE Transactions on computers, 100(1), 68–86.CrossRef Zahn, C.T. (1971). Graph-theoretical methods for detecting and describing gestalt clusters. IEEE Transactions on computers, 100(1), 68–86.CrossRef
go back to reference Zhong, C., Miao, D., & Fränti, P. (2011). Minimum spanning tree based split-and-merge: a hierarchical clustering method. Information Sciences, 181(16), 3397–3410.CrossRef Zhong, C., Miao, D., & Fränti, P. (2011). Minimum spanning tree based split-and-merge: a hierarchical clustering method. Information Sciences, 181(16), 3397–3410.CrossRef
Metadata
Title
A minimum spanning tree based partitioning and merging technique for clustering heterogeneous data sets
Authors
Gaurav Mishra
Sraban Kumar Mohanty
Publication date
22-04-2020
Publisher
Springer US
Published in
Journal of Intelligent Information Systems / Issue 3/2020
Print ISSN: 0925-9902
Electronic ISSN: 1573-7675
DOI
https://doi.org/10.1007/s10844-020-00602-z

Other articles of this Issue 3/2020

Journal of Intelligent Information Systems 3/2020 Go to the issue

Premium Partner