Top

Journal of Intelligent Information Systems

Published in:

22-04-2020

A minimum spanning tree based partitioning and merging technique for clustering heterogeneous data sets

Authors: Gaurav Mishra, Sraban Kumar Mohanty

Published in: Journal of Intelligent Information Systems | Issue 3/2020

Activate our intelligent search to find suitable subject content or patents.

search-config

AI-assisted search

Off

Abstract

Clustering being an unsupervised learning technique, has been used extensively for knowledge discovery due to its less dependency on domain knowledge. Many clustering techniques were proposed in the literature to recognize the cluster of different characteristics. Most of them become inadequate either due to their dependency on user-defined parameters or when they are applied on multi-scale datasets. Hybrid clustering techniques have been proposed to take the advantage of both Partitional and Hierarchical techniques by first partitioning the dataset into several dense sub-clusters and merging them into actual clusters. However, the universality of the partition and merging criteria are not sufficient to capture many characteristics of the clusters. Minimum spanning tree (MST) has been used extensively for clustering because it preserves the intrinsic nature of the dataset even after the sparsification of the graph. In this paper, we propose a parameter-free, minimum spanning tree based efficient hybrid clustering algorithm to cluster the multi-scale datasets. In the first phase, we construct a MST of the dataset to capture the neighborhood information of data points and employ box-plot, an outlier detection technique on MST edge weights for effectively selecting the inconsistent edges to partition the data points into several dense sub-clusters. In the second phase, we propose a novel merging criterion to find the genuine clusters by iteratively merging only the pairs of adjacent sub-clusters. The merging technique involves both dis-connectivity and intra-similarity using the topology of two adjacent pairs which helps to identify the arbitrary shape and varying density clusters. Experiment results on various synthetic and real world datasets demonstrate the superior performance of the proposed technique over other popular clustering algorithms.

previous article Compacting frequent star patterns in RDF graphs

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

Bezdek, J.C., & Pal, N.R. (1998). Some new indexes of cluster validity. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 28(3), 301–315.CrossRef

Blake, C., & Merz, C. (1998). Uci repository of machine learning databases [http://www.ics.uci.edu/mlearn/mlrepository.html], department of information and computer science, University of California, Irvine, CA, Vol. 55.

Chen, X. (2013). Clustering based on a near neighbor graph and a grid cell graph. Journal of Intelligent Information Systems, 40(3), 529–554.CrossRef

Cheng, Q., Liu, Z., Huang, J., & Cheng, G. (2016a). Community detection in hypernetwork via density-ordered tree partition. Applied Mathematics and Computation, 276, 384–393.MathSciNetCrossRef

Cheng, Q., Lu, X., Liu, Z., Huang, J., & Cheng, G. (2016b). Spatial clustering with density-ordered tree. Physica A:, Statistical Mechanics and its Applications, 460, 188–200.CrossRef

Chung, C.H., & Dai, B.R. (2014). A fragment-based iterative consensus clustering algorithm with a robust similarity. Knowledge and information systems, 41(3), 591–609.CrossRef

Das, A.K., & Sil, J. (2007). Cluster validation using splitting and merging technique, International conference on computational intelligence and multimedia applications (ICCIMA 2007), vol. 2, pp. 56–60. IEEE.

Du, M., Ding, S., Xue, Y., & Shi, Z. (2019). A novel density peaks clustering with sensitivity of local density and density-adaptive metric. Knowledge and Information Systems, 59(2), 285–309.CrossRef

Ester, M., Kriegel, H.P., Sander, J., Xu, X., & et al. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. In Kdd, vol. 96, pp. 226–231.

Grygorash, O., Zhou, Y., & Jorgensen, Z. (2006). Minimum spanning tree based clustering algorithms. In 18Th IEEE international conference on tools with artificial intelligence (ICTAI’06), pp. 73–81. IEEE.

Guha, S., Rastogi, R., & Shim, K. (1998). Cure: an efficient clustering algorithm for large databases. ACM Sigmod Record, 27(2), 73–84.CrossRef

Halkidi, M., Batistakis, Y., & Vazirgiannis, M. (2001). On clustering validation techniques. Journal of intelligent information systems, 17(2-3), 107–145.CrossRef

Hartigan, J.A., & Wong, M.A. (1979). Algorithm as 136: a k-means clustering algorithm. Journal of the Royal Statistical Society. Series C (Applied Statistics), 28(1), 100–108.MATH

Hu, W., & he Pan, Q. (2015). Data clustering and analyzing techniques using hierarchical clustering method. Multimedia Tools and Applications, 74(19), 8495–8504.CrossRef

Hyde, R., & et al. (2015). Lancaster university clustering datasets. http://www.lancaster.ac.uk/pg/hyder/Downloads/downloads.html.

Jain, A.K., & Dubes, R.C. (1988). Algorithms for clustering data, Prentice-Hall, Inc.

Jiau, H.C., Su, Y.J., Lin, Y.M., & Tsai, S.R. (2006). Mpm: a hierarchical clustering algorithm using matrix partitioning method for non-numeric data. Journal of Intelligent Information Systems, 26(2), 185–207.CrossRef

Jothi, R., Mohanty, S.K., & Ojha, A. (2016). Functional grouping of similar genes using eigenanalysis on minimum spanning tree based neighborhood graph. Computers in biology and medicine, 71, 135–148.CrossRef

Jothi, R., Mohanty, S.K., & Ojha, A. (2016). On careful selection of initial centers for k-means algorithm. In Proceedings of 3rd International Conference on Advanced Computing, Networking and Informatics, pp. 435–445. Springer.

Jothi, R., Mohanty, S.K., & Ojha, A. (2018). Fast approximate minimum spanning tree based clustering algorithm. Neurocomputing, 272, 542–557.CrossRef

Karypis, G., Han, E.H., & Kumar, V. (1999). Chameleon: Hierarchical clustering using dynamic modeling. Computer, 32(8), 68–75.CrossRef

Kavitha, E., & Tamilarasan, R. (2019). Agglo-hi clustering algorithm for gene expression micro array data using proximity measures. Multimedia Tools and Applications, 79, 9003–9017.CrossRef

Koga, H., Ishibashi, T., & Watanabe, T. (2007). Fast agglomerative hierarchical clustering algorithm using locality-sensitive hashing. Knowledge and Information Systems, 12(1), 25–53.CrossRef

Kriegel, H.P., Kröger, P., Sander, J., & Zimek, A. (2011). Density-based clustering. Wiley Interdisciplinary Reviews:, Data Mining and Knowledge Discovery, 1 (3), 231–240.

Kumar, K.M., & Reddy, A.R.M. (2016). A fast dbscan clustering algorithm by accelerating neighbor searching using groups method. Pattern Recognition, 58, 39–48.CrossRef

Li, J., Wang, X., & Wang, X. (2019). A scaled-mst-based clustering algorithm and application on image segmentation, Journal of Intelligent Information Systems, pp 1–25. https://doi.org/10.1007/s10844-019-00572-x.

Li, X., Kao, B., Luo, S., & Ester, M. (2018). Rosc: Robust spectral clustering on multi-scale data. In Proceedings of the 2018 World Wide Web Conference, pp. 157–166.

Limwattanapibool, O., & Arch-int, S. (2017). Determination of the appropriate parameters for k-means clustering using selection of region clusters based on density dbscan (srcd-dbscan). Expert Systems, 34(3), 12204.CrossRef

Lin, C.R., & Chen, M.S. (2005). Combining partitional and hierarchical algorithms for robust and efficient data clustering with cohesion self-merging. IEEE Transactions on Knowledge and Data Engineering, 17(2), 145–159.CrossRef

Mishra, G., & Mohanty, S. (2020). Rdmn: a relative density measure based on mst neighborhood for clustering multi-scale datasets, IEEE Transactions on Knowledge and Data Engineering, pp 1–1, https://doi.org/10.1109/TKDE.2020.2982400.

Mishra, G., & Mohanty, S.K. (2019). A fast hybrid clustering technique based on local nearest neighbor using minimum spanning tree. Expert Systems with Applications, 132, 28–43.CrossRef

Otoo, E.J., Shoshani, A., & Hwang, S.w. (2001). Clustering high dimensional massive scientific datasets. Journal of Intelligent Information Systems, 17(2-3), 147–168.CrossRef

Pasi, F., & et al. (2015). Clustering datasets. http://cs.uef.fi/sipu/datasets/.

Rand, W.M. (1971). Objective criteria for the evaluation of clustering methods. Journal of the American Statistical association, 66(336), 846–850.CrossRef

Schlitter, N., Falkowski, T., & Lässig, J. (2014). Dengraph-ho: a density-based hierarchical graph clustering algorithm. Expert Systems, 31(5), 469–479.CrossRef

Tong, T., Zhu, X., & Du, T. (2019). Connected graph decomposition for spectral clustering. Multimedia Tools and Applications, 78(23), 33247–33259.CrossRef

Wagner, S., & Wagner, D. (2007). Comparing clusterings: an overview. Universität Karlsruhe: Fakultät für Informatik Karlsruhe.

Walker, M., & Chakraborti, S. (2013). An asymmetrically modified boxplot for exploratory data analysis. The University of Alabama: Department of Information Systems Statistics, and Management Science.

Wang, X., Wang, X.L., Chen, C., & Wilkes, D.M. (2013). Enhancing minimum spanning tree-based clustering by removing density-based outliers. Digital Signal Processing, 23(5), 1523–1538.MathSciNetCrossRef

Wickham, H., & Stryjewski, L. (2011). 40 years of boxplots. Am Statistician.

Zahn, C.T. (1971). Graph-theoretical methods for detecting and describing gestalt clusters. IEEE Transactions on computers, 100(1), 68–86.CrossRef

Zhong, C., Miao, D., & Fränti, P. (2011). Minimum spanning tree based split-and-merge: a hierarchical clustering method. Information Sciences, 181(16), 3397–3410.CrossRef

Title: A minimum spanning tree based partitioning and merging technique for clustering heterogeneous data sets
Authors: Gaurav Mishra
Sraban Kumar Mohanty
Publication date: 22-04-2020
Publisher: Springer US
Published in: Journal of Intelligent Information Systems / Issue 3/2020
Print ISSN: 0925-9902
Electronic ISSN: 1573-7675
DOI: https://doi.org/10.1007/s10844-020-00602-z

Springer Professional

Abstract

Please log in to get access to your license.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Technik"

Springer Professional "Wirtschaft"

Other articles of this Issue 3/2020

A survey on intention analysis: successful approaches and open challenges

Machine learning for music genre: multifaceted review and experimentation with audioset

The contribution of linked open data to augment a traditional data warehouse

Compacting frequent star patterns in RDF graphs

Automated conversion from natural language query to SPARQL query

System for monitoring road slippery based on CCTV cameras and convolutional neural networks

Premium Partner