Skip to main content
Top
Published in: Progress in Artificial Intelligence 3/2020

09-06-2020 | Regular Paper

Gabriel graph-based connectivity and density for internal validity of clustering

Authors: Fatima Boudane, Ali Berrichi

Published in: Progress in Artificial Intelligence | Issue 3/2020

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Clustering has an important role in data mining field. However, there is a large variety of clustering algorithms and each could generate quite different results depending on input parameters. In the research literature, several cluster validity indices have been proposed to evaluate clustering results and find the partition that best fits the input dataset. However, these validity indices may fail to achieve satisfactory results, especially in case of clusters with arbitrary shapes. In this paper, we propose a new cluster validity index for density-based, arbitrarily shaped clusters. Our new index is based on the density and connectivity relations extracted among the data points, based on the proximity graph, Gabriel graph. The incorporation of the connectivity and density relations allows achieving the best clustering results in the case of clusters with any shape, size or density. The experimental results on synthetic and real datasets, using the well-known neighborhood-based clustering (NBC) algorithm and the DBSCAN (density-based spatial clustering of applications with noise) algorithm, illustrate the superiority of the proposed index over some classical and recent indices and show its effectiveness for the evaluation of clustering algorithms and the selection of their appropriate parameters.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Literature
1.
go back to reference Mirkin, B.: Clustering for Data Mining: A Data Recovery Approach. Chapman & Hall/CRC, Boca Raton, Florida (2005)MATH Mirkin, B.: Clustering for Data Mining: A Data Recovery Approach. Chapman & Hall/CRC, Boca Raton, Florida (2005)MATH
2.
go back to reference Sneath, P.H.A., Sokal, R.R.: Numerical Taxonomy. Books in Biology. W.H. Freeman and Company, San Francisco (1973)MATH Sneath, P.H.A., Sokal, R.R.: Numerical Taxonomy. Books in Biology. W.H. Freeman and Company, San Francisco (1973)MATH
3.
go back to reference Chou, C.H., Su, M.C., Lai, E.: A new cluster validity measure and its application to image compression. Pattern Anal. Appl. 7, 205–220 (2004)MathSciNet Chou, C.H., Su, M.C., Lai, E.: A new cluster validity measure and its application to image compression. Pattern Anal. Appl. 7, 205–220 (2004)MathSciNet
4.
go back to reference Barbarà, D., Jajodia, S.: Applications of Data Mining in Computer Security, pp. 78–99. Kluwer Academic Publishers, Boston, MA (2002)MATH Barbarà, D., Jajodia, S.: Applications of Data Mining in Computer Security, pp. 78–99. Kluwer Academic Publishers, Boston, MA (2002)MATH
5.
go back to reference Sarma, T.H., Viswanath, P., Reddy, B.E.: Speeding-up the kernel K-means clustering method: a prototype based hybrid approach. Pattern Recogn. Lett. 34, 564–573 (2013) Sarma, T.H., Viswanath, P., Reddy, B.E.: Speeding-up the kernel K-means clustering method: a prototype based hybrid approach. Pattern Recogn. Lett. 34, 564–573 (2013)
6.
go back to reference Hartigan, J.A., Wong, M.A.: A K-Means clustering algorithm. Appl. Stat. 28, 100–108 (1979)MATH Hartigan, J.A., Wong, M.A.: A K-Means clustering algorithm. Appl. Stat. 28, 100–108 (1979)MATH
8.
go back to reference Rand, W.M.: Objective criteria for the evaluation of clustering methods. J. Am. Stat. Assoc. 66(336), 846–850 (1971) Rand, W.M.: Objective criteria for the evaluation of clustering methods. J. Am. Stat. Assoc. 66(336), 846–850 (1971)
9.
go back to reference Hubert, L., Arabie, P.: Comparing partitions. J. Classif. 2(1), 193–218 (1985)MATH Hubert, L., Arabie, P.: Comparing partitions. J. Classif. 2(1), 193–218 (1985)MATH
10.
go back to reference Arbelaitz, O., Gurrutxaga, I., Muguerza, J., Pérez, J.M., Perona, I.: An extensive comparative study of cluster validity indices. Pattern Recognit. 46, 243–256 (2013) Arbelaitz, O., Gurrutxaga, I., Muguerza, J., Pérez, J.M., Perona, I.: An extensive comparative study of cluster validity indices. Pattern Recognit. 46, 243–256 (2013)
12.
go back to reference Liu, Y., Li, Z., Xiong, H., Gao, X., Wu, J., Wu, S.: Understanding and enhancement of internal clustering validation measures. IEEE Trans. Cybern. 43(3), 982–994 (2013) Liu, Y., Li, Z., Xiong, H., Gao, X., Wu, J., Wu, S.: Understanding and enhancement of internal clustering validation measures. IEEE Trans. Cybern. 43(3), 982–994 (2013)
14.
15.
go back to reference Davies, D.L., Bouldin, D.W.: A clustering separation measure. IEEE Trans. Pattern Anal. Mach. Intell. 1, 224–227 (1979) Davies, D.L., Bouldin, D.W.: A clustering separation measure. IEEE Trans. Pattern Anal. Mach. Intell. 1, 224–227 (1979)
16.
17.
go back to reference Lee, Y., Lee, J.H., Jun, C.H.: Validation measures of bi cluster solutions. Ind. Eng. Manag. Syst. 8(2), 101–108 (2009)MathSciNet Lee, Y., Lee, J.H., Jun, C.H.: Validation measures of bi cluster solutions. Ind. Eng. Manag. Syst. 8(2), 101–108 (2009)MathSciNet
19.
go back to reference Moulavi, D., Jaskowiak, P.A., Campello, R.J.G.B., Zimek, A., Sander, J.: Density-based clustering validation. In: Proceedings of the 14th SIAM International Conference on Data Mining (SDM), Philadelphia, PA (2014) Moulavi, D., Jaskowiak, P.A., Campello, R.J.G.B., Zimek, A., Sander, J.: Density-based clustering validation. In: Proceedings of the 14th SIAM International Conference on Data Mining (SDM), Philadelphia, PA (2014)
20.
go back to reference Gabriel, K.R., Sokal, R.R.: New statistical approach to geographic variation analysis. Syst. Zool. 18(3), 259–278 (1969) Gabriel, K.R., Sokal, R.R.: New statistical approach to geographic variation analysis. Syst. Zool. 18(3), 259–278 (1969)
21.
go back to reference Zhou, S., Zhao, Y., Guan, J., Huang, J.: NBC: A neighborhood-based clustering algorithm. In: Proceedings of the 9th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), pp. 361–371. (2005) Zhou, S., Zhao, Y., Guan, J., Huang, J.: NBC: A neighborhood-based clustering algorithm. In: Proceedings of the 9th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), pp. 361–371. (2005)
22.
go back to reference Halkidi, M., Batistakis, Y., Vazirgiannis, M.: On clustering validation techniques. J. Intell. Inf. Syst. 17(2/3), 107–145 (2001)MATH Halkidi, M., Batistakis, Y., Vazirgiannis, M.: On clustering validation techniques. J. Intell. Inf. Syst. 17(2/3), 107–145 (2001)MATH
23.
go back to reference Rousseeuw, P.: Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 20, 53–65 (1987)MATH Rousseeuw, P.: Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 20, 53–65 (1987)MATH
24.
go back to reference Zhang, D., Ji, M., Yang, J., Zhang, Y., Xie, F.: A novel cluster validity index for fuzzy clustering based on bipartite modularity. Fuzzy Sets Syst. 253, 122–137 (2014)MathSciNet Zhang, D., Ji, M., Yang, J., Zhang, Y., Xie, F.: A novel cluster validity index for fuzzy clustering based on bipartite modularity. Fuzzy Sets Syst. 253, 122–137 (2014)MathSciNet
25.
go back to reference Yang, X., Song, Q., Cao, A.: A new cluster validity for data clustering. Neural Process. Lett. 23(3), 325–344 (2006) Yang, X., Song, Q., Cao, A.: A new cluster validity for data clustering. Neural Process. Lett. 23(3), 325–344 (2006)
26.
go back to reference Rojas-Thomas, J.C., Santos, M., Mora, M.: New internal index for clustering validation based on graphs. Expert Syst. Appl. 86, 334–349 (2017) Rojas-Thomas, J.C., Santos, M., Mora, M.: New internal index for clustering validation based on graphs. Expert Syst. Appl. 86, 334–349 (2017)
27.
go back to reference Yang, J., Lee, I.: Cluster validity through graph based boundary analysis. In: IKE, pp. 204–210. (2004) Yang, J., Lee, I.: Cluster validity through graph based boundary analysis. In: IKE, pp. 204–210. (2004)
28.
go back to reference Pal, N., Biswas, J.: Cluster validation using graph theoretic concepts. Pattern Recognit. 30(6), 847–857 (1997) Pal, N., Biswas, J.: Cluster validation using graph theoretic concepts. Pattern Recognit. 30(6), 847–857 (1997)
29.
go back to reference Halkidi, M., Vazirgiannis, M.: Clustering validity assessment: finding the optimal partitioning of a data set. In: Proceedings of the First IEEE International Conference on Data Mining (ICDM’01), pp. 187–194, California, USA (2001) Halkidi, M., Vazirgiannis, M.: Clustering validity assessment: finding the optimal partitioning of a data set. In: Proceedings of the First IEEE International Conference on Data Mining (ICDM’01), pp. 187–194, California, USA (2001)
30.
go back to reference Halkidi, M., Vazirgiannis, M.: A density-based cluster validity approach using multi-representatives. Pattern Recognit. Lett. 29, 773–786 (2008) Halkidi, M., Vazirgiannis, M.: A density-based cluster validity approach using multi-representatives. Pattern Recognit. Lett. 29, 773–786 (2008)
31.
go back to reference Žalik, K.R., Žalik, B.: Validity index for clusters of different sizes and densities. Pattern Recognit. Lett. 32, 221–234 (2011)MATH Žalik, K.R., Žalik, B.: Validity index for clusters of different sizes and densities. Pattern Recognit. Lett. 32, 221–234 (2011)MATH
32.
go back to reference Thalamuthu, A., Mukhopadhyay, I., Zheng, X., Tseng, G.: Evaluation and comparison of gene clustering methods in microarray analysis. Bioinformatics 22(19), 2405–2412 (2006) Thalamuthu, A., Mukhopadhyay, I., Zheng, X., Tseng, G.: Evaluation and comparison of gene clustering methods in microarray analysis. Bioinformatics 22(19), 2405–2412 (2006)
33.
go back to reference Carpineto, C., Romano, G.: Consensus clustering based on a new probabilistic rand index with application to subtopic retrieval. IEEE Trans. Pattern Anal. Mach. Intell. 34, 2315–2326 (2012) Carpineto, C., Romano, G.: Consensus clustering based on a new probabilistic rand index with application to subtopic retrieval. IEEE Trans. Pattern Anal. Mach. Intell. 34, 2315–2326 (2012)
34.
go back to reference Yeh, C.C., Yang, M.S.: Evaluation measures for cluster ensembles based on a fuzzy generalized Rand index. Appl. Soft Comput. 57, 225–234 (2017) Yeh, C.C., Yang, M.S.: Evaluation measures for cluster ensembles based on a fuzzy generalized Rand index. Appl. Soft Comput. 57, 225–234 (2017)
35.
go back to reference Zaidi, F., Melançon, G.: Evaluating the quality of clustering algorithms using cluster path lengths, 10th Industrial Conference (ICDM), pp. 42–56, Berlin, Germany (2010) Zaidi, F., Melançon, G.: Evaluating the quality of clustering algorithms using cluster path lengths, 10th Industrial Conference (ICDM), pp. 42–56, Berlin, Germany (2010)
36.
go back to reference Almeida, H., NetoZaki Jr., D.G.W.M.M.J.: Towards a better quality metric for graph cluster evaluation. J. Inf. Data Manag. 3(3), 378–393 (2012) Almeida, H., NetoZaki Jr., D.G.W.M.M.J.: Towards a better quality metric for graph cluster evaluation. J. Inf. Data Manag. 3(3), 378–393 (2012)
37.
go back to reference Urquhart, R.: Graph theoretical clustering based on limited neighbourhood sets. Pattern Recognit. 15(3), 173–187 (1982)MATH Urquhart, R.: Graph theoretical clustering based on limited neighbourhood sets. Pattern Recognit. 15(3), 173–187 (1982)MATH
38.
go back to reference Koontz, W.L.G., Narendra, P.M., Fukunaga, K.: A graph-theoretic approach to non parametric cluster analysis. IEEE Trans. Comput. 25, 936–944 (1976)MathSciNetMATH Koontz, W.L.G., Narendra, P.M., Fukunaga, K.: A graph-theoretic approach to non parametric cluster analysis. IEEE Trans. Comput. 25, 936–944 (1976)MathSciNetMATH
39.
go back to reference Liu, D., Novovskiy, G.V., Sourina, O.: Effective clustering and boundary detection algorithm based on Delaunay triangulation. Pattern Recognit. Lett. 29, 1261–1273 (2008) Liu, D., Novovskiy, G.V., Sourina, O.: Effective clustering and boundary detection algorithm based on Delaunay triangulation. Pattern Recognit. Lett. 29, 1261–1273 (2008)
40.
go back to reference Inkaya, T., Kayalıgil, S., Özdemirel, N.E.: A new density-based clustering approach in graph theoretic context. Int. J. Comput. Sci. Inf. Technol. 5(2), 117–135 (2010) Inkaya, T., Kayalıgil, S., Özdemirel, N.E.: A new density-based clustering approach in graph theoretic context. Int. J. Comput. Sci. Inf. Technol. 5(2), 117–135 (2010)
42.
go back to reference Franti, P.: Speech and image processing unit, clustering datasets. University of Eastern Finland, School of Computing (2015) Franti, P.: Speech and image processing unit, clustering datasets. University of Eastern Finland, School of Computing (2015)
43.
go back to reference Ester, M., Kriegel, H. P., Sander, J., Xu, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of KDD, pp. 226–231 (1996) Ester, M., Kriegel, H. P., Sander, J., Xu, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of KDD, pp. 226–231 (1996)
44.
go back to reference Zhu, E., Ma, R.: An effective partitional clustering algorithm based on new clustering validity index. Appl. Soft Comput. 71, 608–621 (2018) Zhu, E., Ma, R.: An effective partitional clustering algorithm based on new clustering validity index. Appl. Soft Comput. 71, 608–621 (2018)
45.
go back to reference Rezaei, M., Fränti, P.: Set matching measures for external cluster validity. IEEE Trans. Knowl. Data Eng. 28(8), 2173–2186 (2016) Rezaei, M., Fränti, P.: Set matching measures for external cluster validity. IEEE Trans. Knowl. Data Eng. 28(8), 2173–2186 (2016)
Metadata
Title
Gabriel graph-based connectivity and density for internal validity of clustering
Authors
Fatima Boudane
Ali Berrichi
Publication date
09-06-2020
Publisher
Springer Berlin Heidelberg
Published in
Progress in Artificial Intelligence / Issue 3/2020
Print ISSN: 2192-6352
Electronic ISSN: 2192-6360
DOI
https://doi.org/10.1007/s13748-020-00209-z

Other articles of this Issue 3/2020

Progress in Artificial Intelligence 3/2020 Go to the issue

Premium Partner