Top

Neural Computing and Applications

Published in:

13-03-2021 | S.I. : WSOM 2019

Automatic identification of the number of clusters in hierarchical clustering

Authors: Ashutosh Karna, Karina Gibert

Published in: Neural Computing and Applications | Issue 1/2022

Activate our intelligent search to find suitable subject content or patents.

search-config

AI-assisted search

Off

Abstract

Hierarchical clustering is one of the most suitable tools to discover the underlying true structure of a dataset in the case of unsupervised learning where the ground truth is unknown and classical machine learning classifiers are not suitable. In many real applications, it provides a perspective on inner data structure and is preferred to partitional methods. However, determining the resulting number of clusters in hierarchical clustering requires human expertise to deduce this from the dendrogram and this represents a major challenge in making a fully automatic system such as the ones required for decision support in Industry 4.0. This research proposes a general criterion to perform the cut of a dendrogram automatically, by comparing six original criteria based on the Calinski-Harabasz index. The performance of each criterion on 95 real-life dendrograms of different topologies is evaluated against the number of classes proposed by the experts and a winner criterion is determined. This research is framed in a bigger project to build an Intelligent Decision Support system to assess the performance of 3D printers based on sensor data in real-time, although the proposed criteria can be used in other real applications of hierarchical clustering.The methodology is applied to a real-life dataset from the 3D printers and the huge reduction in CPU time is also shown by comparing the CPU time before and after this modification of the entire clustering method. It also reduces the dependability on human-expert to provide the number of clusters by inspecting the dendrogram. Further, such a process allows applying hierarchical clustering in an automatic mode in real-life industrial applications and allows the continuous monitoring of real 3D printers in production, and helps in building an Intelligent Decision Support System to detect operational modes, anomalies, and other behavioral patterns.

previous article Supervised learning in the presence of concept drift: a modelling framework

next article Continually trained life-long classification

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

Bruzzese D, Vistocco D (2010) Cutting the dendrogram through permutation tests. In: Proceedings of COMPSTAT’2010, pp 847–854

Bruzzese D, Vistocco D (2015) Despota: dendrogram slicing through a pemutation test approach. J Classif 32(2):285–304MathSciNetCrossRef

Caliński T, Harabasz J (1974) A dendrite method for cluster analysis. Commun Stat-Theory Methods 3(1):1–27MathSciNetCrossRef

Cowgill MC, Harvey RJ, Watson LT (1999) A genetic algorithm approach to cluster analysis. Comput Math Appl 37(7):99–108MathSciNetCrossRef

Davies DL, Bouldin DW (1979) A cluster separation measure. IEEE Trans Pattern Anal Mach Intell PAMI-1 (2):224–227

Dunn J (1974) A graph theoretic analysis of pattern classification via tamura’s fuzzy relation. IEEE Trans Syst Man Cybern 3:310–313CrossRef

Dunn JC (1973) A fuzzy relative of the isodata process and its use in detecting compact well-separated clusters. J Cybern 3(3):32–57. https://doi.org/10.1080/01969727308546046MathSciNetCrossRefMATH

Ferraretti D, Gamberoni G, Lamma E (2009) Automatic cluster selection using index driven search strategy. In: Congress of the Italian Association for artificial intelligence, Springer, pp 172–181

Gibert K, Marti-Puig P, Cusidó J, Solé-Casals J et al (2018) Identifying health status of wind turbines by using self organizing maps and interpretation-oriented post-processing tools. Energies 11(4):723CrossRef

10.

Gibert K, Nonell R, Velarde J, Colillas M (2005) Knowledge discovery with clustering: impact of metrics and reporting phase by using klass. Neural Netw World 15(4):319

11.

Gibert K, Sànchez-Marrè M, Codina V (2010) Choosing the right data mining technique: classification of methods and intelligent recommendation. Fifth international Congress on environmental modelling and software

12.

Gibert K, Sànchez-Marrè M, Izquierdo J (2016) A survey on pre-processing techniques: relevant issues in the context of environmental data mining. AI Commun 29(6):627–663MathSciNetCrossRef

13.

Gibert K, Sevilla-Villanueva B, Sànchez-Marrè M (2016) The role of significance tests in consistent interpretation of nested partitions. J Comput Appl Math 292:623–633. https://doi.org/10.1016/j.cam.2015.01.031MathSciNetCrossRefMATH

14.

Guha S, Rastogi R, Shim K (1998) Cure: an efficient clustering algorithm for large databases. ACM Sigmod Record 27(2):73–84CrossRef

15.

Hermann M, Pentek T, Otto B (2016) Design principles for industrie 4.0 scenarios. In: 2016 49th Hawaii international conference on system sciences (HICSS), pp 3928–3937. IEEE

16.

HP: Hp multi jet fusion technology (2020) Technical whitepaper. https://www8.hp.com/us/en/printers/3d-printers/products/multi-jet-technology.html. Accessed 30 May 2020

17.

Jain AK, Murty MN, Flynn PJ (1999) Data clustering: a review. ACM Comput Surv (CSUR) 31(3):264–323CrossRef

18.

Johnson SC (1967) Hierarchical clustering schemes. Psychometrika 32(3):241–254CrossRef

19.

Jung Y, Park H, Du DZ, Drake BL (2003) A decision criterion for the optimal number of clusters in hierarchical clustering. J Glob Optim 25(1):91–111MathSciNetCrossRef

20.

Karna A, Gibert K (2019) Using hierarchical clustering to understand behavior of 3d printer sensors. In: International workshop on self-organizing maps, Springer, pp 150–159

21.

Karna A, Gibert K. Bootstrap cure: a novel clustering approach forsensor data. State of the art on sensor data scienceand application to 3d printing industry. Computers in Industry (Submitted)

22.

Liu Y, Wu X, Shen Y (2011) Automatic clustering using genetic algorithms. Appl Math comput 218(4):1267–1279MathSciNetMATH

23.

Miller GA (1956) The magical number seven, plus or minus two: some limits on our capacity for processing information. Psychol Rev 63(2):81CrossRef

24.

Milligan GW (1981) A monte carlo study of thirty internal criterion measures for cluster analysis. Psychometrika 46(2):187–199CrossRef

25.

Milligan GW, Cooper MC (1985) An examination of procedures for determining the number of clusters in a data set. Psychometrika 50(2):159–179CrossRef

26.

Nale SB, Kalbande AG (2015) A review on 3d printing technology. Int J Innov Emerg Res Eng 2(9):2394–5494

27.

Rodas J, Gibert K, Rojo JE (2001) Electroshock effects identification using classification based on rules. In: International symposium on medical data analysis, Springer, pp 238–244

28.

Rousseeuw PJ (1987) Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J Comput Appl Math 20:53–65CrossRef

29.

Rüßmann M, Lorenz M, Gerbert P, Waldner M, Justus J, Engel P, Harnisch M (2015) Industry 4.0: the future of productivity and growth in manufacturing industries. Boston Consulting Group 9(1):54–89

30.

Saaty TL, Ozdemir MS (2003) Why the magic number seven plus or minus two. Math Comput Model 38(3–4):233–244MathSciNetCrossRef

31.

Salvador S, Chan P (2004) Determining the number of clusters/segments in hierarchical clustering/segmentation algorithms. In: 16th IEEE international conference on tools with artificial intelligence, pp 576–584. IEEE

32.

Sevilla-Villanueva B, Gibert K, Sànchez-Marrè M (2016) Using cvi for understanding class topology in unsupervised scenarios. In: Conference of the Spanish association for artificial intelligence, Springer, pp 135–149

33.

Sugar CA, James GM (2003) Finding the number of clusters in a dataset: an information-theoretic approach. J Am Stat Assoc 98(463):750–763MathSciNetCrossRef

34.

Tibshirani R, Walther G, Hastie T (2001) Estimating the number of clusters in a data set via the gap statistic. J Royal Stat Soc Ser B (Stat Methodol) 63(2):411–423MathSciNetCrossRef

35.

Ward JH Jr (1963) Hierarchical grouping to optimize an objective function. J Am Stat Assoc 58(301):236–244MathSciNetCrossRef

36.

Wong VK, Hernandez A (2012) A review of additive manufacturing. ISRN Mech Eng 2012:1–10. https://doi.org/10.5402/2012/208760CrossRef

37.

Yang Y, Chen K (2010) Temporal data clustering via weighted clustering ensemble with different representations. IEEE Trans Knowl Data Eng 23(2):307–320CrossRef

38.

Yang Y, Jiang J (2018) Adaptive bi-weighting toward automatic initialization and model selection for hmm-based hybrid meta-clustering ensembles. IEEE Trans Cybern 49(5):1657–1668MathSciNetCrossRef

39.

Zhou S, Xu Z, Liu F (2016) Method for determining the optimal number of clusters based on agglomerative hierarchical clustering. IEEE Trans Neural Netw Learn Syst 28(12):3007–3017MathSciNetCrossRef

Title: Automatic identification of the number of clusters in hierarchical clustering
Authors: Ashutosh Karna
Karina Gibert
Publication date: 13-03-2021
Publisher: Springer London
Published in: Neural Computing and Applications / Issue 1/2022
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI: https://doi.org/10.1007/s00521-021-05873-3

Springer Professional

Abstract

Please log in to get access to your license.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft"

Springer Professional "Technik"

Springer Professional "Wirtschaft+Technik"

Other articles of this Issue 1/2022

Self-organizing mappings on the flag manifold with applications to hyper-spectral image data analysis

Memristor-based circuit implementation of Competitive Neural Network based on online unsupervised Hebbian learning rule for pattern recognition

Asset criticality and risk prediction for an effective cybersecurity risk management of cyber-physical system

SOM-based aggregation for graph convolutional neural networks

A conjugate gradient algorithm based on double parameter scaled Broyden–Fletcher–Goldfarb–Shanno update for optimization problems and image restoration

A neural network training algorithm for singular perturbation boundary value problems

Premium Partner