Top

Published in:

2015 | OriginalPaper | Chapter

Avalanche: A Hierarchical, Divisive Clustering Algorithm

Authors : Paul K. Amalaman, Christoph F. Eick

Published in: Machine Learning and Data Mining in Pattern Recognition

Publisher: Springer International Publishing

Activate our intelligent search to find suitable subject content or patents.

search-config

AI-assisted search

Off

Abstract

Hierarchical clustering has been successfully used in many applications, such as bioinformatics and social sciences. In this paper, we introduce Avalanche, a new top-down hierarchical clustering approach that takes a dissimilarity matrix as its input. Such a tool can be used for applications where the dataset is partitioned based on pairwise distances among the examples, such as taxonomy generation tools and molecular biology applications in which dissimilarity among gene sequences are used as inputs — as opposed to flat file attribute/value pair datasets. The proposed algorithm uses local as well as global information to recursively split data associated with a tree node into two sub-nodes until some predefined termination condition is met. To split a node, initially the example that is furthest away from the other examples — the anti-medoid — is assigned to right sub-node and then additional examples are progressively assigned to this node which are nearest neighbors of the previously added example as long as a given objective function improves. Experimental evaluations done with artificial and real world datasets show that the new approach has improved speed, and obtained comparable clustering results as the well-known UPGMA algorithm on all datasets used in the experiment.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

previous chapter Semi-Supervised Stream Clustering Using Labeled Data Points

next chapter Author Attribution of Email Messages Using Parse-Tree Features

Ao, S.I., Yip, K., Ng, M., Cheung, D., Fong, P.-Y., Melhado, I., Sham, P.C.: Clustag: hierarchical clustering and graph methods for selecting tag SNPs. Bioinformatics 21(8), 1735–1736 (2005)CrossRef

Bien, J., Tibshirani, R.: Hierarchical clustering with prototypes via minimax linkage. J. Am. Stat. Assoc. 106, 1075–1084 (2011)MATHMathSciNetCrossRef

Boley, D.L.: Principal direction divisive partitioning. Data Min. Knowl. Disc. 2(4), 325–344 (1998)CrossRef

Chitta, R., Narasimha Murty, M.: Two-level k-means clustering algorithm for k–ψψ relationship establishment and linear-time classification. Pattern Recogn. 43(3), 796–804 (2010)MATHCrossRef

Defays, D.: An efficient algorithm for a complete link method. Comput. J. Br. Comput. Soc. 20(4), 364–366 (1977)MATHMathSciNet

Forgy, E.: Cluster analysis of multivariate data: efficiency versus interpretability of classification. Biometrics 21, 768–780 (1965)

Gose, E., Johnsonbaugh, R., Jost, S.: Pattern Recognition & Image Analysis. Prentice-Hall, New York (1996)

Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning; Data Mining, Inference and Prediction, 2nd edn. Springer, New York (2009)MATH

Everitt, B., Landau, S., Leese, M.: Cluster Analysis, 4th edn. Arnold, London (2001)MATH

10.

Jain, A.K., Dubes, R.C.: Algorithms for Clustering Data. Prentice-Hall advance reference series. Prentice-Hall, Upper Saddle River (1988)MATH

11.

Jain, A.K., Murty, M.N., Flynn, P.J.: Data clustering: a review. ACM Comput. Surv. 31(3), 264–323 (1999)CrossRef

12.

Murugesan, K., Zhang, J.: Hybrid bisect K-means clustering algorithm. In: 2011 Second International Conference on Business Computing and Global Informatization, pp. 216–219

13.

Tamura, K., Stecher, G., Peterson, D., Filipski, A., Kumar, S.: MEGA6: molecular evolutionary genetics analysis version 6.0. Mol. Biol. Evol. 30, 2725–2729 (2013)CrossRef

14.

Selim, S.Z., Ismail, M.A.: K-means-type algorithms: a generalized convergence theorem and characterization of local optimality. IEEE Trans. Pattern Anal. Mach. Intell. 6(1), 81–86 (1984)MATHCrossRef

15.

Savaresi, S.M., Boley, D.L., Bittanti, S., Gazzaniga, G.: Choosing the cluster to split in bisecting divisive clustering algorithms. In: SIAM International Conference on Data Mining (2002)

16.

Steinbach, M., Karypis, G., Kumar, V. A comparison of document clustering techniques. In: Proceedings of World Text Mining Conference, KDD 2000, Boston (2000)

17.

Sibson, R.: SLINK: an optimally efficient algorithm for the single-link cluster method. Comput. J. Br. Comput. Soc. 16(1), 30–34 (1973)MathSciNet

18.

Tan, P.-N., Steinbach, M., Kumar, V.: Introduction to Data Mining, 1st edn. Addison-Wesley, Boston (2005)

19.

Ward Jr, J.H.: Hierarchical grouping to optimize an objective function. J. Am. Stat. Assoc. 58, 236–244 (1963)CrossRef

20.

Mertens, S.: Computational the easiest hard problem. In: Percus, A., Istrate, G., Moore, C. (eds.) Complexity and Statistical Physics. Oxford University Press, Oxford (2006)

21.

The Joint Genome Institute: https://img.jgi.doe.gov/cgi-bin/w/main.cgi (2015)

Title: Avalanche: A Hierarchical, Divisive Clustering Algorithm
Authors: Paul K. Amalaman
Christoph F. Eick
Publisher: Springer International Publishing
Book: Machine Learning and Data Mining in Pattern Recognition
Print ISBN: 978-3-319-21023-0

Electronic ISBN: 978-3-319-21024-7

Copyright Year: 2015
DOI: https://doi.org/10.1007/978-3-319-21024-7_20

Springer Professional

Abstract

Please log in to get access to your license.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Technik"

Springer Professional "Wirtschaft"

Premium Partner