Skip to main content
Top

2015 | OriginalPaper | Chapter

Avalanche: A Hierarchical, Divisive Clustering Algorithm

Authors : Paul K. Amalaman, Christoph F. Eick

Published in: Machine Learning and Data Mining in Pattern Recognition

Publisher: Springer International Publishing

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Hierarchical clustering has been successfully used in many applications, such as bioinformatics and social sciences. In this paper, we introduce Avalanche, a new top-down hierarchical clustering approach that takes a dissimilarity matrix as its input. Such a tool can be used for applications where the dataset is partitioned based on pairwise distances among the examples, such as taxonomy generation tools and molecular biology applications in which dissimilarity among gene sequences are used as inputs — as opposed to flat file attribute/value pair datasets. The proposed algorithm uses local as well as global information to recursively split data associated with a tree node into two sub-nodes until some predefined termination condition is met. To split a node, initially the example that is furthest away from the other examples — the anti-medoid — is assigned to right sub-node and then additional examples are progressively assigned to this node which are nearest neighbors of the previously added example as long as a given objective function improves. Experimental evaluations done with artificial and real world datasets show that the new approach has improved speed, and obtained comparable clustering results as the well-known UPGMA algorithm on all datasets used in the experiment.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literature
1.
go back to reference Ao, S.I., Yip, K., Ng, M., Cheung, D., Fong, P.-Y., Melhado, I., Sham, P.C.: Clustag: hierarchical clustering and graph methods for selecting tag SNPs. Bioinformatics 21(8), 1735–1736 (2005)CrossRef Ao, S.I., Yip, K., Ng, M., Cheung, D., Fong, P.-Y., Melhado, I., Sham, P.C.: Clustag: hierarchical clustering and graph methods for selecting tag SNPs. Bioinformatics 21(8), 1735–1736 (2005)CrossRef
2.
3.
go back to reference Boley, D.L.: Principal direction divisive partitioning. Data Min. Knowl. Disc. 2(4), 325–344 (1998)CrossRef Boley, D.L.: Principal direction divisive partitioning. Data Min. Knowl. Disc. 2(4), 325–344 (1998)CrossRef
4.
go back to reference Chitta, R., Narasimha Murty, M.: Two-level k-means clustering algorithm for k–ψψ relationship establishment and linear-time classification. Pattern Recogn. 43(3), 796–804 (2010)MATHCrossRef Chitta, R., Narasimha Murty, M.: Two-level k-means clustering algorithm for k–ψψ relationship establishment and linear-time classification. Pattern Recogn. 43(3), 796–804 (2010)MATHCrossRef
5.
go back to reference Defays, D.: An efficient algorithm for a complete link method. Comput. J. Br. Comput. Soc. 20(4), 364–366 (1977)MATHMathSciNet Defays, D.: An efficient algorithm for a complete link method. Comput. J. Br. Comput. Soc. 20(4), 364–366 (1977)MATHMathSciNet
6.
go back to reference Forgy, E.: Cluster analysis of multivariate data: efficiency versus interpretability of classification. Biometrics 21, 768–780 (1965) Forgy, E.: Cluster analysis of multivariate data: efficiency versus interpretability of classification. Biometrics 21, 768–780 (1965)
7.
go back to reference Gose, E., Johnsonbaugh, R., Jost, S.: Pattern Recognition & Image Analysis. Prentice-Hall, New York (1996) Gose, E., Johnsonbaugh, R., Jost, S.: Pattern Recognition & Image Analysis. Prentice-Hall, New York (1996)
8.
go back to reference Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning; Data Mining, Inference and Prediction, 2nd edn. Springer, New York (2009)MATH Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning; Data Mining, Inference and Prediction, 2nd edn. Springer, New York (2009)MATH
9.
go back to reference Everitt, B., Landau, S., Leese, M.: Cluster Analysis, 4th edn. Arnold, London (2001)MATH Everitt, B., Landau, S., Leese, M.: Cluster Analysis, 4th edn. Arnold, London (2001)MATH
10.
go back to reference Jain, A.K., Dubes, R.C.: Algorithms for Clustering Data. Prentice-Hall advance reference series. Prentice-Hall, Upper Saddle River (1988)MATH Jain, A.K., Dubes, R.C.: Algorithms for Clustering Data. Prentice-Hall advance reference series. Prentice-Hall, Upper Saddle River (1988)MATH
11.
go back to reference Jain, A.K., Murty, M.N., Flynn, P.J.: Data clustering: a review. ACM Comput. Surv. 31(3), 264–323 (1999)CrossRef Jain, A.K., Murty, M.N., Flynn, P.J.: Data clustering: a review. ACM Comput. Surv. 31(3), 264–323 (1999)CrossRef
12.
go back to reference Murugesan, K., Zhang, J.: Hybrid bisect K-means clustering algorithm. In: 2011 Second International Conference on Business Computing and Global Informatization, pp. 216–219 Murugesan, K., Zhang, J.: Hybrid bisect K-means clustering algorithm. In: 2011 Second International Conference on Business Computing and Global Informatization, pp. 216–219
13.
go back to reference Tamura, K., Stecher, G., Peterson, D., Filipski, A., Kumar, S.: MEGA6: molecular evolutionary genetics analysis version 6.0. Mol. Biol. Evol. 30, 2725–2729 (2013)CrossRef Tamura, K., Stecher, G., Peterson, D., Filipski, A., Kumar, S.: MEGA6: molecular evolutionary genetics analysis version 6.0. Mol. Biol. Evol. 30, 2725–2729 (2013)CrossRef
14.
go back to reference Selim, S.Z., Ismail, M.A.: K-means-type algorithms: a generalized convergence theorem and characterization of local optimality. IEEE Trans. Pattern Anal. Mach. Intell. 6(1), 81–86 (1984)MATHCrossRef Selim, S.Z., Ismail, M.A.: K-means-type algorithms: a generalized convergence theorem and characterization of local optimality. IEEE Trans. Pattern Anal. Mach. Intell. 6(1), 81–86 (1984)MATHCrossRef
15.
go back to reference Savaresi, S.M., Boley, D.L., Bittanti, S., Gazzaniga, G.: Choosing the cluster to split in bisecting divisive clustering algorithms. In: SIAM International Conference on Data Mining (2002) Savaresi, S.M., Boley, D.L., Bittanti, S., Gazzaniga, G.: Choosing the cluster to split in bisecting divisive clustering algorithms. In: SIAM International Conference on Data Mining (2002)
16.
go back to reference Steinbach, M., Karypis, G., Kumar, V. A comparison of document clustering techniques. In: Proceedings of World Text Mining Conference, KDD 2000, Boston (2000) Steinbach, M., Karypis, G., Kumar, V. A comparison of document clustering techniques. In: Proceedings of World Text Mining Conference, KDD 2000, Boston (2000)
17.
go back to reference Sibson, R.: SLINK: an optimally efficient algorithm for the single-link cluster method. Comput. J. Br. Comput. Soc. 16(1), 30–34 (1973)MathSciNet Sibson, R.: SLINK: an optimally efficient algorithm for the single-link cluster method. Comput. J. Br. Comput. Soc. 16(1), 30–34 (1973)MathSciNet
18.
go back to reference Tan, P.-N., Steinbach, M., Kumar, V.: Introduction to Data Mining, 1st edn. Addison-Wesley, Boston (2005) Tan, P.-N., Steinbach, M., Kumar, V.: Introduction to Data Mining, 1st edn. Addison-Wesley, Boston (2005)
19.
go back to reference Ward Jr, J.H.: Hierarchical grouping to optimize an objective function. J. Am. Stat. Assoc. 58, 236–244 (1963)CrossRef Ward Jr, J.H.: Hierarchical grouping to optimize an objective function. J. Am. Stat. Assoc. 58, 236–244 (1963)CrossRef
20.
go back to reference Mertens, S.: Computational the easiest hard problem. In: Percus, A., Istrate, G., Moore, C. (eds.) Complexity and Statistical Physics. Oxford University Press, Oxford (2006) Mertens, S.: Computational the easiest hard problem. In: Percus, A., Istrate, G., Moore, C. (eds.) Complexity and Statistical Physics. Oxford University Press, Oxford (2006)
Metadata
Title
Avalanche: A Hierarchical, Divisive Clustering Algorithm
Authors
Paul K. Amalaman
Christoph F. Eick
Copyright Year
2015
DOI
https://doi.org/10.1007/978-3-319-21024-7_20

Premium Partner