Skip to main content
Log in

Efficient algorithms for divisive hierarchical clustering with the diameter criterion

  • Published:
Journal of Classification Aims and scope Submit manuscript

Abstract

Divisive hierarchical clustering algorithms with the diameter criterion proceed by recursively selecting the cluster with largest diameter and partitioning it into two clusters whose largest diameter is smallest possible. We provide two such algorithms with complexitiesO(\(\overline M \) N 2) andO(N 2logN) respectively, where\(\overline M \) denotes the maximum number of clusters in a partition andN the number of entities to be clustered. The former algorithm, an efficient implementation of an algorithm of Hubert, allows to find all partitions into at most\(\overline M \) clusters and is inO(N 2) for fixed\(\overline M \). Moreover, if in each partitioning the size of the largest cluster is bounded byp times the number of entities in the set to be partitioned, with 1/2<=p<1, it provides a complete hierarchy of partitionsO(N 2 logN) time. The latter algorithm, a refinement of an algorithm of Rao allows to build a complete hierarchy of partitions inO(N 2 logN) time without any restriction. Comparative computational experiments with both algorithms and with an agglomerative hierarchical algorithm of Benzécri are reported.

Résumé

Les algorithmes de classification hiérarchique descendante utilisant le critère du diamètre, sélectionnent récursivement la classe de plus grand diamètre et la partitionnent en deux classes, dont le plus grand diamètre est le plus, petit possible. Nous proposons deux tels algorithmes, avec des complexités enO (\(\overline M \)N2) etO(N 2 logN) respectivement, où\(\overline M \) désigne le nombre maximum de classes d'une partition etN le nombre d'objets à classifier. Le premier algorithme, une implantation d'un algorithme de Hubert, permet de construire des partitions avec au plus\(\overline M \) classes et est enO(N 2) pour\(\overline M \) fixé. De plus, si dans chaque bipartition le nombre d'objets de la plus grande classe, est borné parp fois le nombre d'objets de l'ensemble à partitionner, où 1/2≤p<1, cet algorithme permet de construire une hiérarchie complète de partitions en tempsO(N 2 logN). Le second algorithme, un raffinement d'un algorithme de Rao, permet de construire une hiérarchie complète de partitions en tempsO(N 2 logN) sans aucune restriction On présente également des résultats de calcul comparatifs pour les deux algorithmes et pour l'algorithme de classification hiérarchique ascendante de Benzécri.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Similar content being viewed by others

References

  • AHO, A. V., HOPCROFT, J. F., and ULLMAN, J. D. (1974).The Design and Analysis of Computer Algorithms, Reading, Massachusetts: Addison-Wesley.

    MATH  Google Scholar 

  • BENZÉCRI, J. P. (1973),L'Analyse de Données. 1, La Taxinomie, Paris: Dunod.

    Google Scholar 

  • BENZÉCRI, J. P. (1982), “Construction d'une classification ascendante hiérarchique par la recherche en chaîne des voisins réciproques,”Les Cahiers de l'Analyse des Données, (VII 2, 209–218.

    Google Scholar 

  • BRUCKER, P. (1978), “On the Complexity of Clustering Problems,” inOptimization and Operations Research, Lecture Notes in Economics and Mathematical Systems, 157, Eds. M. Beckmann and H. Kunzi, Heidelberg: Springer-Verlag, 45–54.

    Google Scholar 

  • DAY, H. E., and EDELSBRUNNER, H. (1984), “Effcient Algorithrns for Agglomerative Hierarchical Clustering Methods,”Journal of Classification, 1, 7–24.

    Article  MATH  Google Scholar 

  • DEFAYS, D. (1977), “An Efficient Algorithm for a Complete Link Method,”The Computer Journal, 20(4) 364–366.

    Article  MathSciNet  MATH  Google Scholar 

  • DELATTRE, M., and HANSEN, P. (1980), “Bicriterion Cluster Analysis,”IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-2(4), 277–291.

    Article  MATH  Google Scholar 

  • DIDAY, E., et al. (1979),Optimisation en Classification Automatique, Le Chesnay: INRIA.

    MATH  Google Scholar 

  • DIJKSTRA, E. W. (1959), “A Note on Two Problems in Connexion with Graphs,”Numerische Mathematik, 1, 269–271.

    Article  MATH  MathSciNet  Google Scholar 

  • DORING, V., HANSEN, P., and JAUMARD, B. (1990), “Divisive Hierarchical Clustering with the Diameter Criterion: Hubet's Algorithm,”Cahiers du GERAD (forthcoming).

  • DORING, V., HANSEN, P., and JAUMARD, B. (1990), “Divisive Hierarchical Clustering with the Diameter Criterion: Algorithm Alltrees,”Cahiers du GERAD (forthcoming).

  • EDWARDS, A. F. S., and CAVALLI-SFORZA, L. L. (1965), “A Method for Cluster Analysis,”Biometrics, 21(2), 362–375.

    Article  Google Scholar 

  • FISHER, R. A. (1936), “The Use of Multiple Measurements in Taxonomic Problems,”Annals of Eugenics, VII(II), 179–188.

    Google Scholar 

  • GAREY, M. R., and JOHNSON, D. S. (1979),Computers and Intractability: A Guide to the Theory of NP-Completeness. San Francisco: Freeman.

    MATH  Google Scholar 

  • GROTSCHEL, M., and WAKABAYASHI, Y. (1989), “A Cutting-plane Algorithm for a Clustering Problem,”Mathematical Programing, 45(1),Series B, 59–96.

    Article  MathSciNet  Google Scholar 

  • GUÉNOCHE, A. (1989), “Partitions with Minimum Diameter,” Conference of the I.F.C.S., Charlottesville.

  • HANSEN, P., and DELATTRE, M. (1978), “Complete Link Cluster Analysis by Graph Coloring,”Journal of the American Statistical Association, 73, 397–403.

    Article  Google Scholar 

  • HANSEN, P., and JAUMARD, B. (1987), “Minimum Sum of Diameters Clustering,”Journal of Classification, 4, 215–226.

    Article  MATH  MathSciNet  Google Scholar 

  • HARMAN, H. H. (1967),Modern Factor Analysis Chicago: University of Chicago Press.

    MATH  Google Scholar 

  • HUBERT, L. (1973), “Monotone Invariant Clustering Procedures,”Psychometrika, 38(1), 47–62.

    Article  MATH  Google Scholar 

  • JAMBU, M. (1978),Classification automatique pour l'analyse des données, Paris: Dunod (Cluster Analysis and Data Analysis, Amsterdam: North Holland, 1983).

    Google Scholar 

  • JUAN, J. (1982), “Programme de classification hiérarchique par l'algorithme de la recherche en chaîne des voisns réciproques,”Les Cahiers de l'Analyse des Données, VII(2), 219–225.

    Google Scholar 

  • KRUSKAL, J.B. (1956), “On the Shortest Spanning Subtree of a Graph,”Proceedings of the American Mathematical Society, 7, 48–50.

    Article  MathSciNet  Google Scholar 

  • LANCE, G. N., and WILLIAMS, W. T. (1967), “Mixed-data Classificatory Systems: I. Agglomerative Systems,”Australian Computer Journal, 1, 15–20.

    Google Scholar 

  • LECLERC, B. (1986), “Caractérisation, construction et dénombrement des ultramétriques supérieures minimales,”Statistique et Analyse des Données, 11(2), 26–50.

    MATH  MathSciNet  Google Scholar 

  • McNAUGHTON-SMITH, P., WILLIAMS, W. T., DALE, M. B., and MOCKETT, L. G. (1964), “Dissimilarity Analysis,” Nature, 202, 1034–1035.

    Article  MATH  Google Scholar 

  • MONMA, C., and SURI, S. (1989), “Partitioning Points and Graphs to Minimize the Maximum or the Sum of Diameters,”Proceedings of the Sixth International Conference on the Theory and Applications of Graphs, New York: Wiley.

    Google Scholar 

  • MURTAGH, F. (1983), “A Survey of Recent Advances in Hierarchical Algorithms,”The Computer Journal, 26(4), 354–359.

    MATH  Google Scholar 

  • MURTAGH, F. (1984), “Complexities of Hierarchic Clustering Algorithms: State of the Art,”Computational Statistics Quarterly, 1(2), 101–113.

    MATH  MathSciNet  Google Scholar 

  • PRIM, R.C. (1957), “Shortest Connection Networks and Some Generalizations”Bell System Technical Journal, 36, 1389–1401.

    Google Scholar 

  • RAO, M. R. (1971), “Cluster Analysis and Mathematical Programming,”Journal of the American Statistical Association, 66, 622–626.

    Article  MATH  Google Scholar 

  • VICHI, M. (1985), “On a Flexible and Computationally Feasible Divisive Clustering Technique,”Revista di Statistics, 18(4), 200–208.

    Google Scholar 

  • WELCH, W.J. (1982), “Algorithmic Complexity: Three NP-Hard Problems in Computational Statistics,”Journal of Statistical Computing and Simulation, 15, 17–25.

    MATH  MathSciNet  Google Scholar 

  • WILLIAMS, J.W.J. (1964), “Algorithm 232, Heapsort,”Communications of the Association for Computing Machinery, 7(6), 347–348.

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Rights and permissions

Reprints and permissions

About this article

Cite this article

Guénoche, A., Hansen, P. & Jaumard, B. Efficient algorithms for divisive hierarchical clustering with the diameter criterion. Journal of Classification 8, 5–30 (1991). https://doi.org/10.1007/BF02616245

Download citation

  • Issue Date:

  • DOI: https://doi.org/10.1007/BF02616245

Keywords

Navigation