2016 | OriginalPaper | Chapter

Determining the Number of Clusters Using Multivariate Ranks

Authors: Mohammed Baragilly, Biman Chakraborty

Published in: Recent Advances in Robust Statistics: Theory and Applications

Publisher: Springer India

Abstract

Determining the number of clusters in multivariate data has become one of the most important issues across a wide range of scientific disciplines. The forward search algorithm is a graphical approach that helps with this task. The traditional forward search approach based on Mahalanobis distances was introduced by Hadi (1992) and Atkinson (1994), while Atkinson et al. (2004) used it as a clustering method. But like many other Mahalanobis distance-based methods, it cannot be correctly applied to asymmetric distributions or, more generally, to distributions that depart from the elliptical symmetry assumption. We propose a new forward search methodology based on spatial ranks, in which clusters are grown sequentially, one data point at a time, using spatial ranks with respect to the points already in the subsample. The algorithm starts from a randomly chosen initial subsample. We illustrate with simulated data that the proposed algorithm is robust to the choice of initial subsample and that it performs well for different multivariate mixture distributions. We also propose a modified algorithm based on the volume of central rank regions. Our numerical examples show that it produces the best results under elliptic symmetry.
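The abstract does not give the exact selection rule or the statistic that is monitored, so the following is only a minimal sketch of a forward search driven by spatial ranks, under stated assumptions. It uses the standard definition of the spatial rank of a point x with respect to a sample as the average of the unit vectors pointing from each sample point to x. The function names `spatial_rank` and `forward_search`, the parameter `start_size`, and the rule "add the outside point with the smallest rank norm" are illustrative choices, not the authors' implementation.

```python
import numpy as np


def spatial_rank(x, sample):
    """Spatial rank of x w.r.t. sample: mean of unit vectors from sample points to x."""
    diffs = x - sample
    norms = np.linalg.norm(diffs, axis=1)
    keep = norms > 0  # points coinciding with x contribute a zero vector
    if not keep.any():
        return np.zeros(sample.shape[1])
    return (diffs[keep] / norms[keep, None]).sum(axis=0) / len(sample)


def forward_search(data, start_size=3, seed=0):
    """Grow a subsample one point at a time from a random start.

    Assumed rule (not taken from the chapter): at each step, add the outside
    point whose spatial rank norm w.r.t. the current subsample is smallest,
    and record that minimum norm so it can be plotted against subsample size.
    """
    rng = np.random.default_rng(seed)
    n = len(data)
    inside = [int(i) for i in rng.choice(n, size=start_size, replace=False)]
    trace = []
    while len(inside) < n:
        outside = [i for i in range(n) if i not in inside]
        sub = data[inside]
        rank_norms = [np.linalg.norm(spatial_rank(data[i], sub)) for i in outside]
        j = int(np.argmin(rank_norms))
        trace.append(float(rank_norms[j]))
        inside.append(outside[j])
    return inside, trace


if __name__ == "__main__":
    # Two well-separated simulated clusters (hypothetical example data).
    rng = np.random.default_rng(1)
    data = np.vstack([
        rng.normal(0.0, 1.0, size=(30, 2)),
        rng.normal(8.0, 1.0, size=(30, 2)),
    ])
    order, trace = forward_search(data, start_size=3, seed=0)
    print(order[:10])
    print(np.round(trace[:10], 3))
```

A rank norm near 0 means the point is central relative to the current subsample, while a norm near 1 means it lies far outside it. Plotting `trace` against subsample size for several random starts is one way to look for the jumps that a forward search plot is meant to reveal.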

Literature
Atkinson AC (1994) Fast very robust methods for the detection of multiple outliers. J Am Stat Assoc 89:1329–1339
Atkinson AC, Mulira H (1993) The stalactite plot for the detection of multivariate outliers. Stat Comput 3:27–35
Atkinson AC, Riani M (2012) Discussion on the paper by Spiegelhalter, Sherlaw-Johnson, Bardsley, Blunt, Wood and Grigg. J Roy Stat Soc 175
Atkinson AC, Riani M, Cerioli A (2004) Exploring multivariate data with the forward search. Springer, New York
Atkinson AC, Riani M, Cerioli A (2006) Random start forward searches with envelopes for detecting clusters in multivariate data. Springer, Berlin, pp 163–171
Azzalini A, Bowman A (1990) A look at some data on the Old Faithful geyser. J Roy Stat Soc 39(3):357–365
Beale EML (1969) Euclidean cluster analysis. ISI, Voorburg, Netherlands
Duda RO, Hart PE (1973) Pattern classification and scene analysis. Wiley, New York
Fraley C, Raftery A (2003) Enhanced model-based clustering, density estimation and discriminant analysis: Mclust. J Classif 20:263–286
Gan G, Ma C, Wu J (2007) Data clustering theory, algorithms, and applications. ASA-SIAM series on statistics and applied probability. Philadelphia
Gordon AD (1998) Cluster validation. In: C Hayashi KYeae, N Ohsumi (eds) Data science, classification and related methods. Springer, Tokyo, pp 22–39
Hadi AS (1992) Identifying multiple outliers in multivariate data. J Roy Stat Soc 54:761–771
Hadi AS, Simonoff JS (1993) Procedures for the identification of multiple outliers in linear models. J Am Stat Assoc 88(424):1264–1272
Krzanowski WJ, Lai YT (1985) A criterion for determining the number of clusters in a data set. Biometrics 44:23–34
Marriott FHC (1971) Practical problems in a method of cluster analysis. Biometrics 27:501–514
Milligan GW, Cooper MC (1985) An examination of procedures for determining the number of clusters in a data set. Psychometrika 50:159–179
Mojena R (1977) Hierarchical grouping methods and stopping rules: an evaluation. Comput J 20:359–363
Overall JE, Magee KN (1992) Replication as a rule for determining the number of clusters in hierarchical cluster analysis. Appl Psychol Measur 16:119–128
Serfling R (2002) A depth function and a scale curve based on spatial quantiles. In: Dodge Y (ed) Statistical data analysis based on the L1-norm and related methods. Birkhaeuser, pp 25–38
Sugar CA, James GM (2003) Finding the number of clusters in a data set: an information theoretic approach. J Am Stat Assoc 98:750–763
Tibshirani R, Walther G, Hastie T (2001) Estimating the number of clusters in a data set via the gap statistic. J Roy Stat Soc 63:411–423
Metadata
Title: Determining the Number of Clusters Using Multivariate Ranks
Authors: Mohammed Baragilly, Biman Chakraborty
Copyright Year: 2016
Publisher: Springer India
DOI: https://doi.org/10.1007/978-81-322-3643-6_2