2019 | OriginalPaper | Chapter

12. Cluster Analysis

Author: Thomas Cleff

Published in: Applied Statistics and Multivariate Data Analysis for Business and Economics

Publisher: Springer International Publishing

Abstract

Before we turn to the subject of cluster analysis, think for a moment about the meaning of the word cluster. The term refers to a group of individuals or objects that converge around a certain point and are thus closely related in their position. In astronomy there are clusters of stars; in chemistry, clusters of atoms. Economic research often relies on techniques that consider groups within a total population. For instance, firms that engage in target group marketing must first divide consumers into segments, or clusters of potential customers. Indeed, in many contexts researchers and economists need accurate methods for delineating homogeneous groups within a set of observations. Groups may contain individuals (such as people or their behaviours) or objects (such as firms, products, or patents). This chapter thus takes a cue from Goethe’s Faust (1987, lines 1943–45): “You soon will [understand]; just carry on as planned/You’ll learn reductive demonstrations/And all the proper classifications”.

Footnotes
1
By contrast, divisive clustering methods start by collecting all observations in one cluster. They proceed by splitting the initial cluster into two groups and then splitting the subgroups, repeating this process down the line. The main disadvantage of divisive methods is their high computational complexity. With agglomerative methods, the most demanding set of calculations comes in the first step: for n observations, a total of n(n−1)/2 distance measurements must be performed. With divisive methods, the first step alone offers \( 2^{n-1}-1 \) possible ways of splitting n observations into two non-empty clusters. The greater time required for calculating divisive hierarchical clusters explains why this method is used infrequently by researchers and is not included in standard statistics software.
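To make the difference in effort concrete, the following minimal sketch compares the two counts for a few sample sizes. It assumes the standard combinatorial results: n(n−1)/2 pairwise distances for the first agglomerative step, and \( 2^{n-1}-1 \) ways of splitting n observations into two non-empty clusters. The function names are illustrative and do not come from the chapter.

```python
def agglomerative_first_step_distances(n: int) -> int:
    """Number of pairwise distances among n observations: n(n-1)/2."""
    return n * (n - 1) // 2


def divisive_first_step_splits(n: int) -> int:
    """Number of ways to split n observations into two non-empty clusters: 2^(n-1) - 1."""
    return 2 ** (n - 1) - 1


# Compare how quickly the two counts grow with the number of observations.
for n in (5, 10, 20, 50):
    print(n, agglomerative_first_step_distances(n), divisive_first_step_splits(n))
```

The first count grows quadratically, the second exponentially, which is why an exhaustive divisive search becomes impractical even for moderate n.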
 
2
In the case of two dimensions, the Euclidean distance between two points is the length of the hypotenuse given by the Pythagorean theorem, so both provide the same result.
 
3
In standardization – sometimes also called z-transform – the mean of x is subtracted from each x variable value and the result divided by the standard deviation (S) of the x variable: \( {z}_i=\frac{x_i-\overline{x}}{S} \).
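A minimal sketch of this z-transform, using only the Python standard library. It assumes that S denotes the sample standard deviation (divisor n−1); the footnote does not say which variant is meant. The calorie values are made up for illustration.

```python
from statistics import mean, stdev


def z_transform(values):
    """Standardize values: subtract the mean, divide by the standard deviation."""
    x_bar = mean(values)
    s = stdev(values)  # assumption: sample standard deviation (divisor n-1)
    return [(x - x_bar) / s for x in values]


# Hypothetical calorie values, not taken from the chapter's data set.
calories = [70.0, 110.0, 145.0, 150.0, 175.0]
z = z_transform(calories)
print([round(v, 3) for v in z])
```

After the transformation the variable has a mean of zero and a standard deviation of one, so variables measured on different scales contribute comparably to a distance measure.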
 
4
Say we wanted to dichotomize calories per fl. oz. using three calorie variables. Calorie variable 1 assumes the value one when the calories in a beer lie between 60 and 99.99 calories; otherwise it is equal to zero. Calorie variable 2 assumes the value one when the calories in a beer lie between 100 and 139.99 calories; otherwise it is equal to zero. Calorie variable 3 assumes the value one when the calories in a beer lie between 140 and 200 calories; otherwise it is equal to zero.
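The dichotomization described here could be sketched as follows. The function name is hypothetical, and the half-open intervals (for example 60 ≤ calories < 100 for the first variable) are an assumption made so that values between 99.99 and 100 do not fall into a gap.

```python
def calorie_dummies(calories: float) -> tuple[int, int, int]:
    """Return the three 0/1 calorie variables for one beer."""
    d1 = int(60 <= calories < 100)    # calorie variable 1: 60 to 99.99
    d2 = int(100 <= calories < 140)   # calorie variable 2: 100 to 139.99
    d3 = int(140 <= calories <= 200)  # calorie variable 3: 140 to 200
    return d1, d2, d3


# Illustrative values only.
for value in (75, 120, 160):
    print(value, calorie_dummies(value))
```

Values outside the range of 60 to 200 receive zero on all three variables in this sketch.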
 
5
The centroid is determined by calculating the mean for every variable for all observations of each cluster separately.
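As a minimal sketch of this calculation, the function below averages every variable over the observations of each cluster. The function name, the data, and the cluster labels are hypothetical.

```python
from collections import defaultdict


def cluster_centroids(observations, labels):
    """Mean of every variable, computed separately for each cluster."""
    groups = defaultdict(list)
    for obs, label in zip(observations, labels):
        groups[label].append(obs)
    return {
        label: tuple(sum(column) / len(members) for column in zip(*members))
        for label, members in groups.items()
    }


# Hypothetical observations with two variables each.
observations = [(1.0, 2.0), (3.0, 4.0), (10.0, 0.0)]
labels = ["A", "A", "B"]
result = cluster_centroids(observations, labels)
print(result)  # {'A': (2.0, 3.0), 'B': (10.0, 0.0)}
```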
 
6
Euclidean distance of #9 to centroid CLU#1: \( \sqrt{{\left(-0.571-\left(-0.401\right)\right)}^2+{\left(0.486-\left(-0.563\right)\right)}^2}=1.06 \).
 
7
Euclidean distance of #9 to centroid CLU#2: \( \sqrt{{\left(1.643-\left(-0.401\right)\right)}^2+{\left(0.719-\left(-0.563\right)\right)}^2}=2.41 \).
 
8
Euclidean distance of #9 to centroid CLU#3: \( \sqrt{{\left(-0.401-\left(-0.401\right)\right)}^2+{\left(-1.353-\left(-0.563\right)\right)}^2}=0.79 \).
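The three distances in footnotes 6 to 8 can be reproduced with a few lines of Python. The coordinates are read off the formulas above, taking (−0.401, −0.563) as observation #9 because it appears in all three. Picking the nearest centroid at the end is an assumption about how these distances are used, not something the footnotes state.

```python
import math

# Standardized coordinates as they appear in the footnote formulas.
observation_9 = (-0.401, -0.563)
centroid_coordinates = {
    "CLU#1": (-0.571, 0.486),
    "CLU#2": (1.643, 0.719),
    "CLU#3": (-0.401, -1.353),
}


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


distances = {
    name: euclidean(centroid, observation_9)
    for name, centroid in centroid_coordinates.items()
}
for name, d in distances.items():
    print(name, round(d, 2))  # 1.06, 2.41 and 0.79

# Assumption: the observation would be assigned to the closest centroid.
nearest = min(distances, key=distances.get)
print(nearest)  # CLU#3
```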
 
Literature
Backhaus, K., Erichson, B., Plinke, W., Weiber, R. (2016). Multivariate Analysemethoden. Eine Anwendungsorientierte Einführung, 14th Edition. Berlin, Heidelberg: Springer.
Berg, S. (1981). Optimalität bei Cluster-Analysen. Münster: Dissertation, Fachbereich Wirtschafts- und Sozialwissenschaften, Westfälische Wilhelms-Universität Münster.
Bühl, A. (2019). SPSS: Einführung in die moderne Datenanalyse ab SPSS 25, 16th Edition. Munich: Pearson Studium.
Everitt, B.S., Rabe-Hesketh, S. (2004). A Handbook of Statistical Analyses Using Stata, 3rd Edition. Boca Raton: Chapman & Hall.
Goethe, J.W. (1987). Faust Part One. Translated with an Introduction and Notes by David Luke. New York: Oxford University Press.
Janssens, W., Wijnen, K., De Pelsmacker, P., Van Kenhove, P. (2008). Marketing Research with SPSS. Essex: Pearson Education.
Kaufman, L., Rousseeuw, P.J. (1990). Finding Groups in Data. New York: Wiley.
Mooi, E., Sarstedt, M. (2019). A Concise Guide to Market Research. The Process, Data, and Methods Using IBM SPSS Statistics, 3rd Edition. Berlin, Heidelberg: Springer.
Ward, J.H., Jr. (1963). Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58, 236–244.
Metadata
Title
Cluster Analysis
Author
Thomas Cleff
Copyright Year
2019
Publisher
Springer International Publishing
DOI
https://doi.org/10.1007/978-3-030-17767-6_12