
2016 | OriginalPaper | Book Chapter

Using Distance Graphs to Find Meaningful Levels of a Hierarchical Sequence Prior to Performing a Cluster Analysis

Author: David Allen Olsen

Published in: Informatics in Control, Automation and Robotics

Publisher: Springer International Publishing


Abstract

By unwinding the assumptions that underlie the standard complete linkage method, the size of a hierarchical sequence reverts from n levels to \(\frac{n \cdot (n - 1)}{2} + 1\) levels, and the time complexity of constructing the cluster sets becomes \(O(n^{4})\). To resolve this problem, distance graphs are used to find meaningful levels of an \(\frac{n \cdot (n-1)}{2} + 1\)-level hierarchical sequence prior to performing a cluster analysis. Doing so makes it possible to construct only the cluster sets for the meaningful levels and to reduce the time complexity from \(O(n^{4})\) to \(O(ln^{2})\). Increasing the dimensionality of the data points helps reveal inherent structure in noisy data, which is necessary for finding meaningful levels. This means is theoretically validated. Empirical results from three experiments show that it does not impose structure on a data set, that it is easy to use, and that it can identify cluster sets that have real-world meaning.
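The abstract does not spell out how meaningful levels are read off a distance graph, so the following is only a minimal sketch of one plausible reading: sort the \(\frac{n \cdot (n-1)}{2}\) pairwise distances (one per level above level 0) and flag the levels that sit just before unusually large jumps between consecutive sorted distances, on the assumption that such gaps separate levels at which a new configuration of clusters has finished forming. The function name candidate_meaningful_levels, the gap_factor parameter, and the gap-versus-median heuristic are illustrative choices, not the procedure described in the chapter.

```python
# Minimal sketch (an assumed reading of the abstract, not the author's implementation).
import numpy as np
from scipy.spatial.distance import pdist


def candidate_meaningful_levels(points, gap_factor=3.0):
    """Flag levels that sit just before unusually large gaps in the sorted
    sequence of pairwise distances (the "distance graph" read as a curve)."""
    d = np.sort(pdist(points))        # the n*(n-1)/2 interpoint distances
    gaps = np.diff(d)                 # rise between consecutive levels
    threshold = gap_factor * np.median(gaps)
    # Level 0 is the all-singletons cluster set; level k corresponds to the
    # k-th smallest pairwise distance, hence the +1 offset below.
    return [k + 1 for k, g in enumerate(gaps) if g > threshold]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two well-separated Gaussian blobs in 10 dimensions; per the abstract,
    # higher dimensionality helps reveal inherent structure in noisy data.
    pts = np.vstack([rng.normal(0.0, 0.5, (20, 10)),
                     rng.normal(5.0, 0.5, (20, 10))])
    print(candidate_meaningful_levels(pts))
```

Only the cluster sets for the flagged levels would then be constructed, which is where the reduction from \(O(n^{4})\) to \(O(ln^{2})\) comes from (with l presumably the number of meaningful levels).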


Footnotes
1
A “meaningful cluster set” refers to a cluster set that can have real-world meaning. Where there is good inherent structure, a “meaningful level” refers to a level of a hierarchical sequence at which a new configuration of clusters has finished forming. These definitions appear to be synonymous for \(\frac{n \cdot (n - 1)}{2} + 1\)-level hierarchical sequences. The cluster set that is constructed for a meaningful level is a meaningful cluster set, so these terms are used interchangeably.
 
2
The model for a measured value is measured value = true value + bias (accuracy) + random error (statistical uncertainty or precision) [12]. This model has substantially broader applicability than the taxonomic model that is the basis for the standard complete linkage method.
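Written symbolically, the measurement model in this footnote is a simple additive decomposition; the symbol names below are chosen here for illustration and are not taken from the chapter or from [12].

```latex
% Footnote 2's model, written symbolically (symbol names are illustrative):
x_{\text{measured}} = x_{\text{true}} + \beta + \varepsilon
% \beta:        systematic bias (accuracy)
% \varepsilon:  random error (statistical uncertainty / precision)
```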
 
3
These data sets are used by many cyber-physical systems and include time series. For example, a typical automobile has about 500 sensors; a small, specialty brewery has about 600 sensors; and a small power plant has about 1100 sensors. The new clustering method may accommodate large-n, large-m data sets as well, and future work includes using multicore and/or heterogeneous processors to parallelize parts of the method; however, large-n, large-m data sets are not the focus here.
 
4
In real-world terms, this is the same as calibrating the sensors.
 
5
Attenuating the effects of noise refers to reducing the effects of noise on cluster construction.
 
6
Seven packets from mote 9 were dropped during transmission.
 
References
1. Murtagh, F.: The remarkable simplicity of very high dimensional data: application of model-based clustering. J. Classif. 26, 249–277 (2009)
2. Everitt, B., Landau, S., Leese, M., Stahl, D.: Cluster Analysis (5th edn.). Wiley (2011)
3. Anderberg, M.: Cluster Analysis for Applications. Academic Press (1973)
4. Kirk, D., Hwu, W.: Programming Massively Parallel Processors (2nd edn.). Elsevier (2013)
5. Jain, A., Dubes, R.: Algorithms for Clustering Data. Prentice Hall (1988)
6. Berkhin, P.: A survey of clustering data mining techniques. In: Kogan, J., Nicholas, C., Teboulle, M. (eds.) Grouping Multidimensional Data: Recent Advances in Clustering, ch. 2, pp. 25–71. Springer (2006)
7. Lance, G., Williams, W.: A general theory of classificatory sorting strategies II clustering systems. Comput. J. 10(3), 271–277 (1967)
8. Olsen, D.: INCLude Hierarchical Clustering: A Hierarchical Clustering Method Based Solely on Interpoint Distances. Technical report, Minneapolis (2014)
9. Johnson, R., Wichern, D.: Applied Multivariate Statistical Analysis (5th edn.). Prentice Hall (2002)
10. Isermann, R.: Fault-Diagnosis Systems: An Introduction from Fault Detection to Fault Tolerance. Springer (2006)
12. Navidi, W.: Statistics for Engineers and Scientists. McGraw-Hill (2006)
13. Tibshirani, R., Walther, G., Hastie, T.: Estimating the number of clusters in a dataset via the gap statistic. J. Roy. Stat. Soc.: Ser. B (Stat. Methodol.) 63(2), 411–423 (2001)
14. Kim, H., Lee, S.: A semi-supervised document clustering technique for information organization. In: Proceedings of the Ninth International Conference on Information and Knowledge Management (CIKM ’00), pp. 30–37. McLean (2000)
15. Daniels, K., Giraud-Carrier, C.: Learning the threshold in hierarchical agglomerative clustering. In: Proceedings of the Fifth International Conference on Machine Learning and Applications (ICMLA ’06), pp. 270–278. Orlando (2006)
16. Matula, D.: Graph theoretic techniques for cluster analysis algorithms. In: van Ryzin, J. (ed.) Classification and Clustering, pp. 95–129. Academic Press (1977)
17. Peay, E.: Hierarchical clique structures. Sociometry 37(1), 54–65 (1974)
18. Peay, E.: Nonmetric grouping: clusters and cliques. Psychometrika 40(3), 297–313 (1975)
19. Beyer, K., Goldstein, J., Ramakrishnan, R., Shaft, U.: When Is “Nearest Neighbor” Meaningful? Technical report, University of Wisconsin-Madison Department of Computer Sciences, Madison (1998)
20. Hinneburg, A., Aggarwal, C., Keim, D.: What is the nearest neighbor in high dimensional spaces? In: Proceedings of the 26th International Conference on Very Large Data Bases (VLDB 2000), pp. 506–515. Cairo (2000)
21. Olsen, D.: Means for Finding Meaningful Levels of a Hierarchical Sequence Prior to Performing a Cluster Analysis. Technical report, Minneapolis (2014)
22. Cormen, T., Leiserson, C., Rivest, R., Stein, C.: Introduction to Algorithms (2nd edn.). MIT Press (2004)
23. Olsen, D.: Closing the loop on a complete linkage hierarchical clustering method. In: Proceedings of the 11th International Conference on Informatics in Control, Automation and Robotics (ICINCO 2014). Vienna (2014)
Metadata
Title
Using Distance Graphs to Find Meaningful Levels of a Hierarchical Sequence Prior to Performing a Cluster Analysis
Author
David Allen Olsen
Copyright Year
2016
DOI
https://doi.org/10.1007/978-3-319-26453-0_1
