
2016 | OriginalPaper | Book Chapter

Using Distance Graphs to Find Meaningful Levels of a Hierarchical Sequence Prior to Performing a Cluster Analysis

Author: David Allen Olsen

Published in: Informatics in Control, Automation and Robotics

Publisher: Springer International Publishing


Abstract

By unwinding the assumptions that underlie the standard complete linkage method, the size of a hierarchical sequence reverts from n levels to \(\frac{n \cdot (n - 1)}{2} + 1\) levels, and the time complexity of constructing the cluster sets becomes \(O(n^{4})\). To resolve this problem, distance graphs are used to find meaningful levels of an \(\frac{n \cdot (n-1)}{2} + 1\)-level hierarchical sequence prior to performing a cluster analysis. Doing so makes it possible to construct only the cluster sets for the meaningful levels and to reduce the time complexity from \(O(n^{4})\) to \(O(ln^{2})\). Increasing the dimensionality of the data points helps reveal inherent structure in noisy data, which is necessary for finding meaningful levels. This means is theoretically validated. Empirical results from three experiments show that it does not impose structure on a data set, that it is easy to use, and that it can identify cluster sets that have real-world meaning.
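The abstract does not spell out how meaningful levels are read off a distance graph, so the following is only a minimal sketch of one plausible reading: sort the \(\frac{n \cdot (n-1)}{2}\) pairwise distances (one per level above level 0) and flag the levels that sit just before unusually large jumps between consecutive sorted distances, on the assumption that such gaps separate levels at which a new configuration of clusters has finished forming. The function name candidate_meaningful_levels, the gap_factor parameter, and the gap-versus-median heuristic are illustrative choices, not the procedure described in the chapter.

```python
# Minimal sketch (an assumed reading of the abstract, not the author's implementation).
import numpy as np
from scipy.spatial.distance import pdist


def candidate_meaningful_levels(points, gap_factor=3.0):
    """Flag levels that sit just before unusually large gaps in the sorted
    sequence of pairwise distances (the "distance graph" read as a curve)."""
    d = np.sort(pdist(points))        # the n*(n-1)/2 interpoint distances
    gaps = np.diff(d)                 # rise between consecutive levels
    threshold = gap_factor * np.median(gaps)
    # Level 0 is the all-singletons cluster set; level k corresponds to the
    # k-th smallest pairwise distance, hence the +1 offset below.
    return [k + 1 for k, g in enumerate(gaps) if g > threshold]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two well-separated Gaussian blobs in 10 dimensions; per the abstract,
    # higher dimensionality helps reveal inherent structure in noisy data.
    pts = np.vstack([rng.normal(0.0, 0.5, (20, 10)),
                     rng.normal(5.0, 0.5, (20, 10))])
    print(candidate_meaningful_levels(pts))
```

Only the cluster sets for the flagged levels would then be constructed, which is where the reduction from \(O(n^{4})\) to \(O(ln^{2})\) comes from (with l presumably the number of meaningful levels).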


Footnotes
1
A “meaningful cluster set” refers to a cluster set that can have real-world meaning. Where there is good inherent structure, a “meaningful level” refers to a level of a hierarchical sequence at which a new configuration of clusters has finished forming. These definitions appear to be synonymous for \(\frac{n \cdot (n - 1)}{2} + 1\)-level hierarchical sequences. The cluster set that is constructed for a meaningful level is a meaningful cluster set, so these terms are used interchangeably.
 
2
The model for a measured value is measured value = true value + bias (accuracy) + random error (statistical uncertainty or precision) [12]. This model has substantially broader applicability than the taxonomic model that is the basis for the standard complete linkage method.
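Written symbolically, the measurement model in this footnote is a simple additive decomposition; the symbol names below are chosen here for illustration and are not taken from the chapter or from [12].

```latex
% Footnote 2's model, written symbolically (symbol names are illustrative):
x_{\text{measured}} = x_{\text{true}} + \beta + \varepsilon
% \beta:        systematic bias (accuracy)
% \varepsilon:  random error (statistical uncertainty / precision)
```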
 
3
These data sets are used by many cyber-physical systems and include time series. For example, a typical automobile has about 500 sensors; a small, specialty brewery has about 600 sensors; and a small power plant has about 1100 sensors. The new clustering method may accommodate large-n, large-m data sets as well, and future work includes using multicore and/or heterogeneous processors to parallelize parts of the method; however, large-n, large-m data sets are not the focus here.
 
4
In real-world terms, this is the same as calibrating the sensors.
 
5
Attenuating the effects of noise refers to reducing the effects of noise on cluster construction.
 
6
Seven packets from mote 9 were dropped during transmission.
 
References
1. Murtagh, F.: The remarkable simplicity of very high dimensional data: application of model-based clustering. J. Classif. 26, 249–277 (2009)
2. Everitt, B., Landau, S., Leese, M., Stahl, D.: Cluster Analysis (5th edn.). Wiley (2011)
3. Anderberg, M.: Cluster Analysis for Applications. Academic Press (1973)
4. Kirk, D., Hwu, W.: Programming Massively Parallel Processors (2nd edn.). Elsevier (2013)
5. Jain, A., Dubes, R.: Algorithms for Clustering Data. Prentice Hall (1988)
6. Berkhin, P.: A survey of clustering data mining techniques. In: Kogan, J., Nicholas, C., Teboulle, M. (eds.) Grouping Multidimensional Data: Recent Advances in Clustering, ch. 2, pp. 25–71. Springer (2006)
7. Lance, G., Williams, W.: A general theory of classificatory sorting strategies II clustering systems. Comput. J. 10(3), 271–277 (1967)
8. Olsen, D.: INCLude Hierarchical Clustering: A Hierarchical Clustering Method Based Solely on Interpoint Distances. Technical report, Minneapolis (2014)
9. Johnson, R., Wichern, D.: Applied Multivariate Statistical Analysis (5th edn.). Prentice Hall (2002)
10. Isermann, R.: Fault-Diagnosis Systems: An Introduction from Fault Detection to Fault Tolerance. Springer (2006)
12. Navidi, W.: Statistics for Engineers and Scientists. McGraw-Hill (2006)
13. Tibshirani, R., Walther, G., Hastie, T.: Estimating the number of clusters in a dataset via the gap statistic. J. Roy. Stat. Soc.: Ser. B (Stat. Methodol.) 63(2), 411–423 (2001)
14. Kim, H., Lee, S.: A semi-supervised document clustering technique for information organization. In: Proceedings of the Ninth International Conference on Information and Knowledge Management (CIKM ’00), pp. 30–37. McLean (2000)
15. Daniels, K., Giraud-Carrier, C.: Learning the threshold in hierarchical agglomerative clustering. In: Proceedings of the Fifth International Conference on Machine Learning and Applications (ICMLA ’06), pp. 270–278. Orlando (2006)
16. Matula, D.: Graph theoretic techniques for cluster analysis algorithms. In: van Ryzin, J. (ed.) Classification and Clustering, pp. 95–129. Academic Press (1977)
17. Peay, E.: Hierarchical clique structures. Sociometry 37(1), 54–65 (1974)
18. Peay, E.: Nonmetric grouping: clusters and cliques. Psychometrika 40(3), 297–313 (1975)
19. Beyer, K., Goldstein, J., Ramakrishnan, R., Shaft, U.: When Is “Nearest Neighbor” Meaningful? Technical report, University of Wisconsin-Madison Department of Computer Sciences, Madison (1998)
20. Hinneburg, A., Aggarwal, C., Keim, D.: What is the nearest neighbor in high dimensional spaces? In: Proceedings of the 26th International Conference on Very Large Data Bases (VLDB 2000), pp. 506–515. Cairo (2000)
21. Olsen, D.: Means for Finding Meaningful Levels of a Hierarchical Sequence Prior to Performing a Cluster Analysis. Technical report, Minneapolis (2014)
22. Cormen, T., Leiserson, C., Rivest, R., Stein, C.: Introduction to Algorithms (2nd edn.). MIT Press (2004)
23. Olsen, D.: Closing the loop on a complete linkage hierarchical clustering method. In: Proceedings of the 11th International Conference on Informatics in Control, Automation and Robotics (ICINCO 2014). Vienna (2014)
Metadata
Title
Using Distance Graphs to Find Meaningful Levels of a Hierarchical Sequence Prior to Performing a Cluster Analysis
Author
David Allen Olsen
Copyright Year
2016
DOI
https://doi.org/10.1007/978-3-319-26453-0_1
