About this book

This book is published open access under a CC BY 4.0 license.

It covers aspects of unsupervised machine learning used for knowledge discovery in data science and introduces a data-driven approach to cluster analysis, the Databionic swarm (DBS). DBS consists of a 3D landscape visualization and a clustering of the data. The 3D landscape enables 3D printing of high-dimensional data structures. The clustering, the number of clusters, or the absence of any cluster structure can be verified at a glance in the 3D landscape. DBS is the first swarm-based technique that shows emergent properties while exploiting concepts of swarm intelligence, self-organization and the Nash equilibrium from game theory. As a result, it requires neither a global objective function nor the setting of parameters. After downloading the R package, DBS can be applied to data drawn from diverse research fields, even by non-professionals in data mining.

Table of contents

Frontmatter

Open Access

Chapter 1. Introduction

We live in a time when information is cheaply available and saved as data nearly everywhere. The amount of generated data is growing exponentially. By the end of the year 2016 alone, 9000 exabytes of data will have been generated, equal to 9 trillion gigabytes or the capacity of 360 billion Blu-ray Discs [Schiele, 2016].
Michael Christoph Thrun
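
The unit conversions quoted in the Chapter 1 abstract can be checked with a few lines of arithmetic. This is only a sketch: it assumes decimal SI prefixes and a 25 GB single-layer Blu-ray Disc, neither of which is stated in the abstract, and the variable names are illustrative.

```python
# Sanity check of the figures quoted in the abstract.
# Assumption: decimal SI prefixes (1 EB = 10**18 bytes, 1 GB = 10**9 bytes).
GB_PER_EXABYTE = 10**9

# Assumption: a single-layer Blu-ray Disc holds 25 GB.
GB_PER_BLU_RAY = 25

total_gb = 9000 * GB_PER_EXABYTE        # 9000 exabytes expressed in gigabytes
discs = total_gb // GB_PER_BLU_RAY      # number of discs needed to hold that volume

print(f"{total_gb:,} GB")      # 9,000,000,000,000 GB (9 trillion)
print(f"{discs:,} discs")      # 360,000,000,000 discs (360 billion)
```

Under these assumptions the three quantities in the abstract agree with one another.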

Open Access

Chapter 2. Fundamentals

The first section of this chapter familiarizes the reader with the definitions of the basic notation and terminology used in this thesis. Concepts of graph theory are introduced in the next section. They give rise to a new concept of neighborhoods, which is utilized in several chapters.
Michael Christoph Thrun

Open Access

Chapter 3. Approaches to Cluster Analysis

Many data mining methods rely on some concept of the similarity between pieces of information encoded in the data of interest. Various names have been applied to these clustering methods, depending largely on the field of application in data science. For example, in biology the term “numerical taxonomy” is used [Thorel et al., 1990], in psychology the term “Q analysis” is sometimes employed, market researchers often talk about “segmentation” [Arimond/Elfessi, 2001] and in the artificial intelligence literature, “unsupervised pattern recognition” is the favored label [Everitt et al., 2001, p. 4].
Michael Christoph Thrun

Open Access

Chapter 4. Methods of Projection

Dimensionality reduction techniques reduce the dimensions of the input space to facilitate the exploration of structures in high-dimensional data. Two general dimensionality reduction approaches exist: manifold learning and projection. Manifold-learning methods attempt to find a sub-space in which the high-dimensional distances can be preserved.
Michael Christoph Thrun

Open Access

Chapter 5. Visualizing the Output Space

Projection methods are a common approach to dimensionality reduction with the aim of transforming high-dimensional data into a low-dimensional space. For data visualization purposes, projections into two dimensions are considered here. However, when the output space is limited to two dimensions, the low-dimensional similarities cannot completely represent the high-dimensional distances, which can result in a misleading interpretation of the underlying structures.
Michael Christoph Thrun
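
A small example illustrates why a two-dimensional output space cannot represent all high-dimensional distances. The sketch below takes the four corners of a regular tetrahedron in 3D, which are all equally far apart, and projects them to 2D by simply dropping the third coordinate. That projection is chosen purely for illustration and is not one of the methods discussed in the book; the point set and function names are likewise assumptions of this example.

```python
import math
from itertools import combinations

# Four corners of a regular tetrahedron: every pair is sqrt(8) apart in 3D.
points_3d = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]

# Illustrative projection (an assumption of this sketch): drop the z coordinate.
points_2d = [(x, y) for x, y, _ in points_3d]

def pairwise_distances(points):
    """Return the distances of all point pairs, in a fixed pair order."""
    return [math.dist(p, q) for p, q in combinations(points, 2)]

d_high = pairwise_distances(points_3d)
d_low = pairwise_distances(points_2d)

# Largest absolute difference between an input and an output distance.
max_error = max(abs(h - l) for h, l in zip(d_high, d_low))

print(sorted(set(round(d, 3) for d in d_high)))  # [2.828]
print(sorted(set(round(d, 3) for d in d_low)))   # [2.0, 2.828]
print(round(max_error, 3))                       # 0.828
```

In the plane the four points form a square, so four of the six distances shrink from about 2.828 to 2.0. No arrangement of four points in 2D can keep all six distances equal, which is why a 2D scatter plot on its own can suggest structure that the input data does not have.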

Open Access

Chapter 6. Quality Assessments of Visualizations

Dimensionality reduction techniques reduce the dimensions of the input space to facilitate the exploration of structures in high-dimensional data. Two general dimensionality reduction approaches exist: manifold learning and projection. Manifold learning methods attempt to find sub-spaces in which the high-dimensional distances are preserved.
Michael Christoph Thrun

Open Access

Chapter 7. Behavior-based Systems in Data Science

Many technological advances have been achieved with the help of bionics, which is defined as the application of biological methods and systems found in nature. A related, rarely discussed subfield of information technology is called databionics. Databionics refers to the attempt to adopt information processing techniques from nature.
Michael Christoph Thrun

Open Access

Chapter 8. Databionic Swarm (DBS)

This chapter introduces a new concept for the use of swarm intelligence. It makes use of insights from the previous chapter and proposes a projection method based on a swarm of intelligent agents called DataBots [Ultsch, 2000c]. This new swarm is called a polar swarm (Pswarm) because its agents move in polar coordinates based on symmetry considerations (see [Feynman et al., 2007, pp. 147-153, 745]).
Michael Christoph Thrun

Open Access

Chapter 9. Experimental Methodology

This chapter describes all the data sets used in the results chapter and the parameter settings for the various methods. In the final section, brief overviews of the Gene Ontology (GO) database and overrepresentation analysis (ORA) are provided. For general distribution analyses, the CRAN R package AdaptGauss [Thrun/Ultsch, 2015; Ultsch et al., 2015] was used.
Michael Christoph Thrun

Open Access

Chapter 10. Results on Pre-classified Data Sets

This chapter has three sections. In the first section, the results of the Databionic swarm (DBS) clustering framework are compared with the given prior classifications for data sets from the Fundamental Clustering Problems Suite (FCPS) [Ultsch, 2005a]. The results for nine data sets analyzed using common clustering algorithms are compared in the first subsection.
Michael Christoph Thrun

Open Access

Chapter 11. DBS on Natural Data Sets

Several real-world data sets are used in this chapter to show that Databionic swarm (DBS) is able to find clusters in a variety of cases. The leukemia data set is based on luminance measurements of 7747 different active or non-active genes in 554 human subjects. The World GDP data set is a multivariate time series that consists of monetary values for 190 countries from 1970 to 2010.
Michael Christoph Thrun

Open Access

Chapter 12. Knowledge Discovery with DBS

In contrast to chapter 11, in which Databionic swarm (DBS) clustering was applied to recognize more or less obvious knowledge, this chapter shows that DBS is also able to discover new knowledge. A hydrological data set of multivariate time series [Aubert et al., 2016] and a data set consisting of pain genes [Ultsch et al., 2016b] are used for this purpose. In [Aubert et al., 2016], a high-frequency time series analysis was performed, but no prediction could be made.
Michael Christoph Thrun

Open Access

Chapter 13. Discussion

This work examined and analyzed patterns in high-dimensional data characterized by discontinuity. Such distance- or density-based patterns are either compact or connected structures. If the structures are compact, inter- versus intracluster distances are relevant.
Michael Christoph Thrun

Open Access

Chapter 14. Conclusion

A new, data-driven approach to cluster analysis and visualization is introduced in this work. Projection-based clustering combines structures preserved in two dimensions with the underlying high-dimensional structures (see also [Thrun et al., 2017; Thrun/Ultsch, 2017a]). It is a flexible and robust approach to cluster analysis that consists of three independent modules, which can optionally be combined into the Databionic swarm (DBS).
Michael Christoph Thrun

Backmatter
