nach oben

Erschienen in:

2020 | OriginalPaper | Buchkapitel

21. Classifying Streaming Data

verfasst von : Prof. Max Bramer

Erschienen in: Principles of Data Mining

Verlag: Springer London

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config

KI-gestützte Suche

Aus

Abstract

This chapter is concerned with the classification of streaming data, i.e. data which arrives (generally in large quantities) from some automatic process over a period of days, months, years or potentially forever.

Generating a classification tree for streaming data requires a different approach from the TDIDT algorithm described earlier in this book. The algorithm given here, H-Tree, is a variant of the popular VFDT algorithm which generates a type of decision tree called a Hoeffding Tree. The algorithm is described and explained in detailed with accompanying pseudocode for the benefit of readers who may be interested in developing their own implementations. An example is given to illustrate a way of comparing the rules generated by H-Tree with those from TDIDT.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Vorheriges Kapitel Text Mining

Nächstes Kapitel Classifying Streaming Data II: Time-Dependent Data

We distinguish between nodes which have or have not previously been split on an attribute. The former are called internal nodes; the latter are called leaf nodes. We will consider the root node not as a third type of node but as an internal node after it has been split on an attribute and a leaf node before that.

A note on notation. In this chapter array elements are generally shown enclosed in square brackets, e.g. \(\textit{currentAtts}[2]\). However an array containing a number of constant values will generally be denoted by those values separated by commas and enclosed in braces. So \(\textit{currentAtts}[2]\) is \(\{\textit{att1}, \textit{att2}, \textit{att3}, \textit{att5}, \textit{att6}, \textit{att7}\}\).

The row and column headings are provided to assist the reader only. The table itself has 3 rows and 3 columns.

Pseudocode fragments are provided for the benefit of readers who may be interested in developing their own implementations of the H-Tree algorithm. Other readers can safely ignore them.

As initially there are no other nodes, all incoming records will be sorted there.

In Figures 21.6, 21.8 and 21.9 we depart from our usual notation for trees and show the values that are in the classtotals array for each node.

Confusion matrices were described in Chapter 7.

For some practical applications, to have a tree with a smaller number of leaf nodes which predicts the same or almost the same classifications as the complete TDIDT decision tree might be considered preferable, but we will not pursue that issue here.

[1]

Domingos, P., & Hulten, G. (2000). Mining high-speed data streams. In Proceedings of the sixth ACM SIGKDD international conference on knowledge discovery and data mining (pp. 71–80). New York: ACM. CrossRef

[2]

Hoeffding, W. (1963). Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58 (301), 13–30. MathSciNetCrossRef

Titel: Classifying Streaming Data
verfasst von: Prof. Max Bramer
Verlag: Springer London
Buch: Principles of Data Mining
Print ISBN: 978-1-4471-7492-9

Electronic ISBN: 978-1-4471-7493-6

Copyright-Jahr: 2020
DOI: https://doi.org/10.1007/978-1-4471-7493-6_21

Neuer Inhalt

Bildnachweise

VDI-Icon, Profil Icon, inhalt2, Springer Professional Modul/© Springer Fachmedien Wiesbaden GmbH, Nachhaltigkeitsaward Key Visual/© Cometis AG/Global ESG Monitor | Daniel Rupp | Generiert mit KI, Search Icon, Banner Hanser, Jonas Klose/© Pine Valley Capital GmbH, Carina Kießling von der Strategieberatung Roland Berger/© Monika Walther Fotografie | ATZ, Beijing Auto Show 2024: Deutsche Hersteller wollen angreifen./© EKH-Pictures / Generated with AI / Stock.adobe.com, Zeitschrift Wissensmanagement Cover, PatentFit-Logo/© Springer Fachmedien Wiesbaden GmbH, Zukunftswerkstatt Sales Excellence 2024/© AndreyPopov / Getty Images / iStock, 2023_Antrieb/© supervisuell, ATZ-Webinar: Prototypenfreie Entwicklung durch Offline- und Driver-in-the-Loop-HiL-Tests /© (c) VI-grade

Springer Professional

Abstract

Bitte loggen Sie sich ein, um Zugang zu Ihrer Lizenz zu erhalten.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Technik"

Springer Professional "Wirtschaft"

Neuer Inhalt

Bitte loggen Sie sich ein, um Zugang zu Ihrer Lizenz zu erhalten.

Bitte loggen Sie sich ein, um Zugang zu Ihrer Lizenz zu erhalten.