
2020 | OriginalPaper | Book chapter

21. Classifying Streaming Data

Author: Prof. Max Bramer

Published in: Principles of Data Mining

Publisher: Springer London


Abstract

This chapter is concerned with the classification of streaming data, i.e. data which arrives (generally in large quantities) from some automatic process over a period of days, months, years or potentially forever.
Generating a classification tree for streaming data requires a different approach from the TDIDT algorithm described earlier in this book. The algorithm given here, H-Tree, is a variant of the popular VFDT algorithm, which generates a type of decision tree called a Hoeffding Tree. The algorithm is described and explained in detail, with accompanying pseudocode for the benefit of readers who may be interested in developing their own implementations. An example is given to illustrate one way of comparing the rules generated by H-Tree with those generated by TDIDT.
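The statistical idea behind the Hoeffding Tree can be sketched briefly. The sketch assumes the VFDT formulation of Domingos and Hulten. In that formulation, a leaf is split once the best attribute's split-criterion score (for example, information gain) exceeds the second-best attribute's score by more than \(\epsilon = \sqrt{R^2 \ln(1/\delta) / (2n)}\). Here \(R\) is the range of the criterion, \(\delta\) is the allowed probability of choosing the wrong attribute, and \(n\) is the number of records seen at the leaf. The function names `hoeffding_bound` and `should_split` are hypothetical. H-Tree's exact split test may differ in detail, and VFDT's tie-breaking threshold is omitted here.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Epsilon such that, with probability 1 - delta, the true mean of a variable
    with range value_range lies within epsilon of the mean of n observations."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best: float, second_best: float, value_range: float,
                 delta: float, n: int) -> bool:
    """Hypothetical VFDT-style test (no tie-breaking): split when the best
    attribute's lead over the runner-up exceeds the Hoeffding bound."""
    return best - second_best > hoeffding_bound(value_range, delta, n)


# Information gain with two classes has range R = log2(2) = 1.
for n in (500, 1000):
    eps = hoeffding_bound(1.0, 1e-6, n)
    print(n, round(eps, 3), should_split(0.30, 0.20, 1.0, 1e-6, n))
# prints:
# 500 0.118 False
# 1000 0.083 True
```

In this example the best attribute leads the runner-up by the same 0.10 both times. After 500 records that lead does not yet justify a split, but after 1000 records it does. This illustrates how a Hoeffding Tree grows only as enough data accumulates at each leaf.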


Footnotes
1
We distinguish between nodes which have or have not previously been split on an attribute. The former are called internal nodes; the latter are called leaf nodes. We will consider the root node not as a third type of node but as an internal node after it has been split on an attribute and a leaf node before that.
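The leaf/internal distinction in this footnote can be modelled with a single node type whose split attribute stays empty until the node is split. This is a minimal sketch that assumes categorical attributes. The names `Node`, `split_attribute`, `split` and `sort_to_leaf` are hypothetical and are not taken from the chapter's pseudocode. As the footnote describes, the root starts as a leaf and becomes an internal node once it is split.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional


@dataclass
class Node:
    # Hypothetical structure for illustration; not the chapter's pseudocode.
    # Attribute this node has been split on; None means it is still a leaf.
    split_attribute: Optional[str] = None
    # Child nodes keyed by attribute value; empty while the node is a leaf.
    children: Dict[str, "Node"] = field(default_factory=dict)

    @property
    def is_leaf(self) -> bool:
        return self.split_attribute is None

    def split(self, attribute: str, values: Iterable[str]) -> None:
        """Turn this leaf into an internal node with one new leaf per value."""
        if not self.is_leaf:
            raise ValueError("node has already been split on an attribute")
        self.split_attribute = attribute
        self.children = {value: Node() for value in values}


def sort_to_leaf(root: Node, record: Dict[str, str]) -> Node:
    """Follow the record's attribute values from the root down to a leaf."""
    node = root
    while not node.is_leaf:
        node = node.children[record[node.split_attribute]]
    return node


# Before any split the root is a leaf, so every record is sorted to it.
root = Node()
assert root.is_leaf and sort_to_leaf(root, {"outlook": "sunny"}) is root

# After splitting, the root is internal and records reach its child leaves.
root.split("outlook", ["sunny", "overcast", "rain"])
assert not root.is_leaf
assert sort_to_leaf(root, {"outlook": "rain"}) is root.children["rain"]
```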
 
2
A note on notation. In this chapter array elements are generally shown enclosed in square brackets, e.g. \(\textit{currentAtts}[2]\). However, an array containing a number of constant values will generally be denoted by those values separated by commas and enclosed in braces. So \(\textit{currentAtts}[2]\) is \(\{\textit{att1}, \textit{att2}, \textit{att3}, \textit{att5}, \textit{att6}, \textit{att7}\}\).
 
3
The row and column headings are provided to assist the reader only. The table itself has 3 rows and 3 columns.
 
4
Pseudocode fragments are provided for the benefit of readers who may be interested in developing their own implementations of the H-Tree algorithm. Other readers can safely ignore them.
 
5
As initially there are no other nodes, all incoming records will be sorted there.
 
6
In Figures 21.6, 21.8 and 21.9 we depart from our usual notation for trees and show the values that are in the classtotals array for each node.
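The classtotals array can be thought of as one counter per class, held at each node. This is a minimal sketch with two assumptions: classes are numbered from 0, and a node predicts its most frequent class, which is the usual decision-tree convention. The class name `NodeCounts` and its methods are hypothetical.

```python
from typing import List


class NodeCounts:
    """Hypothetical per-node counter modelled on the chapter's classtotals array.
    Classes are assumed to be numbered 0 .. num_classes - 1."""

    def __init__(self, num_classes: int) -> None:
        self.classtotals: List[int] = [0] * num_classes

    def add(self, class_index: int) -> None:
        # Each record sorted to this node increments the count for its class.
        self.classtotals[class_index] += 1

    def total(self) -> int:
        return sum(self.classtotals)

    def majority_class(self) -> int:
        # Assumed prediction rule: most frequent class; ties go to the lower index.
        return max(range(len(self.classtotals)), key=self.classtotals.__getitem__)


counts = NodeCounts(3)
for c in [0, 2, 2, 1, 2]:
    counts.add(c)
print(counts.classtotals, counts.total(), counts.majority_class())  # [1, 1, 3] 5 2
```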
 
7
Confusion matrices were described in Chapter 7.
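A confusion matrix can be built from two parallel lists of class labels. This is a minimal sketch that assumes rows represent the actual class and columns the predicted class. The function names are hypothetical.

```python
from typing import List, Sequence


def confusion_matrix(actual: Sequence[str], predicted: Sequence[str],
                     classes: Sequence[str]) -> List[List[int]]:
    """Rows are actual classes and columns predicted classes, both in `classes` order."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    index = {label: i for i, label in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def accuracy(matrix: List[List[int]]) -> float:
    """Proportion of instances on the diagonal (actual == predicted)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


m = confusion_matrix(["yes", "yes", "no", "no", "yes"],
                     ["yes", "no", "no", "yes", "yes"], ["yes", "no"])
print(m, accuracy(m))  # [[2, 1], [1, 1]] 0.6
```

One possible use, under the same assumptions, is to pass the TDIDT tree's predictions as `actual` and H-Tree's predictions as `predicted`. The diagonal then counts the records on which the two trees agree.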
 
8
For some practical applications, a tree with fewer leaf nodes that predicts the same (or almost the same) classifications as the complete TDIDT decision tree might be considered preferable, but we will not pursue that issue here.
 
References
[1]
Domingos, P., & Hulten, G. (2000). Mining high-speed data streams. In Proceedings of the sixth ACM SIGKDD international conference on knowledge discovery and data mining (pp. 71–80). New York: ACM.
[2]
Hoeffding, W. (1963). Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301), 13–30.
Metadata

DOI: https://doi.org/10.1007/978-1-4471-7493-6_21
