research-article

Categorizing and mining concept drifting data streams

Authors:
Peng Zhang, Chinese Academy of Sciences, Beijing, China
Xingquan Zhu, Florida Atlantic University, Boca Raton, FL, USA
Yong Shi, University of Nebraska at Omaha, Nebraska, NE, USA

KDD '08: Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, August 2008, Pages 812–820, https://doi.org/10.1145/1401890.1401987

Published: 24 August 2008


ABSTRACT

Mining concept drifting data streams is a defining challenge for data mining research. Recent years have seen a large body of work on detecting changes and building prediction models from stream data, but with only a vague understanding of the types of concept drifting and of how each type affects mining algorithms. In this paper, we first categorize concept drifting into two scenarios, Loose Concept Drifting (LCD) and Rigorous Concept Drifting (RCD), and then propose a solution for each. For LCD data streams, because concepts in adjacent data chunks are sufficiently close to each other, we apply the kernel mean matching (KMM) method to minimize the discrepancy between data chunks in the kernel space. This minimization produces weighted instances, which are used to build a classifier ensemble that handles concept drifting data streams. For RCD data streams, because genuine concepts in adjacent data chunks may change randomly and rapidly, we propose a new Optimal Weights Adjustment (OWA) method to determine the optimum weight values for classifiers trained from the most recent (up-to-date) data chunk, such that those classifiers can form an accurate classifier ensemble to predict instances in the yet-to-come data chunk. Experiments on synthetic and real-world datasets show that the weighted instance approach is preferable when the concept drifting is mainly caused by changes in the class prior probability, whereas the weighted classifier approach is preferable when the concept drifting is mainly triggered by changes in the conditional probability.
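
To make the instance-weighting idea concrete, the following is a minimal sketch of kernel mean matching in its commonly cited form, not the paper's implementation. It assumes an RBF kernel and reweights the instances of an older chunk so that their weighted mean in kernel space approaches the mean of a newer chunk. The names `rbf`, `kmm_weights`, `gamma`, `upper`, and `steps` are illustrative assumptions. For brevity the sketch uses projected gradient descent with box constraints only and omits the usual constraint that keeps the weights summing to roughly the chunk size.

```python
import math


def rbf(a, b, gamma=1.0):
    """Gaussian (RBF) kernel between two points given as sequences of floats."""
    return math.exp(-gamma * sum((u - v) ** 2 for u, v in zip(a, b)))


def kmm_weights(old_chunk, new_chunk, gamma=1.0, upper=10.0, steps=500):
    """Approximate KMM weights for the instances of old_chunk.

    Minimises 0.5 * b^T K b - kappa^T b subject to 0 <= b_i <= upper using
    projected gradient descent. Simplification (an assumption of this sketch):
    the usual constraint keeping sum(b) close to len(old_chunk) is omitted.
    """
    n, m = len(old_chunk), len(new_chunk)
    # K[i][j]: similarity between old instances i and j.
    K = [[rbf(xi, xj, gamma) for xj in old_chunk] for xi in old_chunk]
    # kappa[i]: rescaled total similarity of old instance i to the new chunk.
    kappa = [n / m * sum(rbf(xi, z, gamma) for z in new_chunk) for xi in old_chunk]
    beta = [1.0] * n
    # The RBF Gram matrix has a unit diagonal, so its largest eigenvalue is at
    # most n, which makes 1/n a safe gradient step size.
    step = 1.0 / n
    for _ in range(steps):
        grad = [sum(K[i][j] * beta[j] for j in range(n)) - kappa[i] for i in range(n)]
        # Gradient step followed by projection back onto the box [0, upper].
        beta = [min(upper, max(0.0, b - step * g)) for b, g in zip(beta, grad)]
    return beta


# Toy one-dimensional chunks (hypothetical data): the old chunk is split evenly
# between two clusters, while the new chunk has three times as many points
# near 3.0 as near 0.0.
near_zero = [[0.1 * i] for i in range(-5, 6)]
near_three = [[3.0 + 0.1 * i] for i in range(-5, 6)]
old_chunk = near_zero + near_three
new_chunk = near_zero + near_three * 3

weights = kmm_weights(old_chunk, new_chunk)
mean_near_zero = sum(weights[:11]) / 11
mean_near_three = sum(weights[11:]) / 11
print(mean_near_zero, mean_near_three)
```

In this toy setting the old instances near 3.0 end up with larger weights on average than those near 0.0, which mirrors the shift in the new chunk. The resulting weights could then be passed as per-instance weights to any learner that accepts them when training ensemble members.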


Index Terms

Categorizing and mining concept drifting data streams
1. Computing methodologies
  1. Machine learning
2. Information systems
  1. Information systems applications
    1. Data mining

Published in
KDD '08: Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2008
1116 pages
ISBN:9781605581934
DOI:10.1145/1401890
General Chair: Ying Li, Microsoft adCenter Labs
Program Chairs: Bing Liu, University of Illinois at Chicago; Sunita Sarawagi, Indian Institute of Technology, Bombay
Copyright © 2008 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
classification
concept drifting
data streams
ensemble learning
Acceptance Rates
KDD '08 Paper Acceptance Rate: 118 of 593 submissions, 20%
Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
Article Metrics
- Total Citations: 67
- Total Downloads: 2,020
- Downloads (Last 12 months): 28
- Downloads (Last 6 weeks): 3