ABSTRACT
Mining data streams is important in both science and commerce. Two major challenges are (1) the data may grow without limit, so it is difficult to retain a long history; and (2) the underlying concept of the data may change over time. Unlike the common practice of keeping recent raw data, this paper uses a measure of conceptual equivalence to organize the data history into a history of concepts. As concepts change, it identifies new concepts as well as reappearing ones, and learns transition patterns among concepts to help prediction. Unlike conventional methodology, which passively waits until the concept changes, this paper combines proactive and reactive predictions. In proactive mode, it anticipates what the new concept will be if a concept change takes place, and prepares prediction strategies in advance. If the anticipation turns out to be correct, a suitable prediction model can be launched immediately upon the concept change. If not, it promptly falls back to reactive mode: adapting a prediction model to the new data. A system, RePro, is proposed to implement these ideas. Experiments compare the system with representative existing prediction methods on benchmark data sets that represent diverse scenarios of concept change. Empirical evidence demonstrates that the proposed methodology is an effective and efficient solution to prediction for data streams.
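The proactive-then-reactive control flow described above can be illustrated with a minimal sketch. It is not RePro's implementation: it assumes that concepts have already been reduced to discrete labels, that a change detector has identified the label of the new concept, and that transition patterns are simple first-order counts. The names `ConceptHistory`, `choose_model`, and `train_reactive` are hypothetical and introduced only for illustration.

```python
from collections import defaultdict


class ConceptHistory:
    """Hypothetical first-order transition table over concept labels."""

    def __init__(self):
        # counts[a][b] = number of observed changes from concept a to concept b
        self.counts = defaultdict(lambda: defaultdict(int))
        self.current = None

    def observe(self, concept):
        """Record the concept now in effect and count the transition, if any."""
        if self.current is not None and concept != self.current:
            self.counts[self.current][concept] += 1
        self.current = concept

    def anticipate(self):
        """Proactive step: most frequent successor of the current concept, or None."""
        successors = self.counts.get(self.current)
        if not successors:
            return None
        return max(successors, key=successors.get)


def choose_model(history, models, new_concept, train_reactive):
    """Select a model after a change to `new_concept` (assumed already detected)."""
    guess = history.anticipate()
    if guess == new_concept and guess in models:
        # Anticipation was correct: the prepared model is used immediately.
        mode, model = "proactive", models[guess]
    else:
        # Anticipation failed: reuse a stored model or build one from new data.
        mode = "reactive"
        model = models.get(new_concept)
        if model is None:
            model = train_reactive(new_concept)
            models[new_concept] = model
    history.observe(new_concept)
    return mode, model


# Toy usage: strings stand in for trained models.
history = ConceptHistory()
for label in ["A", "B", "A", "B", "A"]:
    history.observe(label)
models = {"A": "model_A", "B": "model_B"}
first = choose_model(history, models, "B", lambda c: f"model_{c}")
second = choose_model(history, models, "C", lambda c: f"model_{c}")
```

In this toy run the first change (A to B) matches the learned pattern, so the stored model is reused in proactive mode; the second change (B to a previously unseen concept C) is not anticipated, so the sketch falls back to reactive mode and trains a new model.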
Index Terms
- Combining proactive and reactive predictions for data streams