Drift detection using uncertainty distribution divergence

Lindstrom, Patrick; Mac Namee, Brian; Delany, Sarah Jane

doi:10.1007/s12530-012-9061-6

Original Paper
Published: 08 August 2012

Volume 4, pages 13–25 (2013)
Evolving Systems

Patrick Lindstrom¹,
Brian Mac Namee¹ &
Sarah Jane Delany¹

717 Accesses
26 Citations

Abstract

Data generated by naturally occurring processes tends to be non-stationary: climate data, for example, shows seasonal and gradual changes, while financial data can change suddenly. In machine learning, the degradation in classifier performance caused by such changes in the data is known as concept drift, and there are many approaches to detecting and handling it. Most approaches to detecting concept drift, however, assume that the true classes of test examples will be available at no cost shortly after classification, and base drift detection on measures that rely on these labels. The high cost of labelling in many domains provides strong motivation to reduce the number of labelled instances required to detect and handle concept drift. Triggered detection approaches that do not require labelled instances to detect concept drift show great promise for achieving this. In this paper we present Confidence Distribution Batch Detection, an approach that provides a signal correlated with changes in concept without using labelled data. When this signal is combined with a trigger and a rebuild policy, it can maintain classifier accuracy that, in most cases, matches the accuracy achieved by classification-error-based detection techniques, while using only a limited amount of labelled data.
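The abstract describes the detector only at a high level. As a rough illustration of the general idea (a signal computed from classifier confidence on unlabelled batches, a trigger, and a rebuild step), the Python sketch below compares the histogram of confidence scores in each new batch against a reference batch. It is not the authors' implementation: the use of Kullback-Leibler divergence as the divergence measure, the bin count, the fixed-threshold trigger, and the names histogram, kl_divergence and ConfidenceDriftDetector are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def histogram(values: Sequence[float], bins: int = 10) -> List[float]:
    """Return a normalised histogram of confidence values assumed to lie in [0, 1]."""
    if not values:
        raise ValueError("cannot build a histogram from an empty batch")
    counts = [0] * bins
    for v in values:
        # Clamp so that a confidence of exactly 1.0 lands in the last bin.
        idx = min(max(int(v * bins), 0), bins - 1)
        counts[idx] += 1
    return [c / len(values) for c in counts]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-6) -> float:
    """KL(p || q) after additive smoothing and renormalising, so empty bins never give log(0)."""
    ps = [pi + eps for pi in p]
    qs = [qi + eps for qi in q]
    sp, sq = sum(ps), sum(qs)
    return sum((a / sp) * math.log((a / sp) / (b / sq)) for a, b in zip(ps, qs))


class ConfidenceDriftDetector:
    """Hypothetical batch detector that raises a drift signal without using labels."""

    def __init__(self, reference_confidences: Sequence[float],
                 bins: int = 10, threshold: float = 0.1) -> None:
        self.bins = bins
        self.threshold = threshold  # assumed fixed trigger level
        self.reference = histogram(reference_confidences, bins)

    def signal(self, batch_confidences: Sequence[float]) -> float:
        # Divergence of the new batch's confidence distribution from the reference.
        return kl_divergence(self.reference, histogram(batch_confidences, self.bins))

    def check(self, batch_confidences: Sequence[float]) -> bool:
        # Trigger: report drift when the signal exceeds the threshold.
        return self.signal(batch_confidences) > self.threshold

    def reset(self, reference_confidences: Sequence[float]) -> None:
        # Call after the classifier has been rebuilt on newly labelled data.
        self.reference = histogram(reference_confidences, self.bins)


# Example: a stable batch versus a batch whose confidences have dropped.
reference = [0.9] * 80 + [0.6] * 20
stable = [0.9] * 80 + [0.6] * 20
shifted = [0.55] * 70 + [0.9] * 30
detector = ConfidenceDriftDetector(reference)
print(detector.check(stable), detector.check(shifted))  # prints: False True
```

In use, a caller would compute a confidence score (for example, the maximum class probability) for each example in an incoming batch and call check(). When it returns True, a hypothetical rebuild policy would obtain labels for a limited sample, retrain the classifier, and pass the new classifier's confidences on that sample to reset(). The rebuild step is left outside the sketch because the abstract does not specify the policy.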

Author information

Authors and Affiliations

School of Computing, Dublin Institute of Technology, Dublin, Ireland
Patrick Lindstrom, Brian Mac Namee & Sarah Jane Delany

Corresponding author

Correspondence to Patrick Lindstrom.

About this article

Cite this article

Lindstrom, P., Mac Namee, B. & Delany, S.J. Drift detection using uncertainty distribution divergence. Evolving Systems 4, 13–25 (2013). https://doi.org/10.1007/s12530-012-9061-6

Received: 07 February 2012
Accepted: 15 July 2012
Published: 08 August 2012
Issue Date: March 2013
DOI: https://doi.org/10.1007/s12530-012-9061-6
