Published in: Knowledge and Information Systems 1/2018

01.12.2017 | Regular Paper

Tackling heterogeneous concept drift with the Self-Adjusting Memory (SAM)

Authors: Viktor Losing, Barbara Hammer, Heiko Wersing


Abstract

Data mining in non-stationary data streams is particularly relevant in the context of the Internet of Things and Big Data. Its challenges arise from fundamentally different drift types violating assumptions of data independence or stationarity. Available methods often struggle with certain forms of drift or require a priori knowledge about the task that is unavailable in practice. We propose the Self-Adjusting Memory (SAM) model for the k-nearest-neighbor (kNN) algorithm. SAM-kNN can deal with heterogeneous concept drift, i.e., different drift types and rates, using biologically inspired memory models and their coordination. Its basic idea is to maintain dedicated models for the current and former concepts and to apply them according to the demands of the given situation. It can be easily applied in practice without meta-parameter optimization. We conduct an extensive evaluation on various benchmarks, consisting of artificial streams with known drift characteristics as well as real-world datasets. We explicitly add new benchmarks that enable a precise performance analysis on multiple types of drift. Highly competitive results throughout all experiments underline the robustness of SAM-kNN as well as its capability to handle heterogeneous concept drift. Knowledge about the drift characteristics of streaming data is not only crucial for a precise algorithm evaluation, but also facilitates the choice of an appropriate algorithm for real-world applications. Therefore, we additionally propose two tests that determine the type and strength of drift. We extract the drift characteristics of all utilized datasets and use them in our analysis of SAM in relation to other methods.
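The memory architecture sketched in the abstract — dedicated models for the current and former concepts, applied according to the situation — can be illustrated with a toy Python example using scikit-learn (which the authors also used [30]). The class name, window sizes, and the naive transfer of overflowing STM examples into the LTM are illustrative assumptions, not the paper's implementation; only the tie-breaking order STM, LTM, combined follows footnote 2.

```python
from collections import deque

import numpy as np
from sklearn.neighbors import KNeighborsClassifier


class SimpleSAM:
    """Toy arbitration between an STM, an LTM, and their combination.

    Illustrative sketch only: window sizes and the naive STM-to-LTM
    transfer are assumptions; the tie order STM, LTM, C follows footnote 2.
    """

    def __init__(self, k=3, stm_size=50, err_win=20):
        self.k = k
        self.stm = deque(maxlen=stm_size)  # short-term memory: current concept
        self.ltm = []                      # long-term memory: former concepts
        self.err = {m: deque(maxlen=err_win) for m in ("STM", "LTM", "C")}

    def _memories(self):
        stm = list(self.stm)
        return {"STM": stm, "LTM": self.ltm, "C": stm + self.ltm}

    def _predict_with(self, data, x):
        if not data:
            return None
        X = np.array([d[0] for d in data])
        y = np.array([d[1] for d in data])
        knn = KNeighborsClassifier(n_neighbors=min(self.k, len(data)))
        return knn.fit(X, y).predict([x])[0]  # refit per query: toy only

    def predict(self, x):
        preds = {m: self._predict_with(d, x) for m, d in self._memories().items()}
        # choose the model with the lowest recent interleaved error;
        # min() keeps the first minimum, so ties favour STM, then LTM, then C
        best = min(("STM", "LTM", "C"),
                   key=lambda m: np.mean(self.err[m]) if self.err[m] else 0.0)
        return preds[best] if preds[best] is not None else preds["C"]

    def partial_fit(self, x, label):
        # test-then-train: track each memory's error on the new example
        for m, data in self._memories().items():
            p = self._predict_with(data, x)
            if p is not None:
                self.err[m].append(int(p != label))
        # oldest STM example moves to the LTM when the window overflows
        # (the paper additionally cleans and compresses transferred examples)
        if len(self.stm) == self.stm.maxlen:
            self.ltm.append(self.stm[0])
        self.stm.append((np.asarray(x, dtype=float), label))
```

Querying all memories on each example and comparing their interleaved errors over a sliding window is the arbitration idea; a real implementation would keep the kNN structures incremental rather than refitting them per query.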


Footnotes
1
The VFDT is often called Hoeffding Tree (HT).
 
2
In case of ties, we prioritize the models in the following order: \(\hbox {kNN}_{M_{\text {ST}}}\), \(\hbox {kNN}_{M_{\text {LT}}}, \hbox {kNN}_{M_\text {C}}\).
 
3
We used kMeans++ [31] because of its efficiency and scalability to larger datasets.
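A class-wise compression step using scikit-learn's KMeans with k-means++ seeding might look as follows; the `compress` helper, its reduction `factor`, and the class-wise scheme are assumptions for this sketch, not taken from the paper.

```python
import numpy as np
from sklearn.cluster import KMeans


def compress(X, y, factor=2):
    """Class-wise compression of stored examples via kMeans++ seeding.

    Illustrative sketch only: `factor` and the per-class clustering
    scheme are assumptions, not the authors' procedure.
    """
    centers, labels = [], []
    for label in np.unique(y):
        pts = X[y == label]
        n_clusters = max(1, len(pts) // factor)  # shrink each class by `factor`
        km = KMeans(n_clusters=n_clusters, init="k-means++", n_init=10).fit(pts)
        centers.append(km.cluster_centers_)      # keep centroids as prototypes
        labels.extend([label] * n_clusters)
    return np.vstack(centers), np.array(labels)
```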
 
5
We used the complete benchmarks in our experiments, but for real-world tasks this is often unrealistic.
 
6
We used three test models with \(W=\{500, 1000, 5000\}\) in all tests.
 
7
We consistently used \(\hat{w} = 20{,}000\).
 
8
Regarding our approach, the available space is shared between the STM and LTM.
 
References
1. Chen M, Mao S, Liu Y (2014) Big data: a survey. Mobile Netw Appl 19(2):171–209
2. Atzori L, Iera A, Morabito G (2010) The internet of things: a survey. Comput Netw 54(15):2787–2805
3. Gama J, Medas P, Castillo G, Rodrigues P (2004) Learning with drift detection. In: Advances in artificial intelligence—SBIA. Springer, pp 286–295
4. Kolter JZ, Maloof MA (2007) Dynamic weighted majority: an ensemble method for drifting concepts. J Mach Learn Res 8:2755–2790
5. Elwell R, Polikar R (2011) Incremental learning of concept drift in nonstationary environments. IEEE Trans Neural Netw 22(10):1517–1531
6. Schiaffino S, Garcia P, Amandi A (2008) eTeacher: providing personalized assistance to e-learning students. Comput Educ 51(4):1744–1754
7. Dudani SA (1976) The distance-weighted k-nearest-neighbor rule. IEEE Trans Syst Man Cybern 4:325–327
8. Gama J, Žliobaitė I, Bifet A, Pechenizkiy M, Bouchachia A (2014) A survey on concept drift adaptation. ACM Comput Surv (CSUR) 46(4):44
9. Ditzler G, Roveri M, Alippi C, Polikar R (2015) Learning in nonstationary environments: a survey. IEEE Comput Intell Mag 10(4):12–25
11. Dasu T, Krishnan S, Venkatasubramanian S, Yi K (2006) An information-theoretic approach to detecting changes in multi-dimensional data streams. In: Proceedings of symposium on the interface of statistics, computing science, and applications. Citeseer
12. Kifer D, Ben-David S, Gehrke J (2004) Detecting change in data streams. In: Proceedings of the thirtieth international conference on very large data bases—volume 30. VLDB Endowment, pp 180–191
13. Bifet A, Pfahringer B, Read J, Holmes G (2013) Efficient data stream classification via probabilistic adaptive windows. In: Proceedings of the 28th annual ACM symposium on applied computing, ser. SAC ’13. ACM, New York, NY, USA, pp 801–806
14. Widmer G, Kubat M (1996) Learning in the presence of concept drift and hidden contexts. Mach Learn 23(1):69–101
15. Klinkenberg R, Joachims T (2000) Detecting concept drift with support vector machines. In: Proceedings of the seventeenth international conference on machine learning, ser. ICML ’00. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, pp 487–494
16. Domingos P, Hulten G (2000) Mining high-speed data streams. In: Proceedings of the sixth ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 71–80
17. Jaber G, Cornuéjols A, Tarroux P (2013) Online learning: searching for the best forgetting strategy under concept drift. In: Neural information processing. Springer, pp 400–408
18. Bifet A, Holmes G, Pfahringer B (2010) Leveraging bagging for evolving data streams. In: Machine learning and knowledge discovery in databases. Springer, pp 135–150
19. Oza NC (2005) Online bagging and boosting. In: 2005 IEEE international conference on systems, man and cybernetics, vol 3, pp 2340–2345
20. Freund Y, Schapire R, Abe N (1999) A short introduction to boosting. J Jpn Soc Artif Intell 14(771–780):1612
21.
22. Alex N, Hasenfuss A, Hammer B (2009) Patch clustering for massive data sets. Neurocomputing 72(7–9):1455–1469
23. Loeffel PX, Marsala C, Detyniecki M (2015) Classification with a reject option under concept drift: the droplets algorithm. In: 2015 IEEE international conference on data science and advanced analytics (DSAA), Oct 2015, pp 1–9
24. Zhang P, Gao BJ, Zhu X, Guo L (2011) Enabling fast lazy learning for data streams. In: Proceedings of the 2011 IEEE 11th international conference on data mining, ser. ICDM ’11. IEEE Computer Society, Washington, DC, USA, pp 932–941
25. Law Y-N, Zaniolo C (2005) An adaptive nearest neighbor classification algorithm for data streams. In: Knowledge discovery in databases: PKDD 2005. Springer, pp 108–120
26. Xioufis ES, Spiliopoulou M, Tsoumakas G, Vlahavas I (2011) Dealing with concept drift and class imbalance in multi-label stream classification. In: Proceedings of the twenty-second international joint conference on artificial intelligence, vol 2, ser. IJCAI’11. AAAI Press, pp 1583–1588
27. Atkinson R, Shiffrin R (1968) Human memory: a proposed system and its control processes. Psychol Learn Motiv 2:89–195
28. Dudai Y (2004) The neurobiology of consolidations, or, how stable is the engram? Annu Rev Psychol 55:51–86
29. Miller GA (1956) The magical number seven, plus or minus two: some limits on our capacity for processing information. Psychol Rev 63(2):81
30. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M (2011) Scikit-learn: machine learning in Python. JMLR 12:2825–2830
31. Arthur D, Vassilvitskii S (2007) k-means++: the advantages of careful seeding. In: Proceedings of the eighteenth annual ACM-SIAM symposium on discrete algorithms. Society for Industrial and Applied Mathematics, pp 1027–1035
32. Bifet A, Holmes G, Kirkby R, Pfahringer B (2010) MOA: massive online analysis. J Mach Learn Res 11:1601–1604
33. Street WN, Kim Y (2001) A streaming ensemble algorithm (SEA) for large-scale classification. In: Proceedings of the seventh ACM SIGKDD international conference on knowledge discovery and data mining, ser. KDD ’01. ACM, New York, NY, USA, pp 377–382
34. Harries M (1999) Splice-2 comparative evaluation: electricity pricing. Technical report, University of New South Wales
35. Baena-García M, del Campo-Ávila J, Fidalgo R, Bifet A, Gavaldà R, Morales-Bueno R (2006) Early drift detection method. In: Fourth international workshop on knowledge discovery from data streams, vol 6, pp 77–86
36. Kuncheva LI, Plumpton CO (2008) Adaptive learning rate for online linear discriminant classifiers. In: Structural, syntactic, and statistical pattern recognition. Springer, pp 510–519
37.
38. Gama J, Rocha R, Medas P (2003) Accurate decision trees for mining high-speed data streams. In: Proceedings of the ninth ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 523–528
39. Oza NC, Russell S (2001) Experimental comparisons of online and batch versions of bagging and boosting. In: Proceedings of the seventh ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 359–364
40. Losing V, Hammer B, Wersing H (2015) Interactive online learning for obstacle classification on a mobile robot. In: 2015 international joint conference on neural networks (IJCNN). IEEE, pp 1–8
41. Wilcox RR (2012) Introduction to robust estimation and hypothesis testing. Academic Press, London
42. Mitchell TM (1997) Machine learning, 1st edn. McGraw-Hill Inc., New York
43. LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324
44. Bonferroni CE (1936) Teoria statistica delle classi e calcolo delle probabilità. Libreria internazionale Seeber
45. Breiman L, Friedman J, Olshen R, Stone C (1984) Classification and regression trees. Wadsworth and Brooks, Monterey
46. Yianilos PN (1993) Data structures and algorithms for nearest neighbor search in general metric spaces. In: Proceedings of the fourth annual ACM-SIAM symposium on discrete algorithms, ser. SODA ’93. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, pp 311–321
Metadata
Title
Tackling heterogeneous concept drift with the Self-Adjusting Memory (SAM)
Authors
Viktor Losing
Barbara Hammer
Heiko Wersing
Publication date
01.12.2017
Publisher
Springer London
Published in
Knowledge and Information Systems / Issue 1/2018
Print ISSN: 0219-1377
Electronic ISSN: 0219-3116
DOI
https://doi.org/10.1007/s10115-017-1137-y
