Published in: International Journal of Data Science and Analytics 3/2017

23.02.2017 | Regular Paper

Fading histograms in detecting distribution and concept changes

Written by: Raquel Sebastião, João Gama, Teresa Mendonça

Abstract

The growing number of real applications operating in dynamic scenarios is driving a new capacity to generate and gather information. Massive amounts of information are now generated at high speed, in the form of data streams. Moreover, data are collected in evolving environments. Because of memory restrictions, data must be processed promptly and then discarded. Dealing with evolving data streams therefore raises two main questions: (i) how to remember discarded data, and (ii) how to forget outdated data. To maintain an updated representation of time-evolving data, this paper proposes fading histograms. To account for the dynamic nature of the data, changes are detected through a windowing scheme that compares the data distributions computed by the fading histograms: the adaptive cumulative windows model (ACWM). The distance between data distributions is monitored online using a dissimilarity measure based on the asymmetry of the Kullback–Leibler divergence. The experimental results support the ability of fading histograms to provide an updated representation of the data. This property favors the detection of distribution changes with a smaller detection delay than standard histograms achieve. With respect to the detection of concept changes, the ACWM is compared with three known algorithms from the literature, using artificial data and public data sets, and presents better results. Furthermore, the proposed method was extended to multidimensional data, and the experiments performed show the ability of the ACWM to detect distribution changes in these settings.
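The abstract does not give the update rule or the exact dissimilarity formula, so the following is only a minimal sketch of the two ideas it names. It assumes that a fading histogram multiplies every bin count by a factor `alpha` in (0, 1] before adding each new observation, and that the dissimilarity can be read as the absolute difference between the two directions of the Kullback–Leibler divergence. The class `FadingHistogram`, the parameter `alpha`, the smoothing constant `eps`, and the function `kl_asymmetry` are hypothetical names introduced for illustration and are not taken from the paper.

```python
import math
from typing import List


class FadingHistogram:
    """Equal-width histogram whose bin counts decay exponentially.

    Assumed update rule (not taken from the paper): every count is
    multiplied by alpha before the new observation is added.
    """

    def __init__(self, low: float, high: float, n_bins: int, alpha: float = 0.99) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        if high <= low or n_bins < 1:
            raise ValueError("need high > low and n_bins >= 1")
        self.low = low
        self.high = high
        self.n_bins = n_bins
        self.alpha = alpha  # hypothetical fading factor; alpha = 1 gives a standard histogram
        self.counts: List[float] = [0.0] * n_bins

    def _bin_index(self, x: float) -> int:
        width = (self.high - self.low) / self.n_bins
        idx = int((x - self.low) / width)
        # Clamp out-of-range observations into the first or last bin.
        return min(max(idx, 0), self.n_bins - 1)

    def update(self, x: float) -> None:
        # Fade all bins, then add one unit to the bin of the new observation.
        self.counts = [self.alpha * c for c in self.counts]
        self.counts[self._bin_index(x)] += 1.0

    def probabilities(self, eps: float = 1e-9) -> List[float]:
        # Smoothed relative frequencies, so the KL divergence stays finite.
        total = sum(self.counts) + eps * self.n_bins
        return [(c + eps) / total for c in self.counts]


def kl_divergence(p: List[float], q: List[float]) -> float:
    """Kullback-Leibler divergence KL(p || q) for strictly positive p and q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def kl_asymmetry(p: List[float], q: List[float]) -> float:
    """Assumed dissimilarity: |KL(p || q) - KL(q || p)|."""
    return abs(kl_divergence(p, q) - kl_divergence(q, p))
```

Feeding the same stream to one instance with `alpha` below 1 and to another with `alpha = 1.0` shows the intended effect: after the stream changes, the faded counts are dominated by recent observations, while the standard histogram still carries the full weight of the old data. A change detector in the spirit of the ACWM would compare `probabilities()` from a reference window and a current window and signal a change when the dissimilarity exceeds a threshold; the windowing and thresholding details are not given in the abstract and are not sketched here.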


Metadata
Title
Fading histograms in detecting distribution and concept changes
Written by
Raquel Sebastião
João Gama
Teresa Mendonça
Publication date
23.02.2017
Publisher
Springer International Publishing
Published in
International Journal of Data Science and Analytics / Issue 3/2017
Print ISSN: 2364-415X
Electronic ISSN: 2364-4168
DOI
https://doi.org/10.1007/s41060-017-0043-4