2019 | OriginalPaper | Chapter

Concept Drift Based Multi-dimensional Data Streams Sampling Method

Authors: Ling Lin, Xiaolong Qi, Zhirui Zhu, Yang Gao

Published in: Advances in Knowledge Discovery and Data Mining

Publisher: Springer International Publishing

Abstract

A summary can greatly reduce the time and space cost of an algorithm, and building summaries is an active research topic in data stream mining. Data streams arrive continuously, at high speed, and at large scale, so they cannot be held in memory in their entirety. A summary is therefore often maintained in memory to answer database queries or data mining tasks approximately. Sampling is a commonly used method for constructing data stream summaries. Traditional simple random sampling algorithms do not account for concept drift, that is, data distributions that change over time. Sampling a summary that reflects the data distribution of a multi-dimensional data stream with concept drift is therefore a challenging task. This study proposes a sampling algorithm that keeps the distribution of the sample consistent with that of a data stream under concept drift. First, probability statistics over the data stream cells in the reference window are used to obtain the data distribution, and probability sampling is performed on the basis of this distribution. Second, a sliding window is used to continuously detect whether the data distribution has changed. If it has not changed, the original sample is kept. Otherwise, the data distribution in the statistical window is re-estimated to form a new sampling probability. The proposed algorithm ensures that the data distribution in the summary remains consistent with the population distribution. We compare our algorithm with state-of-the-art algorithms on synthetic and real data sets. Experimental results demonstrate the effectiveness of our algorithm.
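The two steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not specify how cells are formed or how a change is detected, so the fixed-width grid (`width`), the total variation distance with a fixed `threshold`, the window length, and all function names here are assumptions chosen only to make the idea concrete.

```python
import random
from collections import Counter, deque

def cell_of(point, width=1.0):
    """Map a multi-dimensional point to a grid cell (assumed fixed-width grid)."""
    return tuple(int(x // width) for x in point)

def cell_distribution(points, width=1.0):
    """Relative frequency of each grid cell among the given points."""
    counts = Counter(cell_of(p, width) for p in points)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}

def total_variation(p, q):
    """Total variation distance between two cell distributions (0 to 1).
    Used here as a stand-in change measure, not the paper's test."""
    cells = set(p) | set(q)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in cells)

def sample_by_distribution(points, k, width=1.0, rng=random):
    """Draw roughly k points, giving each cell a share proportional to its frequency."""
    by_cell = {}
    for p in points:
        by_cell.setdefault(cell_of(p, width), []).append(p)
    sample = []
    for members in by_cell.values():
        quota = max(1, round(k * len(members) / len(points)))
        sample.extend(rng.sample(members, min(quota, len(members))))
    return sample

def drift_aware_sample(stream, window=200, k=20, threshold=0.3, width=1.0, rng=random):
    """Yield (position, sample) whenever the summary is built or rebuilt."""
    reference = []                   # points collected for the first reference window
    current = deque(maxlen=window)   # sliding window over the most recent points
    ref_dist = None
    for i, point in enumerate(stream):
        if ref_dist is None:
            # Step 1: fill the reference window, estimate its distribution, sample from it.
            reference.append(point)
            if len(reference) == window:
                ref_dist = cell_distribution(reference, width)
                yield i, sample_by_distribution(reference, k, width, rng)
            continue
        # Step 2: slide the window and compare its distribution with the reference.
        current.append(point)
        if len(current) == window:
            # Recomputed from scratch at every step for clarity, not efficiency.
            cur_dist = cell_distribution(current, width)
            if total_variation(ref_dist, cur_dist) > threshold:
                # Change detected: re-estimate the distribution and resample.
                ref_dist = cur_dist
                yield i, sample_by_distribution(list(current), k, width, rng)
                current.clear()
            # Otherwise the existing sample is kept unchanged.
```

Iterating over `drift_aware_sample(stream)` yields a first sample once the reference window is full and a new one each time the sliding window's cell distribution moves more than `threshold` away from the reference. A real detector would normally use a statistical test rather than a fixed threshold.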

Metadata
Title: Concept Drift Based Multi-dimensional Data Streams Sampling Method
Authors: Ling Lin, Xiaolong Qi, Zhirui Zhu, Yang Gao
Copyright Year: 2019
DOI: https://doi.org/10.1007/978-3-030-16148-4_26
