2018 | OriginalPaper | Chapter

Combining Active Learning and Self-Labeling for Data Stream Mining

Abstract

Data stream mining is among the most vital contemporary data science challenges. In this work we concentrate on the actual availability of true class labels. The assumption that the ground truth for each instance becomes known right after processing it is far from realistic, because acquiring labels is usually costly. Active learning is an attractive solution to this problem, as it selects the most valuable instances for labeling. In this paper, we propose to augment the active learning module with a self-labeling approach. This allows the classifier to automatically label the instances for which it displays the highest certainty and to use them for further training. Although in this preliminary work we use a static threshold for self-labeling, the obtained results are encouraging. Our experimental study shows that this approach complements the active learning strategy and improves data stream classification, especially in scenarios with a very small labeling budget.
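
The idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the base learner (a tiny nearest-centroid classifier), the uncertainty-based query rule, the budget check, and every name and default value (`CentroidClassifier`, `run_stream`, `make_stream`, `budget`, `query_threshold`, `self_label_threshold`) are assumptions chosen only to show how a labeling budget and a static self-labeling threshold might interact on a stream.

```python
import math
import random


class CentroidClassifier:
    """Tiny incremental nearest-centroid learner (a stand-in for any online classifier)."""

    def __init__(self):
        self.sums = {}    # label -> per-feature running sum
        self.counts = {}  # label -> number of training instances

    def learn(self, x, label):
        if label not in self.sums:
            self.sums[label] = [0.0] * len(x)
            self.counts[label] = 0
        self.sums[label] = [s + v for s, v in zip(self.sums[label], x)]
        self.counts[label] += 1

    def predict_proba(self, x):
        """Softmax over negative squared distances to the class centroids."""
        scores = {}
        for label, s in self.sums.items():
            centroid = [v / self.counts[label] for v in s]
            scores[label] = -sum((a - b) ** 2 for a, b in zip(x, centroid))
        if not scores:
            return {}
        top = max(scores.values())
        exps = {label: math.exp(sc - top) for label, sc in scores.items()}
        total = sum(exps.values())
        return {label: e / total for label, e in exps.items()}


def run_stream(stream, budget=0.1, query_threshold=0.7, self_label_threshold=0.95):
    """Process (x, true_label) pairs; the true label is used for training only when queried."""
    model = CentroidClassifier()
    seen = queried = self_labeled = correct = 0
    for x, true_label in stream:
        seen += 1
        proba = model.predict_proba(x)
        predicted = max(proba, key=proba.get) if proba else None
        # With fewer than two known classes the softmax is trivially 1.0, so treat it as unsure.
        confidence = proba[predicted] if len(proba) >= 2 else 0.0
        correct += predicted == true_label  # test-then-train evaluation

        if confidence < query_threshold and queried < budget * seen:
            # Active learning: uncertain instance and budget available, so ask the oracle.
            model.learn(x, true_label)
            queried += 1
        elif confidence >= self_label_threshold:
            # Self-labeling: static threshold reached, so train on the predicted label.
            model.learn(x, predicted)
            self_labeled += 1
    return {"seen": seen, "queried": queried,
            "self_labeled": self_labeled, "accuracy": correct / seen}


def make_stream(n, seed=0):
    """Hypothetical two-class stream: Gaussian blobs centred at (0, 0) and (6, 6)."""
    rng = random.Random(seed)
    for _ in range(n):
        label = rng.randrange(2)
        centre = 6.0 * label
        yield [rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)], label


if __name__ == "__main__":
    print(run_stream(make_stream(2000)))
```

In this sketch the oracle is consulted only for uncertain instances while the fraction of queried labels stays within the budget, and confidently classified instances are added to the training data at no labeling cost. A static threshold is easy to reason about, but it cannot adapt if the classifier's confidence becomes unreliable, for example under concept drift.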

Metadata
Title
Combining Active Learning and Self-Labeling for Data Stream Mining
Authors
Łukasz Korycki
Bartosz Krawczyk
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-319-59162-9_50
