2016 | OriginalPaper | Chapter

Outlier Detection and Elimination in Stream Data – An Experimental Approach

Authors: Mateusz Kalisch, Marcin Michalak, Piotr Przystałka, Marek Sikora, Łukasz Wróbel

Published in: Rough Sets

Publisher: Springer International Publishing

Abstract

This paper addresses outlier detection and substitution (correction) in stream data. Previous research showed that even a small number of outliers in the data significantly affects the quality of a prediction model when it is applied. Here we look for a suitable composite procedure for handling outliers in stream data. The procedure consists of three elements: an outlier detection method, a statistic used to replace the outlying values, and a historic horizon over which the replacement value is calculated. To find the best strategy, a wide grid of experiments was prepared. All experiments were performed on semi-artificial data: data from an underground coal mining environment with an artificially introduced dependent variable and randomly introduced outliers. The paper also presents a new approach to local outlier correction, which improved the classification quality in several cases.
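The three elements of the procedure (detector, replacement statistic, historic horizon) can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not name the detectors or statistics that were tested, so the median-absolute-deviation rule, the function name `correct_stream`, and the parameters `horizon`, `statistic`, and `threshold` are all assumptions chosen for illustration.

```python
from collections import deque
from statistics import mean, median


def correct_stream(values, horizon=5, statistic=median, threshold=3.0):
    """Replace suspected outliers in a stream, one value at a time.

    Hypothetical sketch: a value is flagged when it lies more than
    `threshold` median absolute deviations (MAD) from the median of the
    last `horizon` accepted values. A flagged value is replaced by
    `statistic` computed over the same trailing window.
    """
    history = deque(maxlen=horizon)  # the "historic horizon"
    corrected = []
    for x in values:
        # Only judge a value once the horizon is completely filled.
        if len(history) == horizon:
            center = median(history)
            mad = median(abs(v - center) for v in history)
            # With mad == 0 (a nearly constant window) nothing is flagged;
            # this is a limitation of this simple assumed rule.
            if mad > 0 and abs(x - center) > threshold * mad:
                x = statistic(history)  # substitution (correction)
        history.append(x)  # the corrected value enters the horizon
        corrected.append(x)
    return corrected


if __name__ == "__main__":
    stream = [10, 11, 10, 12, 11, 100, 11, 10]
    print(correct_stream(stream, horizon=5, statistic=median))
    print(correct_stream(stream, horizon=5, statistic=mean))
```

Swapping `statistic` between `median` and `mean`, or changing `horizon`, gives the kind of grid of configurations that the abstract describes, although the grid the authors actually used is not given here.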

Footnotes
1
If a value exceeded the range of the variable, an appropriate boundary value was used instead.
 
Metadata
Copyright Year
2016
DOI
https://doi.org/10.1007/978-3-319-47160-0_38
