2024 | OriginalPaper | Chapter

A Metaheuristic-Based Subspace Search Approach for Outlier Detection in High-Dimensional Data Streams

Authors: Imen Souiden, Zaki Brahmi, Mohamed Nazih Omri

Published in: Advancements in Architectural, Engineering, and Construction Research and Practice

Publisher: Springer Nature Switzerland

Abstract

Continuing advances in technology have made high-dimensional data streams widespread. Detecting outliers in this setting is notably difficult: the characteristics of data streams, combined with the curse of dimensionality, impose tight constraints on mining, and addressing them simultaneously remains an open challenge. A common way to handle high dimensionality is to search for outliers only within subspaces of the feature space that contain interesting knowledge, where outliers are typically found. For data streams, however, this line of work has not been well explored. In this article, our objective is to discover interesting subspaces for outlier detection while meeting the requirements of data streams, namely limited time and memory and adaptation to changes in the data (concept drift), and while outperforming closely related approaches. To this end, we use a metaheuristic-based approach (an Adapted Binary Gravitational Search algorithm) to discover high-contrast subspaces composed of independent features, within which outlier detection is performed. To deal with data streams, we adopt a sliding window structure together with a modified version of the N-Dimensional Kolmogorov–Smirnov Windowing (NDKSWIN) concept drift detector. We conducted experiments on both synthetic and real-world data, and the results demonstrate the effectiveness of the approach and its superiority over competing methods.
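
As a rough illustration of the sliding-window and Kolmogorov–Smirnov ideas mentioned in the abstract, the following minimal Python sketch compares the most recent points of a window against the older ones, one feature at a time, and flags drift when the two-sample KS statistic exceeds its asymptotic critical value. It is not the authors' modified NDKSWIN detector: the names `ks_statistic`, `ks_critical` and `WindowDriftCheck`, the window sizes, and the choice to test every feature without sampling are all assumptions made for this example.

```python
import math
from collections import deque


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Step past every value equal to x in both samples (handles ties).
        while i < len(a) and a[i] == x:
            i += 1
        while j < len(b) and b[j] == x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def ks_critical(n, m, alpha=0.01):
    """Asymptotic critical value of the two-sample KS test at level alpha."""
    return math.sqrt(-0.5 * math.log(alpha / 2)) * math.sqrt((n + m) / (n * m))


class WindowDriftCheck:
    """Hypothetical per-feature KS drift check over a sliding window."""

    def __init__(self, window_size=200, recent_size=50, alpha=0.01):
        self.window = deque(maxlen=window_size)  # oldest points drop out automatically
        self.recent_size = recent_size
        self.alpha = alpha

    def update(self, point):
        """Add one point; return True if any feature shows drift."""
        self.window.append(tuple(point))
        if len(self.window) < self.window.maxlen:
            return False  # wait until the window is full
        rows = list(self.window)
        older, recent = rows[:-self.recent_size], rows[-self.recent_size:]
        threshold = ks_critical(len(older), len(recent), self.alpha)
        for dim in range(len(point)):
            d = ks_statistic([r[dim] for r in older], [r[dim] for r in recent])
            if d > threshold:
                # Assumed reaction to drift: keep only the recent points.
                self.window.clear()
                self.window.extend(recent)
                return True
        return False


# Toy two-feature stream: feature 0 jumps by 100 at index 40, feature 1 stays flat.
stream = [((i % 5) + (100 if i >= 40 else 0), 0.0) for i in range(80)]
detector = WindowDriftCheck(window_size=20, recent_size=5, alpha=0.01)
detections = [i for i, point in enumerate(stream) if detector.update(point)]
print("drift flagged at indices:", detections)
```

In a subspace-search setting, a drift signal of this kind would typically be the trigger for re-running the subspace search on the refreshed window; how the chapter couples the two steps is not stated in the abstract.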

Metadata
Copyright Year
2024
DOI
https://doi.org/10.1007/978-3-031-59329-1_3