Finding Ghosts in Your Data

Anomaly Detection Techniques with Examples in Python

  • 2022
  • Book

About this book

Discover key information buried in the noise of data by learning a variety of anomaly detection techniques and using the Python programming language to build a robust service for anomaly detection against a variety of data types. The book starts with an overview of what anomalies and outliers are and uses the Gestalt school of psychology to explain why humans are naturally good at detecting anomalies. From there, you will move on to technical definitions of anomalies, going beyond "I know it when I see it" to define things in a way that computers can understand.

The heart of the book is building a robust, deployable anomaly detection service in Python. You will start with a simple anomaly detection service, which grows over the course of the book to include a variety of valuable anomaly detection techniques covering descriptive statistics, clustering, and time series scenarios. Finally, you will compare your anomaly detection service head-to-head with a publicly available cloud offering and see how they perform.

The anomaly detection techniques and examples in this book combine psychology, statistics, mathematics, and Python programming in a way that is easily accessible to software developers. They give you an understanding of what anomalies are and why you are naturally a gifted anomaly detector. Then they help you translate your human techniques into algorithms that can be used to program computers to automate the process. You will develop your own anomaly detection service, extend it with a variety of techniques, including clustering techniques for multivariate analysis and time series techniques for observing data over time, and compare it head-to-head with a commercial service.

What You Will Learn
  • Understand the intuition behind anomalies
  • Convert your intuition into technical descriptions of anomalous data
  • Detect anomalies using statistical tools, such as distributions, variance and standard deviation, robust statistics, and interquartile range
  • Apply anomaly detection techniques in the realm of clustering and time series analysis
  • Work with common Python packages for outlier detection and time series analysis, such as scikit-learn, PyOD, and tslearn
  • Develop from …

Readers are not required to have any formal knowledge of statistics, as the book introduces relevant concepts along the way.

Table of contents

Frontmatter

What Is an Anomaly?

Frontmatter
Chapter 1. The Importance of Anomalies and Anomaly Detection
Abstract
Before we set off on building an anomaly detector, it is important to understand what, specifically, an anomaly is. The first part of this chapter provides us with a basic understanding of anomalies. Then, in the second part of the chapter, we will look at use cases for anomaly detection across a variety of industries. In the third part of the chapter, we enumerate the three classes of anomaly detection that we will return to throughout the book, gaining a high-level understanding of each in preparation for deeper dives later. Finally, we will end this chapter with a few notes on what we should consider when building an anomaly detector of our own.
Kevin Feasel
Chapter 2. Humans Are Pattern Matchers
Abstract
In the prior chapter, we learned about outliers, noise, and anomalies. We also made use of humans’ intuitive ability to detect anomalies. In this chapter, we will learn more about how humans interpret and process information. In particular, we will focus on the findings of the Gestalt school of psychology.
Kevin Feasel
Chapter 3. Formalizing Anomaly Detection
Abstract
In the prior chapter, we looked at the sensory tools humans have to perform outlier and anomaly detection, taking the time to understand how they work. Computers have none of these sensory tools, and so we need to use other processes to allow them to detect outliers. Specifically, we need to implement at least one of the three approaches to outlier detection in order to give a computer some process for outlier detection. This chapter will begin to lay out the first of the three approaches to outlier detection that we will cover throughout the book: the statistical approach.
Kevin Feasel

Building an Anomaly Detector

Frontmatter
Chapter 4. Laying Out the Framework
Abstract
To this point, we have focused entirely on the theoretical aspects of outlier and anomaly detection. We will still need to delve into theory on several other occasions in later chapters, but we have enough to get started on developing a proper anomaly detection service.
Kevin Feasel
Chapter 5. Building a Test Suite
Abstract
Throughout the course of the prior chapter, we started to put together an application, stubbing out a series of useful methods. In this chapter, we will put together some tests, allowing us to ensure that the changes we make throughout the book will not break existing functionality or lead to undesirable results. First, we will look at a variety of tools available to us. Then, we will cover a few tips for writing testable Python code. Finally, we will create a set of unit tests and a set of integration tests, giving us the capability to run them at any time to ensure that we maintain product quality.
Kevin Feasel
Chapter 6. Implementing the First Methods
Abstract
The prior two chapters have given us a foundation for work, with Chapter 4 laying out the fundamental API calls and stub methods and then Chapter 5 following up with unit and integration tests. In this chapter, we will build upon that foundation and begin to implement our first outlier detection tests. Following the principle of “start with simple,” we will put into place some of the easiest tests: statistics-based, univariate outlier detection tests.
Kevin Feasel
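
The Chapter 6 abstract does not name its univariate tests, so the following is only a minimal sketch of three common statistics-based checks: distance from the mean in standard deviations, distance from the median in scaled median absolute deviations, and interquartile range fences. The function names, thresholds, and sample data are assumptions for illustration, not the book's code.

```python
import statistics


def sd_outliers(values, max_sds=3.0):
    """Flag values more than max_sds sample standard deviations from the mean."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        return [False] * len(values)
    return [abs(v - mean) / sd > max_sds for v in values]


def mad_outliers(values, max_mads=3.0):
    """Flag values more than max_mads scaled MADs from the median."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return [False] * len(values)
    scaled = 1.4826 * mad  # consistency constant for normally distributed data
    return [abs(v - med) / scaled > max_mads for v in values]


def iqr_outliers(values, multiplier=1.5):
    """Flag values outside the fences Q1 - m*IQR and Q3 + m*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - multiplier * iqr, q3 + multiplier * iqr
    return [v < low or v > high for v in values]


# Hypothetical sample: seven similar readings and one extreme value.
data = [10, 11, 12, 11, 10, 12, 11, 95]
sd_flags = sd_outliers(data)
mad_flags = mad_outliers(data)
iqr_flags = iqr_outliers(data)
```

On this sample the standard deviation test at three deviations flags nothing, because the extreme value inflates the standard deviation it is measured against. The two robust tests flag only 95. That difference is one reason to combine several tests rather than rely on one.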
Chapter 7. Extending the Ensemble
Abstract
Chapter 6 provided us with the first three outlier detection tests. In this chapter, we will build upon the prior work and include several additional tests. We will also refactor existing code, rethink a few design choices, and wrap up the core elements of univariate statistical analysis.
Kevin Feasel
Chapter 8. Visualize the Results
Abstract
Over the course of the last two chapters, we have interacted with our outlier detection service: we have built and extended functionality, created and updated tests, and diagnosed issues by reading a lot of JSON output. In this chapter, we will provide a more user-friendly method for integrating with the outlier detector. This will come in the form of a web application.
Kevin Feasel

Multivariate Anomaly Detection

Frontmatter
Chapter 9. Clustering and Anomalies
Abstract
This first chapter of Part III also serves as a bridge between Part II, univariate analysis, and Part III, multivariate analysis. In this chapter, we will look at how we can use clustering techniques to solve the problem we ended Chapter 7 with: we can intuitively understand that in a dataset with values { 1, 2, 3, 50, 97, 98, 99 }, the value 50 is an outlier but our outlier detection engine steadfastly refuses to believe it.
Kevin Feasel
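
The problem posed in the Chapter 9 abstract can be reproduced in a few lines. The sketch below first shows that the value 50 sits exactly at the mean and well inside the interquartile range fences, then groups the values with a simple gap-based rule. The gap rule, the `gap_clusters` name, and the `max_gap` threshold are assumptions for illustration. The next chapter's abstract names Gaussian Mixture Models as the technique Chapter 9 uses, and this is not that technique.

```python
import statistics

values = [1, 2, 3, 50, 97, 98, 99]

# Univariate view: 50 is the mean and lies inside the IQR fences.
mean = statistics.fmean(values)
q1, _, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
iqr_flags_50 = not (low_fence <= 50 <= high_fence)


def gap_clusters(data, max_gap):
    """Group sorted 1-D values, starting a new cluster when a gap exceeds max_gap."""
    ordered = sorted(data)
    clusters = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > max_gap:
            clusters.append([cur])
        else:
            clusters[-1].append(cur)
    return clusters


# Clustering view: a value alone in its own cluster is a candidate outlier.
clusters = gap_clusters(values, max_gap=10)
singletons = [c[0] for c in clusters if len(c) == 1]
```

Here `clusters` is `[[1, 2, 3], [50], [97, 98, 99]]`, so 50 stands out as the only value without neighbors even though no distance-from-center test would flag it.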
Chapter 10. Connectivity-Based Outlier Factor (COF)
Abstract
The prior chapter provided us with an introduction to clustering techniques and the use of one such clustering technique, Gaussian Mixture Models. Over the course of this chapter, we will implement a separate technique for multivariate clustering: Connectivity-Based Outlier Factor.
Kevin Feasel
Chapter 11. Local Correlation Integral (LOCI)
Abstract
Chapter 10 introduced us to one multivariate clustering technique in Connectivity-Based Outlier Factor (COF). COF is a popular technique for multivariate outlier detection, but it does come with the downside that there is little immediate guidance outside of “higher outlier scores are worse.” We lack solid guidance on what, exactly, represents a sufficiently large outlier score. In this chapter, we introduce a complementary technique to COF that does come with its own in-built guidance: Local Correlation Integral.
Kevin Feasel
Chapter 12. Copula-Based Outlier Detection (COPOD)
Abstract
So far, we have looked at clustering-based models for multivariate outlier detection. In this chapter, we will review a novel nonclustering technique that uses a concept called copulas. First, we will define what a copula is and how we can perform outlier detection using it. Then, we will implement application changes to integrate the new technique. After that, we will update our tests and website. Finally, we will wrap up Part III with some concluding notes.
Kevin Feasel
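
The Chapter 12 abstract does not spell out the scoring rule, so the following is only a simplified sketch of the empirical tail-probability idea that copula-based outlier detection builds on. For each feature it estimates how far a point lies into the left and right tails of the empirical distribution, then sums the negative log probabilities across features. It omits the skewness correction of the published COPOD algorithm, and the `tail_scores` name and sample rows are assumptions for illustration.

```python
import math


def tail_scores(rows):
    """Score each row by its summed negative log empirical tail probabilities."""
    n = len(rows)
    dims = len(rows[0])
    columns = [[row[d] for row in rows] for d in range(dims)]
    scores = []
    for row in rows:
        left = right = 0.0
        for d in range(dims):
            # Fraction of observations at or below / at or above this value.
            left_p = sum(v <= row[d] for v in columns[d]) / n
            right_p = sum(v >= row[d] for v in columns[d]) / n
            left += -math.log(left_p)
            right += -math.log(right_p)
        # A point is suspicious if it is extreme in either tail.
        scores.append(max(left, right))
    return scores


# Hypothetical two-feature sample; the last row is extreme in both features.
rows = [(1, 10), (2, 11), (3, 12), (2, 10), (3, 11), (9, 30)]
scores = tail_scores(rows)
most_extreme = scores.index(max(scores))
```

For production use, PyOD ships a `COPOD` model in `pyod.models.copod`. The sketch is only meant to show why no clustering step is needed.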

Time Series Anomaly Detection

Frontmatter
Chapter 13. Time and Anomalies
Abstract
Now that we have entered Part IV of this book, it is time to introduce an important topic in anomaly detection: time series anomaly detection. Just as with Parts II and III, we will break up time series analysis based on the number of features: first, we will deal with single-series time series anomaly detection; then, after we have a model in place for single-series time series anomaly detection, we will deal with the multi-series case.
Kevin Feasel
Chapter 14. Change Point Detection
Abstract
The prior chapter gave us an introduction to time series analysis, allowing us to gain an appreciation for how it differs from standard univariate or multivariate analysis. One of the biggest differences in time series analysis is that we often do not care about a single outlier point itself; instead, we care about changes in overall system behavior—in particular, unexpected changes in system behavior. This is where the concept of change point detection becomes important.
Kevin Feasel
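
As a rough illustration of the idea in the Chapter 14 abstract, the sketch below looks for a single shift in the mean of a series by trying every split point and keeping the one that most reduces the total squared error around the segment means. This is a deliberately simple approach under stated assumptions (one change point, a shift in mean only, a hypothetical `best_change_point` helper), not the method the chapter implements.

```python
import statistics


def best_change_point(series, min_size=3):
    """Return (index, gain) for the split that best separates two mean levels.

    index is the position where the second segment starts, or None if no
    split improves on treating the series as one segment.
    """
    def sse(segment):
        m = statistics.fmean(segment)
        return sum((x - m) ** 2 for x in segment)

    total = sse(series)
    best_index, best_cost = None, total
    for i in range(min_size, len(series) - min_size + 1):
        cost = sse(series[:i]) + sse(series[i:])
        if cost < best_cost:
            best_index, best_cost = i, cost
    return best_index, total - best_cost


# Hypothetical series: behavior shifts from around 10 to around 20.
series = [10, 11, 9, 10, 11, 10, 20, 21, 19, 20, 21, 20]
index, gain = best_change_point(series)
```

No single point in this series is unusual relative to its neighbors on the same side of the shift. What matters is that the system's level changed at index 6.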
Chapter 15. An Introduction to Multi-series Anomaly Detection
Abstract
Chapters 13 and 14 dealt with one form of time series anomaly detection: the single time series. With that understanding in place, we will extend our focus to situations in which there are multiple series of data that we want to analyze in conjunction rather than independently. In this chapter, we will formalize our understanding of multi-series time series, focusing on how it differs from the single series approach we took earlier. Then, we will look at four common techniques for analyzing multi-series data. To wrap up this chapter, we will review some common problems when trying to analyze multi-series data, ahead of implementing one algorithm in Chapter 16.
Kevin Feasel
Chapter 16. Standard Deviation of Differences (DIFFSTD)
Abstract
In the prior chapter, we learned about several pairwise techniques we can use to perform multi-series time series analysis. In this chapter, we will introduce one more such technique and will use it as the first method in a multi-series time series ensemble.
Kevin Feasel
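
Going only by the technique's name, a pairwise standard deviation of differences can be sketched as follows: subtract one series from the other point by point and take the standard deviation of the result. Two series that move together, even at different levels, score near zero, while series that drift apart score higher. Using the population standard deviation, and the `diffstd` name itself, are assumptions for illustration.

```python
import statistics


def diffstd(series_a, series_b):
    """Population standard deviation of the pointwise differences of two series."""
    if len(series_a) != len(series_b):
        raise ValueError("series must have the same length")
    diffs = [a - b for a, b in zip(series_a, series_b)]
    return statistics.pstdev(diffs)


# Hypothetical series for illustration.
base = [1.0, 2.0, 3.0, 4.0, 5.0]
shifted = [x + 10.0 for x in base]       # same shape, constant offset
diverging = [1.0, 2.0, 3.0, 9.0, 0.0]    # departs from base at the end

aligned_score = diffstd(base, shifted)
diverging_score = diffstd(base, diverging)
```

A constant offset between series does not count against them here. Only changes in the gap between the two series raise the score.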
Chapter 17. Symbolic Aggregate Approximation (SAX)
Abstract
The prior two chapters introduced a variety of calculations intended to compare two time series and gain an understanding of how far from alignment the two series are. We subsequently adapted this concept to handle multiple series by finding the mean behavior over fairly short segments of approximately seven data points. This is what we implemented as our multi-series analysis technique. In this chapter, we will take a technique that was originally designed to find recurring patterns within a single series and use it to extend our multi-series time series outlier detection engine. Along the way, we will once again create an ensemble of results and use that to inform our scoring of outliers.
Kevin Feasel
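
For orientation, the core of Symbolic Aggregate Approximation can be sketched in three steps: z-normalize the series, average it over equal-width segments (piecewise aggregate approximation), and map each segment mean to a letter using breakpoints that split the standard normal distribution into equally probable regions. The `sax_word` name, the requirement that the length divide evenly, and the sample data are assumptions for illustration. The abstract does not describe how the chapter builds its ensemble on top of this.

```python
import statistics
from statistics import NormalDist


def sax_word(series, segments, alphabet="abc"):
    """Convert a numeric series into a SAX word with one letter per segment."""
    if len(series) % segments != 0:
        raise ValueError("series length must be divisible by segments")

    # Step 1: z-normalize so series at different scales become comparable.
    mean = statistics.fmean(series)
    sd = statistics.pstdev(series)
    normalized = [(x - mean) / sd if sd else 0.0 for x in series]

    # Step 2: piecewise aggregate approximation (mean of each segment).
    width = len(series) // segments
    paa = [statistics.fmean(normalized[i:i + width])
           for i in range(0, len(series), width)]

    # Step 3: breakpoints giving equiprobable regions under N(0, 1).
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(i / k) for i in range(1, k)]
    return "".join(alphabet[sum(v > b for b in breakpoints)] for v in paa)


# Hypothetical series: low, middle, high, middle.
series = [1, 1, 1, 1, 5, 5, 5, 5, 9, 9, 9, 9, 5, 5, 5, 5]
word = sax_word(series, segments=4)
```

Here `word` is `"abcb"`. Because each series is reduced to a short string, series or segments whose words differ from the rest can be compared cheaply.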

Stacking Up to the Competition

Frontmatter
Chapter 18. Configuring Azure Cognitive Services Anomaly Detector
Abstract
Over the past dozen (or so) chapters, we have created a fairly complete outlier detection engine. With this in hand, it makes sense to compare our creation to some of the best general-purpose outlier detection engines available on the market today. In this chapter, we will review a few of the many options available for general-purpose outlier detection, focusing on cloud platform providers. Then, we will configure one of these for our own use: Microsoft’s Azure Cognitive Services Anomaly Detection service.
Kevin Feasel
Chapter 19. Performing a Bake-Off
Abstract
In this final chapter of the book, we will compare our anomaly detector service to the Azure Cognitive Services Anomaly Detector service. Importantly, the intent of this chapter is not to prove conclusively which is better; it is instead an opportunity to see how two services stack up and see if our service is competitive with a commercially available option. We will begin by laying out the parameters of comparison. Then, we will perform the bake-off, comparing how the two services handle the same datasets. Finally, we will close with a few thoughts on how we can improve our anomaly detection engine.
Kevin Feasel
Backmatter
Title
Finding Ghosts in Your Data
Written by
Kevin Feasel
Copyright year
2022
Publisher
Apress
Electronic ISBN
978-1-4842-8870-2
Print ISBN
978-1-4842-8869-6
DOI
https://doi.org/10.1007/978-1-4842-8870-2

The PDF files of this book do not fully comply with the PDF/UA standards, but they offer limited screen reader support, described non-text content (images, graphics), bookmarks for easy navigation, and searchable and selectable text. Users of assistive technologies may have difficulty navigating or interpreting the content of this document. We recognize the importance of accessibility and welcome inquiries about the accessibility of our products. If you have questions or accessibility needs, please contact us at accessibilitysupport@springernature.com
