Novelty detection: a review—part 1: statistical approaches
Introduction
Detecting novel events is an important ability of any signal classification scheme. Since we can never train a machine learning system on all possible object classes whose data it is likely to encounter, the system must be able to differentiate between known and unknown object information during testing. Several studies have found in practice that novelty detection is an extremely challenging task. It is for this reason that there exist several models of novelty detection that have been shown to perform well on different data. There is clearly no single best model for novelty detection: success depends not only on the type of method used but also on the statistical properties of the data handled.
Several applications require the classifier to act as a detector rather than as a classifier; that is, the requirement is to detect whether an input is part of the data that the classifier was trained on or is in fact unknown. This technique is useful in applications such as fault detection [11], [14], [30], [53], radar target detection [7], detection of masses in mammograms [54], handwritten digit recognition [55], Internet and e-commerce [34], statistical process control [22], and several others. Recently, there has been increased interest in novelty detection, and a number of research articles have appeared on autonomous systems based on adaptive machine learning. However, only a few surveys have appeared, e.g. [40]. Much of the earlier work and interest in novelty detection sprang from the study of control systems. High-integrity systems could not use traditional classification methods for a number of reasons: abnormalities are very rare, or there may be no data that describes the fault conditions. Novelty detection offered a solution to this problem by modelling normal data and using a distance measure and a threshold to determine abnormality. In recent years novelty detection has been used in a number of other applications, especially signal processing and image analysis (e.g. biometrics). In these applications the problem becomes more complicated, with multiple classes, high dimensionality, noisy features, and quite often too few samples. Novelty detection methods have tried to keep up with these problems to offer solutions that can be used in the real world. In this paper we review some of the currently used statistical approaches to novelty detection.
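The distance-plus-threshold scheme described above can be sketched in a few lines. This is a minimal illustration only, not a method from any of the cited papers; the function names and the toy data are our own:

```python
import math

def fit_normal_model(train):
    """Estimate the centroid of the normal (known-class) training data."""
    n = len(train)
    dim = len(train[0])
    return [sum(x[i] for x in train) / n for i in range(dim)]

def novelty_score(x, centroid):
    """Euclidean distance of a test sample from the normal centroid."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, centroid)))

def is_novel(x, centroid, threshold):
    """Flag a sample as abnormal when it lies too far from the normal data."""
    return novelty_score(x, centroid) > threshold

# Toy usage: normal data clusters near the origin.
normal = [(0.1, 0.0), (-0.1, 0.2), (0.0, -0.1), (0.2, 0.1)]
c = fit_normal_model(normal)
print(is_novel((0.0, 0.1), c, threshold=1.0))  # near the centroid
print(is_novel((5.0, 5.0), c, threshold=1.0))  # far from all normal data
```

The threshold is the experimental control mentioned above: raising it admits more samples as "normal", lowering it rejects more as novel.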
There are several important issues related to novelty detection. We can summarise them in terms of the following principles.
- (a)
Principle of robustness and trade-off: a novelty detection method must be capable of robust performance on test data that maximises the exclusion of novel samples while minimising the exclusion of known samples. This trade-off should be, to a limited extent, predictable and under experimental control.
- (b)
Principle of uniform data scaling: in order to assist novelty detection, it should be possible that all test data and training data after normalisation lie within the same range [49].
- (c)
Principle of parameter minimisation: a novelty detection method should aim to minimise the number of parameters that are user set.
- (d)
Principle of generalisation: the system should be able to generalise without confusing generalised information as novel [55].
- (e)
Principle of independence: the novelty detection method should be independent of the number of features and classes available, and it should show reasonable performance in the presence of imbalanced data sets, low numbers of samples, and noise.
- (f)
Principle of adaptability: a system that recognises novel samples during test should be able to use this information for retraining [47].
- (g)
Principle of computational complexity: a number of novelty detection applications are online, and therefore the computational complexity of a novelty detection mechanism should be as low as possible.
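Principle (b) above, for instance, can be met by fitting a simple min-max scaler on the training data only and applying it unchanged to test data. The sketch below uses hypothetical helper names of our own; test samples that fall outside the training range land outside [0, 1], which is itself a hint of novelty:

```python
def minmax_fit(train):
    """Learn per-feature minima and maxima from the training data only."""
    dims = range(len(train[0]))
    lo = [min(x[i] for x in train) for i in dims]
    hi = [max(x[i] for x in train) for i in dims]
    return lo, hi

def minmax_apply(x, lo, hi):
    """Scale a sample so that training data lies in [0, 1] per feature.
    A constant feature (hi == lo) is mapped to 0.0 to avoid division by zero."""
    return [(v - l) / (h - l) if h > l else 0.0
            for v, l, h in zip(x, lo, hi)]

# Toy usage: fit on training data, then scale a test sample.
lo, hi = minmax_fit([(0.0, 10.0), (2.0, 20.0), (4.0, 30.0)])
print(minmax_apply((2.0, 20.0), lo, hi))  # inside the training range
print(minmax_apply((8.0, 40.0), lo, hi))  # outside it: values exceed 1.0
```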
Statistical approaches
Statistical approaches mostly model data on the basis of its statistical properties and use this information to estimate whether a test sample comes from the same distribution or not. The techniques used vary in their complexity [40]. The simplest approach is to construct a density function for data of a known class and then, assuming the data are normally distributed, compute the probability that a test sample belongs to that class. The probability estimate can be
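As a minimal illustration of this density-based idea (a one-dimensional Gaussian with a log-density threshold; the function names and data are our own, not from the surveyed methods):

```python
import math

def fit_gaussian(train):
    """Maximum-likelihood mean and variance of the normal-class data."""
    n = len(train)
    mu = sum(train) / n
    var = sum((v - mu) ** 2 for v in train) / n
    return mu, var

def log_density(x, mu, var):
    """Log of the Gaussian probability density at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)

def is_novel(x, mu, var, log_threshold):
    """A sample is novel when its density under the normal model is too low."""
    return log_density(x, mu, var) < log_threshold

# Toy usage: normal data concentrated around 10.0.
mu, var = fit_gaussian([9.8, 10.1, 10.0, 9.9, 10.2])
print(is_novel(10.0, mu, var, log_threshold=-10.0))  # typical sample
print(is_novel(15.0, mu, var, log_threshold=-10.0))  # far in the tail
```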
Conclusion
In this paper we have presented a survey of novelty detection using statistical approaches. Most such research is driven by modelling data distributions and then estimating the probability that test data belong to those distributions. In such model-based approaches, one needs to specify, or make assumptions about, the nature of the training data. In addition, the amount and quality of training data become very important for robustly determining the parameters of the training data distribution.
References (64)
- et al.
Rejection based classifier for face detection
Pattern Recognition Lett.
(2002) - et al.
Multiclassification: reject criteria for the Bayesian combiner
Pattern Recognition
(1999) - et al.
Reject option with multiple thresholds
Pattern Recognition
(2000) - et al.
On-line control chart pattern detection and discrimination—a neural network approach
Artificial Intell. Eng.
(1999) - et al.
Two-phase clustering algorithm for outliers detection
Pattern Recognition Lett.
(2001) - et al.
EvIdent: a functional magnetic resonance image analysis system
Artif. Intell. Med.
(2001) - L.D. Baker, T. Hofmann, A.K. McCallum, Y. Yang, A hierarchical probabilistic model for novelty detection in text,...
- et al.
Outliers in Statistical Data
(1994) - J.C. Bezdek, R. Ehrlich, W. Full, FCM: the fuzzy c-means clustering algorithm, Computers and Geosciences, Vol. 10,...
- C. Bishop, Novelty detection and neural network validation, Proceedings of the IEE Conference on Vision and Image...
A linear programming approach to novelty detection
On optimum recognition error and reject tradeoff
IEEE Trans. Inform. Theory
A method for improving classification reliability of multilayer perceptrons
IEEE Trans. Neural Networks
Nearest neighbor pattern classification
IEEE Trans. Inform. Theory
Novelty-detection in time series data using ideas from immunology, Proceedings of the International Conference on Intelligent Systems
Pattern Classification
Limiting forms of the frequency distribution of the largest and smallest member of a sample
Proc. Camb. Philos. Soc.
The error-reject tradeoff
Open Systems Inform. Dynamics
The nearest neighbour classification with a reject option
IEEE Trans. Systems Sci. Cybernet.
Distance-based outliers: algorithms and applications
VLDB J.