Abstract
The Mahalanobis distance between pairs of multivariate observations is used as a measure of similarity between the observations. The theoretical distribution of this pairwise distance is derived, and the result is used to judge the degree of isolation of an observation. For spatially dependent data where spatial coordinates are available, several exploratory tools are introduced for studying the degree of isolation of an observation from a fraction of its neighbors, and thus for identifying local multivariate outliers.
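As a rough illustration of the idea summarized above, the following Python sketch computes pairwise Mahalanobis distances and then, for each observation, the distance to a fraction of its nearest spatial neighbors. It is not the authors' implementation: the function names, the parameters `k` (number of spatial neighbors) and `beta` (fraction of neighbors), the use of Euclidean distance between coordinates, and the use of the plain sample covariance are all assumptions made for this example. A robust covariance estimate could be substituted for the sample covariance.

```python
import numpy as np


def pairwise_mahalanobis(X, cov):
    """Return the n x n matrix of Mahalanobis distances between rows of X."""
    cov_inv = np.linalg.inv(cov)
    diff = X[:, None, :] - X[None, :, :]                    # shape (n, n, p)
    sq = np.einsum("ijk,kl,ijl->ij", diff, cov_inv, diff)   # quadratic forms
    return np.sqrt(np.maximum(sq, 0.0))                     # clip round-off below zero


def isolation_from_neighbors(X, coords, k=10, beta=0.1):
    """Hypothetical helper: for each observation, look at its k nearest
    spatial neighbors and return the Mahalanobis distance to the
    ceil(beta * k)-th most similar one among them."""
    cov = np.cov(X, rowvar=False)            # assumption: classical estimate
    md = pairwise_mahalanobis(X, cov)

    # Euclidean distances between spatial coordinates (assumption).
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)              # an observation is not its own neighbor
    nn = np.argsort(d, axis=1)[:, :k]        # indices of the k nearest neighbors

    m = max(1, int(np.ceil(beta * k)))
    md_nn = np.sort(np.take_along_axis(md, nn, axis=1), axis=1)
    return md_nn[:, m - 1]                   # large value: isolated from its neighbors


# Simulated data, for illustration only.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
coords_demo = rng.uniform(size=(50, 2))
scores_demo = isolation_from_neighbors(X_demo, coords_demo, k=10, beta=0.2)
```

Observations with a large score differ from even the most similar members of their spatial neighborhood, which is the sense of "local outlier" sketched here. How large is large depends on the reference distribution; for independent Gaussian observations with known covariance, the squared pairwise distance is distributed as twice a chi-square variable with p degrees of freedom, which is a standard result and may differ in detail from the derivation in the paper.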
Acknowledgments
The ideas are not limited to data with spatial coordinates; they could also be extended to time series data, or data with spatial and temporal dependence. These will be tasks for future research. This work was supported by the French Agence Nationale de la Recherche through the ModULand project (ANR-11-BSH1-005).
Cite this article
Filzmoser, P., Ruiz-Gazen, A. & Thomas-Agnan, C. Identification of local multivariate outliers. Stat Papers 55, 29–47 (2014). https://doi.org/10.1007/s00362-013-0524-z