
Identification of local multivariate outliers

  • Regular Article
Statistical Papers

Abstract

The Mahalanobis distance between pairs of multivariate observations is used as a measure of similarity between the observations. Its theoretical distribution is derived, and the result is used to judge the degree of isolation of an observation. For spatially dependent data where spatial coordinates are available, several exploratory tools are introduced for studying the degree of isolation of an observation from a fraction of its neighbors, and thus for identifying local multivariate outliers.
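
As a rough illustration of the quantities named in the abstract, the following Python sketch computes pairwise Mahalanobis distances and a simple per-observation isolation score relative to spatial neighbors. It is not taken from the article: the function names `pairwise_mahalanobis` and `local_isolation`, the parameters `k` and `fraction`, the default use of the classical sample covariance, and the choice of the k nearest sites as the neighborhood are all assumptions made for this example. The article's actual tools, and any robust covariance estimator it may rely on, are not reproduced here.

```python
import numpy as np


def pairwise_mahalanobis(X, cov=None):
    """Return the n x n matrix of Mahalanobis distances between rows of X.

    Assumption: if no covariance is supplied, the classical sample
    covariance is used. A robust estimate could be passed via `cov`.
    """
    X = np.asarray(X, dtype=float)
    if cov is None:
        cov = np.cov(X, rowvar=False)
    cov_inv = np.linalg.inv(np.atleast_2d(cov))
    diff = X[:, None, :] - X[None, :, :]  # shape (n, n, p)
    # Quadratic form (x_i - x_j)^T C^{-1} (x_i - x_j) for every pair (i, j).
    d2 = np.einsum("ijk,kl,ijl->ij", diff, cov_inv, diff)
    return np.sqrt(np.maximum(d2, 0.0))


def local_isolation(X, coords, k=10, fraction=0.5, cov=None):
    """Hypothetical isolation score for each observation.

    For observation i, take its k nearest spatial neighbors (Euclidean
    distance on `coords`), sort the Mahalanobis distances from i to those
    neighbors, and return the distance needed to reach the closest
    `fraction` of them. Large values suggest that i differs from most of
    its neighborhood.
    """
    D = pairwise_mahalanobis(X, cov)
    coords = np.asarray(coords, dtype=float)
    S = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    np.fill_diagonal(S, np.inf)  # an observation is not its own neighbor
    neighbours = np.argsort(S, axis=1)[:, :k]
    m = max(1, int(np.ceil(fraction * k)))
    sorted_d = np.sort(np.take_along_axis(D, neighbours, axis=1), axis=1)
    return sorted_d[:, m - 1]
```

Calling `local_isolation(X, coords, k=10, fraction=0.5)` on an (n, p) data matrix `X` and an (n, 2) coordinate matrix `coords` returns one score per observation. Turning such scores into outlier decisions would require cut-off values, which the article derives from the theoretical distribution of the distances and which this sketch does not attempt.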

Acknowledgments

The ideas are not limited to data with spatial coordinates; they could also be extended to time series data, or data with spatial and temporal dependence. These will be tasks for future research. This work was supported by the French Agence Nationale de la Recherche through the ModULand project (ANR-11-BSH1-005).

Author information

Corresponding author

Correspondence to Peter Filzmoser.

About this article

Cite this article

Filzmoser, P., Ruiz-Gazen, A. & Thomas-Agnan, C. Identification of local multivariate outliers. Stat Papers 55, 29–47 (2014). https://doi.org/10.1007/s00362-013-0524-z

  • DOI: https://doi.org/10.1007/s00362-013-0524-z
