Published in: Advances in Data Analysis and Classification 3/2018

15.12.2017 | Regular Article

Outlier detection in interval data

Authors: A. Pedro Duarte Silva, Peter Filzmoser, Paula Brito

Abstract

A multivariate outlier detection method for interval data is proposed, based on a parametric model for the interval data. The trimmed maximum likelihood principle is adapted to estimate the model parameters robustly. A simulation study demonstrates the usefulness of the robust estimates for outlier detection, and new diagnostic plots provide deeper insight into the structure of real-world interval data.
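
The abstract does not spell out the model, so the following is only a minimal Python sketch under stated assumptions, not the authors' method or software. It assumes each interval [l, u] is represented by its midpoint and log-range (a parametrization used elsewhere in the interval-data literature), that these vectors follow a multivariate Normal with an unrestricted covariance matrix, and that a trimmed maximum likelihood fit over h observations is approximated with concentration steps (C-steps) in the style of FAST-MCD; for a Normal model, maximizing the trimmed likelihood amounts to minimizing the covariance determinant of the retained subset. All function names, the 25% trimming proportion, the median-based rescaling and the Wilson-Hilferty chi-square approximation are hypothetical illustrative choices.

```python
import numpy as np
from statistics import NormalDist


def interval_features(lower, upper):
    """Map intervals [l, u] (arrays of shape (n, p)) to [midpoints | log-ranges].

    Assumption: the log-range needs u > l, so degenerate intervals are rejected.
    """
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    if np.any(upper <= lower):
        raise ValueError("every interval needs upper > lower")
    return np.hstack([(lower + upper) / 2.0, np.log(upper - lower)])


def chi2_quantile(q, k):
    """Wilson-Hilferty approximation to the chi-square quantile (avoids SciPy)."""
    z = NormalDist().inv_cdf(q)
    a = 2.0 / (9.0 * k)
    return k * (1.0 - a + z * np.sqrt(a)) ** 3


def sq_mahalanobis(x, mean, cov):
    """Squared Mahalanobis distance of every row of x."""
    diff = x - mean
    return np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)


def trimmed_normal_fit(x, h, n_starts=20, max_steps=100, seed=0):
    """Hypothetical trimmed ML fit of a Normal model to h of the n rows.

    Each C-step refits mean/covariance (ML, divisor h) on the h rows with the
    smallest distances, which never decreases the trimmed likelihood. The start
    with the smallest log det(cov) (highest trimmed likelihood) is kept.
    """
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    best = (np.inf, None, None)
    for _ in range(n_starts):
        subset = np.sort(rng.choice(n, size=h, replace=False))
        for _ in range(max_steps):
            mean = x[subset].mean(axis=0)
            cov = np.cov(x[subset], rowvar=False, bias=True)
            new = np.sort(np.argsort(sq_mahalanobis(x, mean, cov))[:h])
            if np.array_equal(new, subset):
                break
            subset = new
        logdet = np.linalg.slogdet(cov)[1]
        if logdet < best[0]:
            best = (logdet, mean, cov)
    return best[1], best[2]


def detect_interval_outliers(lower, upper, alpha=0.025, trim=0.25, seed=0):
    """Flag intervals whose robust squared distance exceeds a chi-square cutoff."""
    x = interval_features(lower, upper)
    n, d = x.shape
    h = int(np.ceil((1.0 - trim) * n))
    mean, cov = trimmed_normal_fit(x, h, seed=seed)
    d2 = sq_mahalanobis(x, mean, cov)
    # Rescale so the median distance matches the chi-square median
    # (a simple consistency correction for the trimmed covariance).
    d2 = d2 / (np.median(d2) / chi2_quantile(0.5, d))
    return np.flatnonzero(d2 > chi2_quantile(1.0 - alpha, d)), d2


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # 100 regular intervals on 2 variables plus 5 intervals with shifted midpoints.
    mid = np.vstack([rng.normal(0.0, 1.0, size=(100, 2)), np.full((5, 2), 8.0)])
    half = np.vstack([np.exp(rng.normal(0.0, 0.3, size=(100, 2))) / 2.0,
                      np.full((5, 2), 0.5)])
    flagged, _ = detect_interval_outliers(mid - half, mid + half)
    print(flagged)
```

The cutoff is the (1 - alpha) quantile of a chi-square distribution with 2p degrees of freedom, the usual reference for squared Mahalanobis distances under a Normal model; the trimming proportion controls the trade-off between robustness and efficiency and would need tuning on real data.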

Metadata
Title
Outlier detection in interval data
Authors
A. Pedro Duarte Silva
Peter Filzmoser
Paula Brito
Publication date
15.12.2017
Publisher
Springer Berlin Heidelberg
Published in
Advances in Data Analysis and Classification / Issue 3/2018
Print ISSN: 1862-5347
Electronic ISSN: 1862-5355
DOI
https://doi.org/10.1007/s11634-017-0305-y
