
2013 | OriginalPaper | Chapter

Data Glitches: Monsters in Your Data

Author: Tamraparni Dasu

Published in: Handbook of Data Quality

Publisher: Springer Berlin Heidelberg


Abstract

Data types and data structures are becoming increasingly complex as they keep pace with evolving technologies and applications. The result is an increase in the number and complexity of data quality problems. Data glitches, a common name for data quality problems, can be simple and stand alone, or highly complex with spatial and temporal correlations. In this chapter, we provide an overview of a comprehensive and measurable data quality process. To begin, we define and classify complex glitch types, and describe detection and cleaning techniques. We present metrics for assessing data quality and for choosing cleaning strategies subject to a variety of considerations. The process culminates in a “clean” data set that is acceptable to the end user. We conclude with an overview of significant literature in this area, and a discussion of opportunities for practice, application, and further research.


Metadata
Title: Data Glitches: Monsters in Your Data
Author: Tamraparni Dasu
Copyright Year: 2013
Publisher: Springer Berlin Heidelberg
DOI: https://doi.org/10.1007/978-3-642-36257-6_8
