
2015 | OriginalPaper | Book chapter

Prior Data Quality Management in Data Mining Process

Authors: Mamadou S. Camara, Djasrabe Naguingar, Alassane Bah

Published in: New Trends in Networking, Computing, E-learning, Systems Sciences, and Engineering

Publisher: Springer International Publishing


Abstract

Data Mining (DM) projects are implemented by following the knowledge discovery process. The DM and Data Warehousing (DW) literature offers several techniques for detecting and handling data quality problems such as missing data, outliers, inconsistent data, and time-variant data. Most data quality tasks fall within the Data Understanding and Data Preparation phases of the DM process. The main limitation in applying these techniques is the complexity that results from not anticipating the detection and resolution of the problems. This work proposes a DM process model designed for the prior management of data quality. In this model, the DM process is defined in relation to the Software Engineering (SE) process, and the two processes are carried out in parallel. The main contribution of this DM process is the anticipation and automation of all activities necessary to remove data quality problems.
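The abstract names missing data, outliers, and inconsistent data as typical quality problems. As a minimal sketch, and not the authors' process model, the checks below show what detecting these problems up front (before modeling) might look like with the Python standard library only. The function names, the `orders`/`customers` sample, the use of Tukey's IQR fences for outliers, and a foreign-key lookup as the inconsistency test are all illustrative assumptions; the chapter's own techniques may differ, and time-variant data is not covered.

```python
# Illustrative sketch only; not the chapter's DM process model.
# All names and the sample data are hypothetical.
from statistics import quantiles


def missing_counts(rows, fields):
    """Count missing values (None or empty string) for each field."""
    return {f: sum(1 for r in rows if r.get(f) in (None, "")) for f in fields}


def iqr_outliers(values, k=1.5):
    """Return non-missing values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    present = [v for v in values if v is not None]
    q1, _, q3 = quantiles(present, n=4)  # quartile cut points; needs >= 2 values
    spread = k * (q3 - q1)
    return [v for v in present if v < q1 - spread or v > q3 + spread]


def dangling_references(rows, fk_field, valid_keys):
    """Return rows whose foreign key matches no key in the referenced table."""
    return [r for r in rows
            if r.get(fk_field) is not None and r[fk_field] not in valid_keys]


def profile(rows, numeric_field, fk_field, valid_keys):
    """Hypothetical quality report meant to run early, before data preparation."""
    return {
        "missing": missing_counts(rows, [numeric_field, fk_field]),
        "outliers": iqr_outliers([r.get(numeric_field) for r in rows]),
        "dangling": [r["id"] for r in dangling_references(rows, fk_field, valid_keys)],
    }


if __name__ == "__main__":
    customers = {1, 2, 3}  # keys of a hypothetical customer table
    orders = [
        {"id": 1, "customer_id": 1, "amount": 19.0},
        {"id": 2, "customer_id": 2, "amount": 20.0},
        {"id": 3, "customer_id": 3, "amount": 21.0},
        {"id": 4, "customer_id": 1, "amount": 22.0},
        {"id": 5, "customer_id": 9, "amount": 23.0},   # unknown customer
        {"id": 6, "customer_id": 2, "amount": 24.0},
        {"id": 7, "customer_id": 3, "amount": 25.0},
        {"id": 8, "customer_id": 1, "amount": 500.0},  # suspicious amount
        {"id": 9, "customer_id": 2, "amount": None},   # missing amount
    ]
    print(profile(orders, "amount", "customer_id", customers))
```

On this sample, the report flags one missing amount, the 500.0 amount as an outlier, and order 5 as referencing a customer that does not exist. The IQR rule is chosen here only because it needs no distributional assumptions; any of the outlier techniques surveyed in the literature could be substituted.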


Metadata
Copyright year: 2015
DOI: https://doi.org/10.1007/978-3-319-06764-3_37
