
2019 | OriginalPaper | Book chapter

Big Data Quality: A Data Quality Profiling Model

Authors: Ikbal Taleb, Mohamed Adel Serhani, Rachida Dssouli

Published in: Services – SERVICES 2019

Publisher: Springer International Publishing

Abstract

Big Data is becoming a standard data model and is gaining wide adoption in the digital universe. Estimating the quality of Big Data is recognized as essential for data management and data governance. To ensure fast and efficient data quality assessment, expressed through quality dimensions, we need to extend the data profiling model to also incorporate quality profiling. Quality profiling encompasses value-added quality processes that go beyond the data and its corresponding metadata. In this paper, we propose a Data Quality Profiling Model for Big Data (BDQPM) that comprises several modules, including sampling, profiling, exploratory quality profiling, a quality profile repository (QPREPO), and the data quality profile (DQP). The QPREPO plays an important role in managing quality-related elements such as data quality dimensions and their metrics, predefined quality action scenarios, pre-processing activities (PPA) and their related functions (PPAF), and the data quality profile. Our exploratory quality profiling method discovers a set of PPAF from systematically predefined quality action scenarios in order to capture the quality trends of any data set and show the cause and effect of each process on the data. This quality overview serves as a preliminary quality profile of the data. We conducted a series of experiments to test different features of the BDQPM, including sampling and profiling, quality evaluation, and exploratory quality profiling for Big Data quality enhancement. The results show that quality profiling tracks quality at an early stage of the Big Data life cycle, and that the exploratory quality profiling methodology yields insights for quality improvement and enforcement.
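
The abstract describes a flow that samples a data set, profiles it, scores it on quality dimensions, and then explores how pre-processing activity functions (PPAF) change those scores. The Python sketch below illustrates that flow on a toy in-memory data set. It is not the BDQPM implementation: the abstract does not specify the sampling strategy, metrics, or PPAF, so the simple random sampling, the two dimensions (`completeness`, `uniqueness`), the three PPAF (`drop_incomplete`, `impute_mean`, `deduplicate`), the selection rule, and every function name here are hypothetical choices made for illustration.

```python
import random
from statistics import mean
from typing import Callable, Dict, List, Optional

# Hypothetical record model: all records share the same attributes; None marks a missing value.
Record = Dict[str, Optional[float]]
Dimension = Callable[[List[Record]], float]
PPAF = Callable[[List[Record]], List[Record]]


def sample(records: List[Record], fraction: float, seed: int = 42) -> List[Record]:
    """Sampling step (assumed): simple random sample without replacement."""
    k = max(1, int(len(records) * fraction))
    return random.Random(seed).sample(records, k)


def profile(records: List[Record]) -> Dict[str, Dict[str, int]]:
    """Profiling step: per-attribute count, missing count and distinct non-missing values."""
    if not records:
        return {}
    result = {}
    for col in sorted(records[0]):
        present = [r[col] for r in records if r[col] is not None]
        result[col] = {"count": len(records), "missing": len(records) - len(present),
                       "distinct": len(set(present))}
    return result


def completeness(records: List[Record]) -> float:
    """Completeness dimension (assumed metric): share of non-missing cells."""
    cells = [v for r in records for v in r.values()]
    return sum(v is not None for v in cells) / len(cells) if cells else 0.0


def uniqueness(records: List[Record]) -> float:
    """Uniqueness dimension (assumed metric): share of distinct records."""
    return len({tuple(sorted(r.items())) for r in records}) / len(records) if records else 0.0


def drop_incomplete(records: List[Record]) -> List[Record]:
    """Hypothetical PPAF: drop every record that has a missing value."""
    return [r for r in records if all(v is not None for v in r.values())]


def impute_mean(records: List[Record]) -> List[Record]:
    """Hypothetical PPAF: replace missing values with the attribute mean."""
    means: Dict[str, Optional[float]] = {}
    for col in (records[0] if records else {}):
        present = [r[col] for r in records if r[col] is not None]
        means[col] = mean(present) if present else None
    return [{c: (means[c] if v is None else v) for c, v in r.items()} for r in records]


def deduplicate(records: List[Record]) -> List[Record]:
    """Hypothetical PPAF: remove exact duplicates, keeping the first occurrence."""
    seen, kept = set(), []
    for r in records:
        key = tuple(sorted(r.items()))  # hashable identity of the record
        if key not in seen:
            seen.add(key)
            kept.append(r)
    return kept


def exploratory_quality_profile(records: List[Record], ppafs: Dict[str, PPAF],
                                dimensions: Dict[str, Dimension]) -> dict:
    """Apply each PPAF to the data and record its effect (delta) on every dimension."""
    baseline = {d: fn(records) for d, fn in dimensions.items()}
    effects = {}
    for name, ppaf in ppafs.items():
        out = ppaf(records)
        effects[name] = {d: round(fn(out) - baseline[d], 4) for d, fn in dimensions.items()}
    # Hypothetical selection rule: keep PPAFs that improve some dimension and degrade none.
    recommended = [n for n, e in effects.items()
                   if any(v > 0 for v in e.values()) and all(v >= 0 for v in e.values())]
    return {"profile": profile(records), "baseline": baseline,
            "effects": effects, "recommended": recommended}


if __name__ == "__main__":
    data = [
        {"age": 30.0, "income": 50.0},
        {"age": None, "income": 40.0},
        {"age": 30.0, "income": 50.0},  # exact duplicate of the first record
        {"age": 45.0, "income": None},
    ]
    dqp = exploratory_quality_profile(
        sample(data, fraction=1.0),  # fraction=1.0 keeps every record in this tiny demo
        ppafs={"drop_incomplete": drop_incomplete, "impute_mean": impute_mean,
               "deduplicate": deduplicate},
        dimensions={"completeness": completeness, "uniqueness": uniqueness},
    )
    print(dqp["baseline"])     # {'completeness': 0.75, 'uniqueness': 0.75}
    print(dqp["recommended"])  # ['impute_mean']
```

On this toy data, mean imputation raises completeness without reducing uniqueness, whereas dropping incomplete records and deduplicating each trade one dimension against the other. A per-PPAF cause-and-effect overview of this kind roughly corresponds to what the abstract calls a preliminary quality profile; a real Big Data deployment would run the same logic on a distributed sample rather than a Python list.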

Metadata
Copyright year
2019
DOI
https://doi.org/10.1007/978-3-030-23381-5_5
