
2019 | Original Paper | Book Chapter

A DaQL to Monitor Data Quality in Machine Learning Applications

Written by: Lisa Ehrlinger, Verena Haunschmid, Davide Palazzini, Christian Lettner

Published in: Database and Expert Systems Applications

Publisher: Springer International Publishing


Abstract

Machine learning models can only be as good as the data used to train them. Despite this obvious correlation, there is little research on data quality measurement to ensure the reliability and trustworthiness of machine learning models. Especially in industrial settings, where sensors produce large amounts of highly volatile data, a one-time measurement of data quality is not sufficient, since errors in new data should be detected as early as possible. Thus, in this paper, we present DaQL (Data Quality Library), a generally applicable tool to continuously monitor the quality of data to increase the prediction accuracy of machine learning models. We demonstrate and evaluate DaQL within an industrial real-world machine learning application at Siemens.


Metadata
Title
A DaQL to Monitor Data Quality in Machine Learning Applications
Written by
Lisa Ehrlinger
Verena Haunschmid
Davide Palazzini
Christian Lettner
Copyright Year
2019
DOI
https://doi.org/10.1007/978-3-030-27615-7_17
