2019 | OriginalPaper | Book chapter

Automated Schema Quality Measurement in Large-Scale Information Systems

Written by: Lisa Ehrlinger, Wolfram Wöß

Published in: Data Quality and Trust in Big Data

Publisher: Springer International Publishing

Abstract

Assessing the quality of information system schemas is crucial because an unoptimized or erroneous schema design strongly affects the quality of the stored data; for example, it may lead to inconsistencies and anomalies at the data level. Even if the initial schema had an ideal design, changes during its life cycle can degrade schema quality and have to be addressed. Big Data environments in particular pose two major challenges: large schemas, for which manual verification of schema and data quality is very arduous, and the integration of heterogeneous schemas from different data models, whose quality cannot be compared directly. We therefore present a domain-independent approach for automatically measuring the quality of large and heterogeneous (logical) schemas. In contrast to existing approaches, we provide a fully automatable workflow that also enables regular reassessment. Our implementation supports measuring the quality dimensions correctness, completeness, pertinence, minimality, readability, and normalization.

Metadata
Title
Automated Schema Quality Measurement in Large-Scale Information Systems
Written by
Lisa Ehrlinger
Wolfram Wöß
Copyright year
2019
DOI
https://doi.org/10.1007/978-3-030-19143-6_2