2020 | Original Paper | Book Chapter

Data Quality for Medical Data Lakelands

Written by: Johann Eder, Vladimir A. Shekhovtsov

Published in: Future Data and Security Engineering

Publisher: Springer International Publishing

Abstract

Medical research requires biological material and data. Medical studies based on data of unknown or questionable quality are useless or even dangerous, as evidenced by recent examples of withdrawn studies. Medical data sets consist of highly sensitive personal data, which must be protected carefully and is available for research only after approval by ethics committees. These data sets, therefore, cannot be stored in central data warehouses or even in a common data lake but remain in a multitude of data lakes, which we call Data Lakelands. The collections of samples and their annotations in the European federation of biobanks (BBMRI-ERIC) are an example of such Medical Data Lakelands. We discuss the quality dimensions of data sets for medical research and the requirements for providers of data sets in terms of both the quality of meta-data and the meta-data of data quality documentation, with the aim of helping researchers identify suitable data sets for medical studies effectively and efficiently.

Metadata
Title
Data Quality for Medical Data Lakelands
Written by
Johann Eder
Vladimir A. Shekhovtsov
Copyright year
2020
DOI
https://doi.org/10.1007/978-3-030-63924-2_2
