2015 | OriginalPaper | Book chapter

Capturing Interactive Data Transformation Operations Using Provenance Workflows

Authors: Tope Omitola, André Freitas, Edward Curry, Séan O’Riain, Nicholas Gibbins, Nigel Shadbolt

Published in: The Semantic Web: ESWC 2012 Satellite Events

Publisher: Springer Berlin Heidelberg

Abstract

The ready availability of data is creating more opportunities to re-use it in new applications and analyses. However, most of these data are not necessarily in the format users want; they are usually heterogeneous and highly dynamic, so re-purposing them requires data transformation. Interactive data transformation (IDT) tools are becoming widely available and lower the barriers to such transformation efforts. This paper describes a principled way to capture the data lineage of IDT processes. We provide a formal model of IDT, its mapping to a provenance representation, and its implementation and validation on Google Refine. Making the sequence of transformation operations available allows data quality to be assessed and ensures portability between IDT and other data transformation platforms. The proposed model showed a high level of coverage against a set of requirements used to evaluate systems that provide provenance management solutions.
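The abstract describes mapping each interactive transformation step to a provenance representation. As an illustration only, the following Python sketch models an IDT session as an ordered list of operations over a table and emits W3C PROV-style statements (`prov:used`, `prov:wasGeneratedBy`, `prov:wasDerivedFrom`) for every step. The names `Operation`, `ProvenanceLog`, `run_workflow`, the `dataset/vN` and `activity/N-name` identifiers, and the `ex:parameters` predicate are hypothetical; this is not the authors' formal model, their provenance vocabulary, or Google Refine's own operation-history format.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

# A table is a list of rows; each row maps column names to cell values.
Row = dict[str, Any]
Table = list[Row]


@dataclass
class Operation:
    """One hypothetical IDT step, e.g. 'trim whitespace in column city'."""
    name: str
    params: dict[str, Any]
    fn: Callable[[Table, dict[str, Any]], Table]


@dataclass
class ProvenanceLog:
    """Collects (subject, predicate, object) statements using PROV-style terms."""
    statements: list[tuple[str, str, str]] = field(default_factory=list)

    def record(self, activity: str, used: str, generated: str, params: dict[str, Any]) -> None:
        self.statements.append((activity, "prov:used", used))
        self.statements.append((generated, "prov:wasGeneratedBy", activity))
        self.statements.append((generated, "prov:wasDerivedFrom", used))
        # Hypothetical predicate: keeps the parameters needed to re-apply the step.
        self.statements.append((activity, "ex:parameters", repr(params)))


def run_workflow(table: Table, ops: list[Operation], log: ProvenanceLog) -> Table:
    """Apply ops in order; each step yields a new table version plus its provenance."""
    current_id = "dataset/v0"
    for i, op in enumerate(ops, start=1):
        table = op.fn(table, op.params)
        activity_id = f"activity/{i}-{op.name}"
        new_id = f"dataset/v{i}"
        log.record(activity_id, current_id, new_id, op.params)
        current_id = new_id
    return table


def trim_column(table: Table, params: dict[str, Any]) -> Table:
    col = params["column"]
    return [{**row, col: str(row[col]).strip()} for row in table]


def upper_column(table: Table, params: dict[str, Any]) -> Table:
    col = params["column"]
    return [{**row, col: str(row[col]).upper()} for row in table]


# Usage: two interactive steps applied to a one-row table.
ops = [
    Operation("trim", {"column": "city"}, trim_column),
    Operation("upper", {"column": "city"}, upper_column),
]
log = ProvenanceLog()
result = run_workflow([{"city": "  london "}], ops, log)
for statement in log.statements:
    print(statement)
```

In this sketch each table version is an entity and each operation is an activity, so the chain of `prov:wasDerivedFrom` statements traces the final table back to the input. Together with the operation name, the recorded parameters are what another tool would need to re-apply the same sequence, assuming it implements equivalent operations.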

Metadata
Copyright year
2015
DOI
https://doi.org/10.1007/978-3-662-46641-4_3
