2015 | Original Paper | Book Chapter

Looking Inside the Black-Box: Capturing Data Provenance Using Dynamic Instrumentation

Authors: Manolis Stamatogiannakis, Paul Groth, Herbert Bos

Published in: Provenance and Annotation of Data and Processes

Publisher: Springer International Publishing

Abstract

Knowing the provenance of a data item helps in ascertaining its trustworthiness. Various approaches have been proposed to track or infer data provenance. However, these approaches either treat an executing program as a black-box, limiting the fidelity of the captured provenance, or require developers to modify the program to make it provenance-aware. In this paper, we introduce DataTracker, a new approach to capturing data provenance based on taint tracking, a technique widely used in the security and reverse engineering fields. Our system is able to identify data provenance relations through dynamic instrumentation of unmodified binaries, without requiring access to, or knowledge of, their source code. Hence, we can track provenance for a variety of well-known applications. Because DataTracker looks inside the executing program, it captures high-fidelity and accurate data provenance.
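
The idea of deriving provenance from taint tracking can be illustrated with a toy model. The sketch below is not DataTracker's implementation, which instruments unmodified native binaries; it is a minimal Python illustration under stated assumptions. The names `Tainted`, `read_file`, and `write_file` are hypothetical, as are the file names and byte values. Each byte read from an input carries a tag naming its source, operations merge the tags of their operands, and a write reports which input bytes each output byte derives from.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Tainted:
    """A byte value plus the set of (file, offset) tags it derives from."""
    value: int
    tags: frozenset = field(default_factory=frozenset)

    def __add__(self, other: "Tainted") -> "Tainted":
        # Propagation rule: a result derives from all sources of its operands.
        return Tainted((self.value + other.value) % 256, self.tags | other.tags)


def read_file(name: str, data: bytes) -> list:
    """Taint source: tag each byte read with its file name and offset."""
    return [Tainted(b, frozenset({(name, i)})) for i, b in enumerate(data)]


def write_file(name: str, values: list) -> dict:
    """Taint sink: map each output offset to the input bytes it derives from."""
    return {(name, i): set(v.tags) for i, v in enumerate(values)}


# Hypothetical inputs: two small "files" given as in-memory byte strings.
a = read_file("a.txt", b"\x01\x02")
b = read_file("b.txt", b"\x10\x20")

# Output byte 0 mixes both inputs; output byte 1 is a plain copy from a.txt.
out = [a[0] + b[1], a[1]]
provenance = write_file("out.bin", out)

for target, sources in sorted(provenance.items()):
    print(target, "derived from", sorted(sources))
```

In this model the provenance relations fall out of the tags at the moment of writing, with no change to the logic that computes `out`. That mirrors the abstract's point that looking inside the executing program gives finer-grained relations than treating it as a black box, which could only report that `out.bin` depends on both input files as a whole.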

Footnotes
1
The source code of DataTracker is available at: http://github.com/m000/dtracker.
 
4
For simplicity, we prefer the term “file” over the more accurate “file-like resource”.
 
6
A common placeholder text which has been used by typesetters since the 1500s.
 
Metadata
Title
Looking Inside the Black-Box: Capturing Data Provenance Using Dynamic Instrumentation
Authors
Manolis Stamatogiannakis
Paul Groth
Herbert Bos
Copyright Year
2015
DOI
https://doi.org/10.1007/978-3-319-16462-5_12