Published in: Empirical Software Engineering 4/2020

29-05-2020

Software provenance tracking at the scale of public source code

Authors: Guillaume Rousseau, Roberto Di Cosmo, Stefano Zacchiroli

Abstract

We study the possibilities for tracking the provenance of software source code artifacts within the largest publicly accessible corpus of source code, the Software Heritage archive, with over 4 billion unique source code files and 1 billion commits capturing their development histories across 50 million software projects. We perform a systematic and generic estimate of the replication factor across the different layers of this corpus, analysing how often the same artifacts (e.g., SLOC, files or commits) appear in different contexts (e.g., files, commits or source code repositories). We observe a combinatorial explosion in the number of identical source code files across different commits. To discuss the implications of these findings, we benchmark different data models for capturing software provenance information at this scale, and we identify a viable solution, based on the properties of isochrone subgraphs, that is deployable on commodity hardware, is incremental and appears to be maintainable for the foreseeable future. Using these properties, we quantify, at a scale never achieved previously, the growth rate of original, i.e. never-seen-before, source code files and commits, and find it to be exponential over a period of more than 40 years.
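As a rough illustration of the replication factor mentioned in the abstract, the sketch below counts, for each artifact, the number of distinct contexts in which it appears. It is a toy example under stated assumptions, not the authors' method: the function name `replication_factors`, the `(artifact, context)` pair representation and the identifiers such as `file:a` and `commit:1` are all hypothetical.

```python
from collections import defaultdict


def replication_factors(occurrences):
    """Return, for each artifact ID, the number of distinct contexts containing it.

    `occurrences` is an iterable of (artifact_id, context_id) pairs, for example
    (file, commit) or (commit, repository). This representation is an assumption
    made for illustration only.
    """
    contexts = defaultdict(set)
    for artifact_id, context_id in occurrences:
        # A set ignores repeated observations of the same (artifact, context) pair.
        contexts[artifact_id].add(context_id)
    return {artifact: len(ctxs) for artifact, ctxs in contexts.items()}


# Hypothetical toy data: one file reused across three commits, another seen once.
occurrences = [
    ("file:a", "commit:1"),
    ("file:a", "commit:2"),
    ("file:a", "commit:2"),  # duplicate observation, counted once
    ("file:b", "commit:2"),
    ("file:a", "commit:3"),
]

factors = replication_factors(occurrences)
print(factors)
```

At the scale described in the abstract, an in-memory dictionary of sets would not be practical; the sketch is only meant to make the definition concrete.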

Footnotes
1
For example, hundreds of thousands of projects migrated from GitHub to GitLab.com in the days following the acquisition of GitHub by Microsoft in Summer 2018, see https://about.gitlab.com/2018/06/03/movingtogitlab/.
4
Some studies have analyzed up to a few million projects, but this is still a tiny fraction of all public source code.
 
Metadata
Title
Software provenance tracking at the scale of public source code
Authors
Guillaume Rousseau
Roberto Di Cosmo
Stefano Zacchiroli
Publication date
29-05-2020
Publisher
Springer US
Published in
Empirical Software Engineering / Issue 4/2020
Print ISSN: 1382-3256
Electronic ISSN: 1573-7616
DOI
https://doi.org/10.1007/s10664-020-09828-5
