Published in: Empirical Software Engineering 1/2024

01-02-2024

Unreproducible builds: time to fix, causes, and correlation with external ecosystem factors

Authors: Rahul Bajaj, Eduardo Fernandes, Bram Adams, Ahmed E. Hassan

Abstract

Context

A build is reproducible if, given the same source code, build instructions, and build environment (i.e., installed build dependencies), compiling a software project repeatedly generates the same build artifacts. Reproducible builds are essential for identifying tampering attempts responsible for supply chain attacks, yet most research on reproducible builds treats build reproducibility as a project-specific issue. In contrast, modern software projects are part of a larger ecosystem and depend on dozens of other projects, which raises the question of to what extent the build reproducibility of a project is the responsibility of that project or something forced on it.
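As an illustration of the definition above, the following minimal sketch treats two builds as reproducible when their artifacts are bit-identical, which it checks by comparing SHA-256 digests. The function names `sha256_of` and `is_reproducible` are hypothetical, and real verification tooling may compare artifacts in more detail than this.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_reproducible(artifact_a: Path, artifact_b: Path) -> bool:
    """Assumption: 'same build artifacts' means bit-for-bit identical files."""
    return sha256_of(artifact_a) == sha256_of(artifact_b)
```

Under this check, a single embedded build timestamp is enough to make two otherwise identical artifacts count as different.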

Objective

This empirical study analyzes reproducible and unreproducible builds in Linux distributions to systematically investigate the process of making builds reproducible in open-source distributions. Our study targets builds performed on 11,528 Arch Linux packages and 597,066 Debian packages.

Method

We compute the likelihood of unreproducible packages becoming reproducible (and vice versa) and identify the root causes behind unreproducible builds. Finally, we compute the correlation between the reproducibility status of packages and three ecosystem factors (i.e., factors outside the control of a given package).
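The abstract does not name the estimator used for these likelihoods. As one plausible way to compute the probability that a package is still unreproducible after a given number of days, the sketch below implements a Kaplan-Meier estimator in plain Python. The choice of Kaplan-Meier, the function name, and the input layout are assumptions made for illustration only.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Return [(t, S(t))], where S(t) estimates the probability that a
    package is still unreproducible after t days.

    durations: days each package was tracked while unreproducible
    observed:  True if the package became reproducible at that time,
               False if tracking ended first (right-censored)
    """
    events = Counter(t for t, e in zip(durations, observed) if e)
    exits = Counter(durations)  # every package leaves the risk set at its duration
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        if events[t]:
            # Multiply by the share of at-risk packages that stay unreproducible.
            survival *= 1.0 - events[t] / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve
```

Censoring matters here because packages that are still unreproducible when observation ends would otherwise bias the time-to-fix estimate downward.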

Results

Arch Linux packages become reproducible a median of 30 days sooner than Debian packages, while Debian packages remain reproducible for a median of 68 days longer once fixed. We identified a taxonomy of 16 root causes of unreproducible builds and found that the build reproducibility status of a package differs statistically significantly across hardware architectures (strong effect size). The status also differs between versions of a package for different distributions and depends on the build reproducibility of a package’s build dependencies, albeit with weaker effect sizes.
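The abstract reports effect sizes without naming the statistic behind them. As one common option for two categorical variables, such as hardware architecture versus reproducibility status, the sketch below computes Cramér's V from a contingency table of counts. The choice of Cramér's V and the function name `cramers_v` are assumptions, not the method confirmed by the abstract.

```python
import math


def cramers_v(table):
    """Cramér's V for a contingency table given as a list of rows of counts.

    Assumes every row and column total is non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k))
```

A value near 0 indicates no association between the two variables, and a value near 1 indicates that one variable almost fully determines the other.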

Conclusions

The ecosystem a project belongs to plays an important role w.r.t. the project’s build reproducibility. Since these ecosystem factors are outside a developer’s control, future work on (fixing) unreproducible builds should consider these ecosystem influences.

Footnotes
4
A source package like glibc, when built, can produce multiple binary packages such as libc6 and libc6-dev.
 
Metadata
Title
Unreproducible builds: time to fix, causes, and correlation with external ecosystem factors
Authors
Rahul Bajaj
Eduardo Fernandes
Bram Adams
Ahmed E. Hassan
Publication date
01-02-2024
Publisher
Springer US
Published in
Empirical Software Engineering / Issue 1/2024
Print ISSN: 1382-3256
Electronic ISSN: 1573-7616
DOI
https://doi.org/10.1007/s10664-023-10399-4
