
2025 | Book

Managing Software Supply Chains

Theory and Practice

Written by: Ying Wang, Shing-Chi Cheung, Hai Yu, Zhiliang Zhu

Publisher: Springer Nature Singapore


About this book

Open-source software supply chains exert considerable influence on the software industry and attract substantial interest from businesses, researchers, and policymakers. Using third-party libraries to build software applications is common practice, aimed at reducing costs and improving software quality. However, heavy reliance on external libraries often leads to "dependency hell", characterized by problems such as incompatibilities, conflicting versions, bloated dependencies, and the inclusion of vulnerable library versions. Despite extensive research on software dependency management and the evolution of software supply chains, questions remain about how dependency challenges differ across programming language ecosystems and how the dependency phenomenon can best be addressed from an ecosystem-wide perspective. This book aims to offer (1) a comprehensive literature review of software supply chains, (2) discussions of modeling software supply chains and analyzing their evolutionary behavior, (3) ecosystem-level strategies for diagnosing various dependency problems and automating their resolution through cost-benefit analysis, and (4) a toolkit and dataset to support future research and help practitioners tackle dependency management challenges. The methods outlined in this book have been presented at top-tier conferences and in journals, and some of the techniques have been officially integrated into products of Microsoft Corporation and Huawei Technologies Co Ltd.
This book is intended to give readers a solid understanding of software supply chain fundamentals and practical guidance on applying theory and techniques in real-world industrial settings. It is aimed primarily at software developers and students with an academic background who are interested in dependency management for third-party libraries, quality assurance for software supply chains, and the evolution of open-source software ecosystems. It will also interest practitioners, including software engineers, quality assurance professionals, and software managers, as well as general readers. All will benefit from our systematic studies of the dependency hell phenomenon in different programming language communities and the valuable artifacts that accompany them.

Table of Contents

Frontmatter

An Overview of Software Supply Chain Research

Frontmatter
Chapter 1. Introducing Software Supply Chain
Abstract
Researching the open-source software supply chain matters because it enables a deep understanding and effective management of dependency relationships, evolution trends, security, and sustainability in software development. This ultimately enhances the success rate, quality, and security of software projects. This introductory chapter first introduces four fundamental concepts related to the management and maintenance of the open-source software supply chain. It then surveys authoritative journal and conference papers published in the field over the past two decades (2001–2024), covering empirical studies, case studies, metric methods and analysis frameworks, techniques, tools, and other aspects. We provide an overview of the research status and trends in the open-source software supply chain by analyzing fundamental information such as publication dates, publishing journals, and the types of major contributions found in each paper.
Ying Wang, Shing-Chi Cheung, Hai Yu, Zhiliang Zhu

Modeling and Analyzing Software Supply Chain

Frontmatter
Chapter 2. Modeling Software Supply Chains and Analyzing Their Evolutionary Behaviors
Abstract
The supply chain formed by tens of thousands of interdependent open-source software libraries has brought convenience to software development while also posing unprecedented challenges in dependency management. The exponential growth in the scale of software libraries and their intricate dependency relationships has led to a significant increase in the complexity of the supply chain. On the one hand, defects in any software library in the supply chain can potentially impact other libraries in the chain, amplifying the magnitude of losses. On the other hand, any software library can be affected by problems originating from other libraries in the supply chain, making issue identification and resolution difficult. Against this backdrop, many researchers have focused on characterizing, abstracting, and analyzing the dependency relationships among software libraries and on exploring the evolutionary behaviors of open-source software ecosystems based on library dependency models.
Ying Wang, Shing-Chi Cheung, Hai Yu, Zhiliang Zhu

Dependency Issues in Software Supply Chain

Frontmatter
Chapter 3. Common Types of Dependency Issues
Abstract
The interplay and constraints of asynchronous evolution in open-source software libraries, along with build environment restrictions and complex dependency configurations, result in five common dependency issues. These include compatibility problems arising from API breaking changes, violations of open-source license agreements, conflicts due to multiple library versions, bloated dependencies with excessive or redundant components, and the propagation of vulnerable dependencies through outdated dependencies. Figure 3.1 depicts the causal relationship between evolutionary behaviors and these dependency issues. Extensive research has been conducted to understand and address these issues, emphasizing the importance of careful dependency management, resolving conflicts, and actively updating dependencies to ensure software functionality, compliance, performance, and security.
Ying Wang, Shing-Chi Cheung, Hai Yu, Zhiliang Zhu

Ecosystem-Level Techniques Combating Dependency Issues

Frontmatter
Chapter 4. Detecting Compatibility Issues for Machine Learning Libraries
Abstract
A Machine Learning (ML) pipeline configures the workflow of a learning task using the APIs provided by ML libraries. However, a pipeline’s performance can vary significantly across different configurations of ML library versions. Misconfigured pipelines can result in inferior performance, such as inefficient executions, numeric errors, and even crashes. A pipeline is subject to misconfiguration if it exhibits significantly inconsistent performance upon changes in the versions of its configured libraries or the combination of these libraries. We refer to such performance inconsistency as an ML library compatibility (MLC) issue. A systematic understanding of MLC issues helps configure effective ML pipelines and identify misconfigured ones. To this end, we conduct the first empirical study of MLC issues’ pervasiveness, impact, and root causes. To facilitate scalable in-depth analysis, we develop Piecer, an infrastructure that automatically generates a set of pipeline variants by varying different version combinations of ML libraries and detects their performance inconsistencies. We apply Piecer to the 3,380 pipelines that can be deployed out of the 11,363 ML pipelines collected from multiple ML competitions on the Kaggle platform. The empirical study results show that 1,092 (32.3%) of the 3,380 pipelines manifest significant performance inconsistencies on at least one variant. We find that 399, 243, and 440 pipelines can achieve better competition scores, execution time, and memory usage, respectively, by adopting a different configuration. Based on our findings, we construct a repository containing 164 defective APIs and 106 API combinations from 418 library versions. The defective API repository facilitates future studies of automated detection techniques for MLC issues. Leveraging the repository, we captured MLC issues in 309 real-world ML pipelines.
Ying Wang, Shing-Chi Cheung, Hai Yu, Zhiliang Zhu
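The variant generation described in the Chapter 4 abstract can be illustrated with a minimal sketch. It is not Piecer itself: the library names, version strings, scores, and the 5% relative tolerance are all hypothetical, and measuring a real pipeline's score is left out. The sketch only shows how one configuration per version combination might be enumerated and how variants that deviate from a baseline could be flagged.

```python
from itertools import product


def version_combinations(candidates):
    """Yield one {library: version} dict per combination of candidate versions."""
    names = sorted(candidates)
    for versions in product(*(candidates[name] for name in names)):
        yield dict(zip(names, versions))


def inconsistent_variants(baseline_score, variant_scores, tolerance=0.05):
    """Return the configurations whose score deviates from the baseline by
    more than `tolerance` (relative). The threshold is an illustrative
    assumption, not a value taken from the book."""
    flagged = []
    for config, score in variant_scores:
        if abs(score - baseline_score) > tolerance * abs(baseline_score):
            flagged.append(config)
    return flagged


# Hypothetical libraries and versions, for illustration only.
candidates = {"lib_a": ["1.0", "1.1"], "lib_b": ["2.0", "2.1"]}
variants = list(version_combinations(candidates))

# Hypothetical scores: the second variant drops well below the baseline.
scores = [(variants[0], 0.90), (variants[1], 0.70)]
flagged = inconsistent_variants(0.90, scores)
```

The number of variants grows as the product of the candidate version counts, which is why an automated infrastructure is needed once more than a few libraries are involved.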
Chapter 5. Diagnosing NuGet Dependency Conflicts
Abstract
Developers often suffer from dependency conflict (DC) issues, i.e., package dependency constraints are violated when a project’s platform or dependencies are changed. This problem is especially serious in the .NET ecosystem due to its fragmented platforms (e.g., .NET Framework, .NET Core, and .NET Standard). Fixing DC issues is challenging due to the complexity of dependency constraints: multiple DC issues often occur in one project, solving one DC issue often causes another to crop up, and the exponential search space of possible dependency combinations is also a barrier.
In this chapter, we aim to help .NET developers tackle DC issues. First, we empirically studied a set of real DC issues, learning their common fixing strategies and developers’ preferences in adopting these strategies. Based on these findings, we propose NuFix, an automated technique to repair DC issues. NuFix formulates the repair task as a binary integer linear optimization problem to effectively derive an optimal fix in line with the learnt developers’ preferences. The experiment results and expert validation show that NuFix can generate high-quality fixes for all the DC issues in 262 popular .NET projects. Encouragingly, 20 projects (including affected projects such as Dropbox) have approved and merged our generated fixes and shown great interest in our technique.
Ying Wang, Shing-Chi Cheung, Hai Yu, Zhiliang Zhu
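To make the "one binary choice per package version" idea from the Chapter 5 abstract concrete, here is a minimal sketch under stated assumptions. It is not NuFix: instead of a binary integer linear programming solver it uses exhaustive enumeration, the cost function (number of packages whose version changes) is a stand-in for the learnt developer preferences, and the package names, versions, and `compatible` predicate are hypothetical.

```python
from itertools import product


def best_fix(current, candidates, compatible):
    """Pick exactly one version per package so that every pair of chosen
    versions is compatible, minimising the number of version changes.

    Exhaustive search over all assignments; this stands in for an ILP
    solver and only scales to tiny inputs. Returns None when no
    conflict-free assignment exists.
    """
    names = sorted(candidates)
    best, best_cost = None, None
    for versions in product(*(candidates[name] for name in names)):
        choice = dict(zip(names, versions))
        pairs_ok = all(
            compatible(a, choice[a], b, choice[b])
            for i, a in enumerate(names)
            for b in names[i + 1:]
        )
        if not pairs_ok:
            continue
        cost = sum(1 for name in names if choice[name] != current[name])
        if best_cost is None or cost < best_cost:
            best, best_cost = choice, cost
    return best


# Hypothetical project: package A at 1.0 conflicts with every version of B.
current = {"A": "1.0", "B": "2.0"}
candidates = {"A": ["1.0", "1.1"], "B": ["1.0", "2.0"]}
forbidden = {("A", "1.0", "B", "1.0"), ("A", "1.0", "B", "2.0")}


def compatible(a, va, b, vb):
    return (a, va, b, vb) not in forbidden


fix = best_fix(current, candidates, compatible)
```

In this toy instance the cheapest conflict-free assignment upgrades only A to 1.1. The enumeration visits every combination, which illustrates the exponential search space that motivates an optimization formulation.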
Chapter 6. Streamlining Software Bloated Dependencies
Abstract
Numerous third-party libraries introduced into client projects are not actually required, resulting in modern software being gradually bloated. Software developers may spend much unnecessary effort to manage the bloated dependencies: keeping the library versions up-to-date, making sure that heterogeneous licenses are compatible, and resolving dependency conflict or vulnerability issues.
However, prior debloating techniques can easily produce false alarms of bloated dependencies since they are less effective in analyzing Java reflections. Besides, the solutions given by the existing approaches for removing bloated dependencies may induce new issues that are not conducive to dependency management. To address the above limitations, in this chapter, we developed a technique, Slimming, to remove bloated dependencies from software projects reliably. Slimming statically analyzes the Java reflections that are commonly leveraged by popular frameworks (e.g., Spring Boot) and resolves the reflective targets via parsing configuration files (*.xml, *.yml, and *.properties). By modeling string manipulations, Slimming fully resolves the string arguments of the reflection APIs of interest to identify all the required dependencies. More importantly, it helps developers analyze the debloating solutions by weighing the benefits against the costs of dependency management. Our evaluation results show that the static reflection analysis capability of Slimming outperforms all the other existing techniques with 97.0% precision and 98.8% recall. Compared with the prior debloating techniques, Slimming can reliably remove the bloated dependencies with a 100% test passing ratio and improve the rationality of debloating solutions. In our large-scale study in the Maven ecosystem, Slimming reported 484 bloated dependencies to 66 open-source projects. Thirty-eight reports (57.6%) have been confirmed by developers.
Ying Wang, Shing-Chi Cheung, Hai Yu, Zhiliang Zhu
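The Chapter 6 abstract explains that reflective targets named in configuration files must count as used before a dependency is declared bloated. The following sketch illustrates that idea only; it is not Slimming. It handles a simplified `key=value` subset of the `.properties` format, and the class-name pattern, the `class_owner` mapping, and all artifact names are hypothetical.

```python
import re

# Rough pattern for a fully qualified Java class name such as org.example.Foo.
FQCN = re.compile(r"^(?:[a-z_][a-z0-9_]*\.)+[A-Z][A-Za-z0-9_$]*$")


def reflective_targets(properties_text):
    """Collect property values that look like fully qualified class names.

    Only simple key=value lines are handled; comments and blank lines are
    skipped. This is a simplification of the real .properties syntax.
    """
    targets = set()
    for line in properties_text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")) or "=" not in line:
            continue
        value = line.split("=", 1)[1].strip()
        if FQCN.match(value):
            targets.add(value)
    return targets


def bloated(declared, statically_used, targets, class_owner):
    """Return declared dependencies that are neither referenced directly
    nor reached through a reflective target from the configuration."""
    used = set(statically_used)
    used |= {class_owner[t] for t in targets if t in class_owner}
    return sorted(set(declared) - used)


# Hypothetical configuration and dependency data.
text = "# database settings\ndriver=org.example.db.Driver\nname=demo\n"
targets = reflective_targets(text)
unused = bloated(
    declared=["db-lib", "json-lib", "log-lib"],
    statically_used=["log-lib"],
    targets=targets,
    class_owner={"org.example.db.Driver": "db-lib"},
)
```

Without the configuration scan, `db-lib` would be reported as bloated even though the driver class is loaded reflectively at run time, which is the kind of false alarm the abstract describes.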
Chapter 7. Exploring Cross Ecosystem Vulnerability Impacts
Abstract
CLV issues are vulnerabilities induced by cross-language invocations of vulnerable libraries. Such issues greatly increase the attack surface of Python/Java projects due to their pervasive use of C libraries. Existing Python/Java build tools in the PyPI and Maven ecosystems fail to report dependencies on vulnerable libraries written in other languages such as C, so CLV issues are easily missed by developers. In this chapter, we conduct the first empirical study on the status quo of CLV issues in the PyPI and Maven ecosystems. We find that 82,951 projects in these ecosystems are directly or indirectly dependent on libraries compiled from C project versions that are identified as vulnerable in CVE reports. Our study raises awareness of CLV issues in popular ecosystems and presents related analysis results.
The study also leads to the development of the first automated mechanism, Insight, which provides a turn-key solution to the identification of CLV issues in PyPI and Maven projects based on published CVE reports of vulnerable C projects. Insight automatically identifies whether a PyPI or Maven project is using a C library compiled from vulnerable C project versions in published CVE reports. It also deduces the vulnerable APIs involved by analyzing the usage of various foreign function interfaces such as CFFI and JNI in the PyPI or Maven project concerned. Insight achieves a high detection rate of 88.4% on a popular CLV issue benchmark. Contributing to the open-source community, we reported 226 CLV issues detected in actively maintained PyPI and Maven projects that are directly dependent on vulnerable C library versions. Our reports were well received and appreciated by developers, who inquired about the availability of Insight. A total of 127 reported issues (56.2%) were quickly confirmed by developers, and 74.8% of them have been fixed or are being fixed by popular projects, such as MongoDB (2022. https://www.mongodb.com/. Accessed: 2022-03-01) and Eclipse/Sumo (2022. https://www.eclipse.org/sumo/. Accessed: 2022-03-01).
Ying Wang, Shing-Chi Cheung, Hai Yu, Zhiliang Zhu
Chapter 8. Boosting the Propagation of Vulnerability Fixes in the npm Ecosystem
Abstract
Vulnerabilities are known, reported security threats that affect a large number of packages in the npm ecosystem. To mitigate these security threats, the open-source community strongly urges vulnerable packages to publish vulnerability fixes promptly and recommends that affected packages update their dependencies. However, there are still serious lags in the propagation of vulnerability fixes in the ecosystem. In our preliminary study on the latest versions of 356,283 active npm packages, we found that 20.0% of them can still introduce vulnerabilities via direct or transitive dependencies although the involved vulnerable packages have already published fix versions for over a year. A prior study by Chinthanet et al. (Empir Softw Eng 26(3):1–28, 2021) lays the groundwork for research on how to mitigate propagation lags of vulnerability fixes in an ecosystem. They conducted an empirical investigation to identify lags that might occur between the vulnerable package release and its fixing release. They found that factors such as the branch upon which a fix landed and the severity of the vulnerability had a small effect on its propagation trajectory throughout the ecosystem. To ensure quick adoption and propagation of a release that contains the fix, they gave several pieces of actionable advice to developers and researchers. However, how to design an effective technique to accelerate the propagation of vulnerability fixes is still an open question. Motivated by this problem, in this chapter, we conducted an empirical study to learn the scale of packages that block the propagation of vulnerability fixes in the ecosystem and investigate their evolution characteristics. Furthermore, we distilled the remediation strategies that have better effects on mitigating the fix propagation lags. Leveraging our empirical findings, we propose an ecosystem-level technique, Plumber, for deriving feasible remediation strategies to boost the propagation of vulnerability fixes.
Ying Wang, Shing-Chi Cheung, Hai Yu, Zhiliang Zhu
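The notion of packages that "block" fix propagation in the Chapter 8 abstract can be sketched as a small graph computation. This is an illustration under simplifying assumptions, not Plumber: versions are plain tuples, a constraint is a half-open range `(low, high)` rather than a real npm semver range, every dependent of a blocker is assumed to pull in its vulnerable resolution, and all package names are hypothetical.

```python
from collections import deque


def allows(constraint, version):
    """True when `version` falls inside the half-open range [low, high)."""
    low, high = constraint
    return low <= version < high


def blockers_and_affected(dependents, constraints, fixed_version):
    """Find direct dependents whose constraint excludes the fixed version
    (the blockers) and everything that depends on them transitively.

    dependents:  package -> packages that depend on it (reverse edges)
    constraints: direct dependent of the vulnerable package -> (low, high)
    """
    blockers = {
        pkg for pkg, rng in constraints.items() if not allows(rng, fixed_version)
    }
    affected = set(blockers)
    queue = deque(blockers)
    while queue:  # breadth-first walk up the reverse dependency edges
        pkg = queue.popleft()
        for parent in dependents.get(pkg, []):
            if parent not in affected:
                affected.add(parent)
                queue.append(parent)
    return blockers, affected


# Hypothetical ecosystem: the fix ships in 1.2.3, but app-a accepts only < 1.2.0.
constraints = {
    "app-a": ((1, 0, 0), (1, 2, 0)),
    "app-b": ((1, 0, 0), (2, 0, 0)),
}
dependents = {"app-a": ["tool-x"], "tool-x": ["site-y"], "app-b": ["tool-z"]}
blockers, affected = blockers_and_affected(dependents, constraints, (1, 2, 3))
```

Here a single restrictive constraint in `app-a` keeps the vulnerability reachable from two further packages, which shows why remediation at the blocker has an ecosystem-level effect.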

Tools and Datasets

Frontmatter
Chapter 9. “League of Legends” Toolkit
Abstract
In this chapter, we provide a toolkit named “League of Legends” including five available tools to combat dependency issues in multiple ecosystems: Piecer (for machine learning pipelines/libraries), NuFix (for .NET), Slimming (for Java), Insight (across Java/Python to C/C++ ecosystems), and Plumber (for Node.js). We also provide datasets of library dependency metadata and vulnerability metadata to facilitate future research.
Ying Wang, Shing-Chi Cheung, Hai Yu, Zhiliang Zhu
Chapter 10. Epilogue
Abstract
In the realm of software industry security, managing open-source software artifacts and establishing/maintaining trustworthy, sustainable software library ecosystems are critical imperatives in the digital economy era.
Ying Wang, Shing-Chi Cheung, Hai Yu, Zhiliang Zhu
Metadata
Title
Managing Software Supply Chains
Written by
Ying Wang
Shing-Chi Cheung
Hai Yu
Zhiliang Zhu
Copyright year
2025
Publisher
Springer Nature Singapore
Electronic ISBN
978-981-9617-97-5
Print ISBN
978-981-9617-96-8
DOI
https://doi.org/10.1007/978-981-96-1797-5