research-article

A Preliminary Investigation of MLOps Practices in GitHub

Authors:
Fabio Calefato, University of Bari, Italy
Filippo Lanubile, University of Bari, Italy
Luigi Quaranta, University of Bari, Italy
ESEM '22: Proceedings of the 16th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement, September 2022, pages 283–288. https://doi.org/10.1145/3544902.3546636

Published: 19 September 2022

ABSTRACT

Background. The rapidly growing popularity of machine learning (ML) applications has led to increasing interest in MLOps, that is, the practice of continuous integration and deployment (CI/CD) of ML-enabled systems. Aims. Since changes may affect not only the code but also the ML model parameters and the data themselves, the automation of traditional CI/CD needs to be extended to manage model retraining in production. Method. In this paper, we present an initial investigation of the MLOps practices implemented in a set of ML-enabled systems retrieved from GitHub, focusing on GitHub Actions and CML (Continuous Machine Learning), two solutions for automating the development workflow. Results. Our preliminary results suggest that the adoption of MLOps workflows in open-source GitHub projects is currently rather limited. Conclusions. We also identify open issues that can guide future research.
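The abstract describes mining GitHub projects for their use of GitHub Actions and CML. As a rough illustration only, the Python sketch below checks a locally cloned repository's `.github/workflows` directory and flags workflows that reference the `iterative/setup-cml` action. This is not the authors' mining procedure, which the abstract does not detail: the detection rule (a plain regex on `uses:` lines) and the function names `workflow_uses_cml` and `classify_repo` are hypothetical assumptions made for this example.

```python
"""Hypothetical sketch: flag cloned repositories whose GitHub Actions
workflows appear to use CML. Not the authors' mining procedure."""
import re
from pathlib import Path

# Assumed heuristic: a workflow is "CML-enabled" if a step declares
# `uses: iterative/setup-cml@<ref>`. Plain regex matching avoids a YAML
# dependency but can miss unusual formatting.
SETUP_CML = re.compile(r"uses:\s*['\"]?iterative/setup-cml\b", re.IGNORECASE)


def workflow_uses_cml(workflow_text: str) -> bool:
    """Return True if the workflow text references the setup-cml action."""
    return SETUP_CML.search(workflow_text) is not None


def classify_repo(repo_root: str) -> dict:
    """Summarise the Actions workflows found in a locally cloned repository."""
    workflow_dir = Path(repo_root) / ".github" / "workflows"
    files = []
    if workflow_dir.is_dir():
        # GitHub reads workflow definitions from .yml and .yaml files here.
        files = sorted(
            p for p in workflow_dir.iterdir()
            if p.is_file() and p.suffix in {".yml", ".yaml"}
        )
    cml_files = [
        p.name for p in files
        if workflow_uses_cml(p.read_text(encoding="utf-8", errors="ignore"))
    ]
    return {
        "has_actions": bool(files),
        "workflow_count": len(files),
        "cml_workflows": cml_files,
    }


if __name__ == "__main__":
    import tempfile

    # Build a throwaway repository with one CML workflow and one without.
    with tempfile.TemporaryDirectory() as root:
        wf_dir = Path(root, ".github", "workflows")
        wf_dir.mkdir(parents=True)
        (wf_dir / "train.yml").write_text(
            "on: [push]\n"
            "jobs:\n"
            "  train:\n"
            "    runs-on: ubuntu-latest\n"
            "    steps:\n"
            "      - uses: actions/checkout@v3\n"
            "      - uses: iterative/setup-cml@v1\n"
        )
        (wf_dir / "lint.yml").write_text("on: [push]\njobs: {}\n")
        print(classify_repo(root))
        # {'has_actions': True, 'workflow_count': 2, 'cml_workflows': ['train.yml']}
```

A real repository-mining study would also need project selection criteria and would more robustly parse the YAML; the regex is used here only to keep the sketch dependency-free.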

Index Terms

1. Computing methodologies
  1. Machine learning
2. Software and its engineering

Published in
ESEM '22: Proceedings of the 16th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement
September 2022
318 pages
ISBN: 9781450394277
DOI: 10.1145/3544902
Editors: Fernanda Madeiral, Casper Lassenius, Tayana Conte, Tomi Männistö
Copyright © 2022 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
CI/CD
CML
GitHub Actions
ML-enabled systems
automated workflows
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 130 of 594 submissions, 22%