ABSTRACT
Background. The rapidly growing popularity of machine learning (ML) applications has led to increasing interest in MLOps, that is, the practice of continuous integration and deployment (CI/CD) of ML-enabled systems. Aims. Since changes may affect not only the code but also the ML model parameters and the data themselves, the automation of traditional CI/CD needs to be extended to manage model retraining in production. Method. In this paper, we present an initial investigation of the MLOps practices implemented in a set of ML-enabled systems retrieved from GitHub, focusing on GitHub Actions and CML, two solutions to automate the development workflow. Results. Our preliminary results suggest that the adoption of MLOps workflows in open-source GitHub projects is currently rather limited. Conclusions. We also identify issues that can guide future research.
Index Terms
- A Preliminary Investigation of MLOps Practices in GitHub