DOI: 10.1145/3544902.3546636
research-article

A Preliminary Investigation of MLOps Practices in GitHub

Published: 19 September 2022

ABSTRACT

Background. The rapid and growing popularity of machine learning (ML) applications has led to an increasing interest in MLOps, that is, the practice of continuous integration and deployment (CI/CD) of ML-enabled systems. Aims. Since changes may affect not only the code but also the ML model parameters and the data themselves, the automation of traditional CI/CD needs to be extended to manage model retraining in production. Method. In this paper, we present an initial investigation of the MLOps practices implemented in a set of ML-enabled systems retrieved from GitHub, focusing on GitHub Actions and CML, two solutions to automate the development workflow. Results. Our preliminary results suggest that the adoption of MLOps workflows in open-source GitHub projects is currently rather limited. Conclusions. Issues are also identified, which can guide future research work.


      • Published in

        ESEM '22: Proceedings of the 16th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement
        September 2022
        318 pages
        ISBN: 9781450394277
        DOI: 10.1145/3544902

        Copyright © 2022 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States



        Qualifiers

        • research-article
        • Research
        • Refereed limited

        Acceptance Rates

        Overall Acceptance Rate: 130 of 594 submissions, 22%
