Skip to main content
Erschienen in: Empirical Software Engineering 3/2023

01.06.2023

Towards a change taxonomy for machine learning pipelines

Empirical study of ML pipelines and forks related to academic publications

verfasst von: Aaditya Bhatia, Ellis E. Eghan, Manel Grichi, William G. Cavanagh, Zhen Ming (Jack) Jiang, Bram Adams

Erschienen in: Empirical Software Engineering | Ausgabe 3/2023

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Machine Learning (ML) academic publications commonly provide open-source implementations on GitHub, allowing their audience to replicate, validate, or even extend the ML algorithms, data sets and metadata. However, thus far little is known about the degree of collaboration activity happening on such ML research repositories, in particular regarding (1) the degree to which such repositories receive contributions from forks, (2) the nature of such contributions (i.e., the types of changes), and (3) the nature of changes that are not contributed back to forks, which might represent missed opportunities. In this paper, we empirically study contributions to 1,346 ML research repositories and their 67,369 forks, both quantitatively and qualitatively, by building on Hindle et al.’s seminal taxonomy of code changes. We found that while ML research repositories are heavily forked, only 9% of the forks made modifications to the forked repository. 42% of the latter sent changes to the parent repositories, half of which (52%) were accepted by the parent repositories. Our qualitative analysis on 539 contributed and 378 local (fork-only) changes extends Hindle et al.’s taxonomy with two new top-level change categories related to ML (Data and Dependency Management), and 16 new sub-categories, including nine ML-specific ones (input data, parameter tuning, pre-processing, training infrastructure, model structure, pipeline performance, sharing, validation infrastructure, and output data). While the changes that are not contributed back by the forks mostly concern domain-specific features and local experimentation (e.g., parameter tuning), the origin repositories do miss out on a non-trivial 15.4% of Documentation changes, 13.6% of Feature changes and 11.4% of Bug fix changes.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Fußnoten
5
In August 2022, PapersWithCode indexed more than 77,640 academic AI publications along with their code bases.
 
6
In the remainder of this paper, we use the term “ML research repositories” to identify such ML pipelines.
 
18
PRs on GitHub are not limited to just one commit.
 
25
Note that Hindle et al.’s meta program change category is unrelated to the field of Metaprogramming (https://​en.​wikipedia.​org/​wiki/​Metaprogramming).
 
Literatur
Zurück zum Zitat Amershi S, Begel A, Bird C, DeLine R, Gall H, Kamar E, Nagappan N, Nushi B, Zimmermann T (2019) Software engineering for machine learning: a case study. In: 2019 IEEE/ACM 41st international conference on software engineering: software engineering in practice (ICSE-SEIP), pp 291–300 Amershi S, Begel A, Bird C, DeLine R, Gall H, Kamar E, Nagappan N, Nushi B, Zimmermann T (2019) Software engineering for machine learning: a case study. In: 2019 IEEE/ACM 41st international conference on software engineering: software engineering in practice (ICSE-SEIP), pp 291–300
Zurück zum Zitat Arpteg A, Brinne B, Crnkovic-Friis L, Bosch J (2018) Software engineering challenges of deep learning. In: 2018 44th Euromicro conference on software engineering and advanced applications (SEAA). IEEE, pp 50–59 Arpteg A, Brinne B, Crnkovic-Friis L, Bosch J (2018) Software engineering challenges of deep learning. In: 2018 44th Euromicro conference on software engineering and advanced applications (SEAA). IEEE, pp 50–59
Zurück zum Zitat Benestad HC, Anda B, Arisholm E (2009) Understanding software maintenance and evolution by analyzing individual changes: a literature review. J Softw Maint Evol Res Pract 21(6):349–378CrossRef Benestad HC, Anda B, Arisholm E (2009) Understanding software maintenance and evolution by analyzing individual changes: a literature review. J Softw Maint Evol Res Pract 21(6):349–378CrossRef
Zurück zum Zitat Biazzini M, Baudry B (2014) may the fork be with you: novel metrics to analyze collaboration on github. In: Proceedings of the 5th international workshop on emerging trends in software metrics, pp 37–43 Biazzini M, Baudry B (2014) may the fork be with you: novel metrics to analyze collaboration on github. In: Proceedings of the 5th international workshop on emerging trends in software metrics, pp 37–43
Zurück zum Zitat Bird C, Bachmann A, Aune E, Duffy J, Bernstein A, Filkov V, Devanbu P (2009) Fair and balanced? bias in bug-fix datasets. In: Proceedings of the 7th joint meeting of the european software engineering conference and the ACM SIGSOFT symposium on the foundations of software engineering, pp 121–130 Bird C, Bachmann A, Aune E, Duffy J, Bernstein A, Filkov V, Devanbu P (2009) Fair and balanced? bias in bug-fix datasets. In: Proceedings of the 7th joint meeting of the european software engineering conference and the ACM SIGSOFT symposium on the foundations of software engineering, pp 121–130
Zurück zum Zitat Bissyandé TF, Thung F, Wang S, Lo D, Jiang L, Ré veillère L (2013) Empirical evaluation of bug linking. In: 2013 17th European conference on software maintenance and Reengineering, pp 89–98 Bissyandé TF, Thung F, Wang S, Lo D, Jiang L, Ré veillère L (2013) Empirical evaluation of bug linking. In: 2013 17th European conference on software maintenance and Reengineering, pp 89–98
Zurück zum Zitat Bloice MD, Holzinger A (2016) A tutorial on machine learning and data science tools with python. Machine Learning for Health Informatics, pp 435–480 Bloice MD, Holzinger A (2016) A tutorial on machine learning and data science tools with python. Machine Learning for Health Informatics, pp 435–480
Zurück zum Zitat Borges H, Valente MT (2018) What’s in a github star? understanding repository starring practices in a social coding platform. J Syst Softw 146:112–129CrossRef Borges H, Valente MT (2018) What’s in a github star? understanding repository starring practices in a social coding platform. J Syst Softw 146:112–129CrossRef
Zurück zum Zitat Brisson S, Noei E, Lyons K (2020) We are family: analyzing communication in github software repositories and their forks. In: 2020 IEEE 27th international conference on software analysis Evolution and Reengineering (SANER). IEEE, pp 59–69 Brisson S, Noei E, Lyons K (2020) We are family: analyzing communication in github software repositories and their forks. In: 2020 IEEE 27th international conference on software analysis Evolution and Reengineering (SANER). IEEE, pp 59–69
Zurück zum Zitat Chen Z, Zhang JM, Sarro F, Harman M (2022) Maat: a novel ensemble approach to addressing fairness and performance bugs for machine learning software. In: Proceedings of the 30th ACM joint european software engineering conference and symposium on the foundations of software engineering (ESEC/FSE’22). ACM Press Chen Z, Zhang JM, Sarro F, Harman M (2022) Maat: a novel ensemble approach to addressing fairness and performance bugs for machine learning software. In: Proceedings of the 30th ACM joint european software engineering conference and symposium on the foundations of software engineering (ESEC/FSE’22). ACM Press
Zurück zum Zitat Cheng D, Cao C, Xu C, Ma X (2018) Manifesting bugs in machine learning code: An explorative study with mutation testing. In: 2018 IEEE international conference on software quality, reliability and security (QRS). IEEE, pp 313–324 Cheng D, Cao C, Xu C, Ma X (2018) Manifesting bugs in machine learning code: An explorative study with mutation testing. In: 2018 IEEE international conference on software quality, reliability and security (QRS). IEEE, pp 313–324
Zurück zum Zitat Constantino K, Zhou S, Souza M, Figueiredo E, Kästner C (2020) Understanding collaborative software development: an interview study. In: Proceedings of the 15th international conference on global software engineering, pp 55–65 Constantino K, Zhou S, Souza M, Figueiredo E, Kästner C (2020) Understanding collaborative software development: an interview study. In: Proceedings of the 15th international conference on global software engineering, pp 55–65
Zurück zum Zitat Cortés-Coy LF, Linares-Vásquez M, Aponte J, Poshyvanyk D (2014) On automatically generating commit messages via summarization of source code changes. In: 2014 IEEE 14th international working conference on source code analysis and manipulation, pp 275–284 Cortés-Coy LF, Linares-Vásquez M, Aponte J, Poshyvanyk D (2014) On automatically generating commit messages via summarization of source code changes. In: 2014 IEEE 14th international working conference on source code analysis and manipulation, pp 275–284
Zurück zum Zitat Decan A, Mens T, Grosjean P (2019) An empirical comparison of dependency network evolution in seven software packaging ecosystems. Empir Softw Eng 24(1):381–416CrossRef Decan A, Mens T, Grosjean P (2019) An empirical comparison of dependency network evolution in seven software packaging ecosystems. Empir Softw Eng 24(1):381–416CrossRef
Zurück zum Zitat Dey T, Mockus A (2020) Which pull requests get accepted and why? a study of popular npm packages, arXiv:2003.01153 Dey T, Mockus A (2020) Which pull requests get accepted and why? a study of popular npm packages, arXiv:2003.​01153
Zurück zum Zitat Dwarakanath A, Ahuja M, Sikand S, Rao RM, Bose RJC, Dubash N, Podder S (2018) Identifying implementation bugs in machine learning based image classifiers using metamorphic testing. In: Proceedings of the 27th ACM SIGSOFT international symposium on software testing and analysis, pp 118–128 Dwarakanath A, Ahuja M, Sikand S, Rao RM, Bose RJC, Dubash N, Podder S (2018) Identifying implementation bugs in machine learning based image classifiers using metamorphic testing. In: Proceedings of the 27th ACM SIGSOFT international symposium on software testing and analysis, pp 118–128
Zurück zum Zitat Fan Y, Xia X, Lo D, Hassan AE, Li S (2021) What makes a popular academic AI repository? Empir Softw Eng 26(1):1–35CrossRef Fan Y, Xia X, Lo D, Hassan AE, Li S (2021) What makes a popular academic AI repository? Empir Softw Eng 26(1):1–35CrossRef
Zurück zum Zitat Faragó C, Hegedũs P (2014) R Ferenc, The impact of version control operations on the quality change of the source code. In: International conference on computational science and its applications. Springer, pp 353–369 Faragó C, Hegedũs P (2014) R Ferenc, The impact of version control operations on the quality change of the source code. In: International conference on computational science and its applications. Springer, pp 353–369
Zurück zum Zitat Fogel K (2005) Producing open source software: How to run a successful free software project. O’Reilly Media, Inc., Fogel K (2005) Producing open source software: How to run a successful free software project. O’Reilly Media, Inc.,
Zurück zum Zitat German DM, Adams B, Hassan AE (2016) Continuously mining distributed version control systems: an empirical study of how linux uses git. Empir. Softw. Eng. 21(1):260–299CrossRef German DM, Adams B, Hassan AE (2016) Continuously mining distributed version control systems: an empirical study of how linux uses git. Empir. Softw. Eng. 21(1):260–299CrossRef
Zurück zum Zitat Ghadhab L, Jenhani I, Mkaouer MW, Messaoud MB (2021) Augmenting commit classification by using fine-grained source code changes and a pre-trained deep neural language model, vol 135 Ghadhab L, Jenhani I, Mkaouer MW, Messaoud MB (2021) Augmenting commit classification by using fine-grained source code changes and a pre-trained deep neural language model, vol 135
Zurück zum Zitat Gousios G, Pinzger M, Deursen AV (2014) An exploratory study of the pull-based software development model. In: Proceedings of the 36th international conference on software engineering, pp 345–355 Gousios G, Pinzger M, Deursen AV (2014) An exploratory study of the pull-based software development model. In: Proceedings of the 36th international conference on software engineering, pp 345–355
Zurück zum Zitat Granger B, Pérez F (2021) Jupyter: thinking and storytelling with code and data Authorea Preprints Granger B, Pérez F (2021) Jupyter: thinking and storytelling with code and data Authorea Preprints
Zurück zum Zitat Hindle A, German DM, Godfrey MW, Holt RC (2009) Automatic classication of large changes into maintenance categories. In: 2009 IEEE 17th International Conference on Program Comprehension. IEEE, pp 30–39 Hindle A, German DM, Godfrey MW, Holt RC (2009) Automatic classication of large changes into maintenance categories. In: 2009 IEEE 17th International Conference on Program Comprehension. IEEE, pp 30–39
Zurück zum Zitat Hindle D, German M, Holt R (2008) What do large commits tell us? a taxonomical study of large commits. In: Proceedings of the 2008 international working conference on mining software repositories, ser. MSR ’08. New York, NY, USA: association for computing machinery, pp 99–108. [Online]. Available:. https://doi.org/10.1145/1370750.1370773 Hindle D, German M, Holt R (2008) What do large commits tell us? a taxonomical study of large commits. In: Proceedings of the 2008 international working conference on mining software repositories, ser. MSR ’08. New York, NY, USA: association for computing machinery, pp 99–108. [Online]. Available:. https://​doi.​org/​10.​1145/​1370750.​1370773
Zurück zum Zitat Hu Y, Zhang J, Bai X, Yu S, Yang Z (2016) Influence analysis of github repositories. SpringerPlus 5(1):1–19CrossRef Hu Y, Zhang J, Bai X, Yu S, Yang Z (2016) Influence analysis of github repositories. SpringerPlus 5(1):1–19CrossRef
Zurück zum Zitat Idowu S, Strüber D, Berger T (2021) Asset management in machine learning: a survey. In: 2021 IEEE/ACM 43rd international conference on software engineering: software engineering in practice (ICSE-SEIP), pp 51–60 Idowu S, Strüber D, Berger T (2021) Asset management in machine learning: a survey. In: 2021 IEEE/ACM 43rd international conference on software engineering: software engineering in practice (ICSE-SEIP), pp 51–60
Zurück zum Zitat Kalliamvakou E, Gousios G, Blincoe K, Singer L, German DM, Damian D (2014) The promises and perils of mining github. In: Proceedings of the 11th working conference on mining software repositories, ser. MSR 2014. New York, NY, USA: association for computing machinery, pp 92–101. [Online]. Available: https://doi.org/10.1145/2597073.2597074 Kalliamvakou E, Gousios G, Blincoe K, Singer L, German DM, Damian D (2014) The promises and perils of mining github. In: Proceedings of the 11th working conference on mining software repositories, ser. MSR 2014. New York, NY, USA: association for computing machinery, pp 92–101. [Online]. Available: https://​doi.​org/​10.​1145/​2597073.​2597074
Zurück zum Zitat Kim M, Cai D, Kim S (2011) An empirical investigation into the role of api-level refactorings during software evolution. In: Proceedings of the 33rd international conference on software engineering, pp 151–160 Kim M, Cai D, Kim S (2011) An empirical investigation into the role of api-level refactorings during software evolution. In: Proceedings of the 33rd international conference on software engineering, pp 151–160
Zurück zum Zitat Krippendorff K (2011) Computing krippendorff’s alpha-reliability Krippendorff K (2011) Computing krippendorff’s alpha-reliability
Zurück zum Zitat Li H, Shang W, Adams B, Sayagh M, Hassan AE (2020) A qualitative study of the benefits and costs of logging from developers’ perspectives. IEEE Transactions on Software Engineering Li H, Shang W, Adams B, Sayagh M, Hassan AE (2020) A qualitative study of the benefits and costs of logging from developers’ perspectives. IEEE Transactions on Software Engineering
Zurück zum Zitat Lima A, Rossi L, Musolesi M (2014) Coding together at scale: Github as a collaborative social network. In: Eighth international AAAI conference on weblogs and social media Lima A, Rossi L, Musolesi M (2014) Coding together at scale: Github as a collaborative social network. In: Eighth international AAAI conference on weblogs and social media
Zurück zum Zitat Martínez-Fernández S, Bogner J, Franch X, Oriol M, Siebert J, Trendowicz A, Vollmer AM, Wagner S (2021) Software engineering for ai-based systems, a survey, arXiv:2105.01984 Martínez-Fernández S, Bogner J, Franch X, Oriol M, Siebert J, Trendowicz A, Vollmer AM, Wagner S (2021) Software engineering for ai-based systems, a survey, arXiv:2105.​01984
Zurück zum Zitat Mukherjee S, Almanza A, Rubio-González C (2021) Fixing dependency errors for python build reproducibility. In: Proceedings of the 30th ACM SIGSOFT international symposium on software testing and analysis, pp 439–451 Mukherjee S, Almanza A, Rubio-González C (2021) Fixing dependency errors for python build reproducibility. In: Proceedings of the 30th ACM SIGSOFT international symposium on software testing and analysis, pp 439–451
Zurück zum Zitat Nahar N, Zhou S, Lewis G, Kästner C (2022) Collaboration challenges in building ml-enabled systems: communication, documentation, engineering, and process. In: 2022 IEEE/ACM 44th international conference on software engineering (ICSE) Nahar N, Zhou S, Lewis G, Kästner C (2022) Collaboration challenges in building ml-enabled systems: communication, documentation, engineering, and process. In: 2022 IEEE/ACM 44th international conference on software engineering (ICSE)
Zurück zum Zitat Ng A (2021) Mlops: from model-centric to data-centric ai Ng A (2021) Mlops: from model-centric to data-centric ai
Zurück zum Zitat O’Leary K, Uchida M (2020) Common problems with creating machine learning pipelines from existing code O’Leary K, Uchida M (2020) Common problems with creating machine learning pipelines from existing code
Zurück zum Zitat Ozkaya I (2020) What is really different in engineering ai-enabled systems? IEEE Softw 37(4):3–6CrossRef Ozkaya I (2020) What is really different in engineering ai-enabled systems? IEEE Softw 37(4):3–6CrossRef
Zurück zum Zitat Pashchenko I, Vu D-L, Massacci F (2020) A qualitative study of dependency management and its security implications. In: Proceedings of the 2020 ACM SIGSAC conference on computer and communications security, pp 1513–1531 Pashchenko I, Vu D-L, Massacci F (2020) A qualitative study of dependency management and its security implications. In: Proceedings of the 2020 ACM SIGSAC conference on computer and communications security, pp 1513–1531
Zurück zum Zitat Polyzotis N, Roy S, Whang SE, Zinkevich M (2018) Data lifecycle challenges in production machine learning: a survey. ACM SIGMOD Rec 47(2):17–28CrossRef Polyzotis N, Roy S, Whang SE, Zinkevich M (2018) Data lifecycle challenges in production machine learning: a survey. ACM SIGMOD Rec 47(2):17–28CrossRef
Zurück zum Zitat Rahman MM, Roy CK (2014) An insight into the pull requests of github. In: Proceedings of the 11th working conference on mining software repositories, pp 364–367 Rahman MM, Roy CK (2014) An insight into the pull requests of github. In: Proceedings of the 11th working conference on mining software repositories, pp 364–367
Zurück zum Zitat Ren L, Zhou S, Kä stner C (2018) Poster: forks insight: Providing an overview of github forks. In: 2018 IEEE/ACM 40th international conference on software engineering: companion (ICSE-Companion), pp 179–180 Ren L, Zhou S, Kä stner C (2018) Poster: forks insight: Providing an overview of github forks. In: 2018 IEEE/ACM 40th international conference on software engineering: companion (ICSE-Companion), pp 179–180
Zurück zum Zitat Salza P, Palomba F, Di Nucci D, D’Uva C, De Lucia A, Ferrucci F (2018) Do developers update third-party libraries in mobile apps?. In: Proceedings of the 26th conference on program comprehension, pp 255–265 Salza P, Palomba F, Di Nucci D, D’Uva C, De Lucia A, Ferrucci F (2018) Do developers update third-party libraries in mobile apps?. In: Proceedings of the 26th conference on program comprehension, pp 255–265
Zurück zum Zitat Sambasivan N, Kapania S, Highfill H, Akrong D, Paritosh P, Aroyo LM (2021) Everyone wants to do the model work, not the data work: data cascades in high-stakes ai. In: Proceedings of the 2021 CHI conference on human factors in computing systems, pp 1–15 Sambasivan N, Kapania S, Highfill H, Akrong D, Paritosh P, Aroyo LM (2021) Everyone wants to do the model work, not the data work: data cascades in high-stakes ai. In: Proceedings of the 2021 CHI conference on human factors in computing systems, pp 1–15
Zurück zum Zitat Santos JAM, Santos AR, Mendonç a MG (2015) Investigating bias in the search phase of software engineering secondary studies. In: CIbSE, pp 488 Santos JAM, Santos AR, Mendonç a MG (2015) Investigating bias in the search phase of software engineering secondary studies. In: CIbSE, pp 488
Zurück zum Zitat Shivaji S, Whitehead EJ, Akella R, Kim S (2012) Reducing features to improve code change-based bug prediction. IEEE Trans Softw Eng 39 (4):552–569CrossRef Shivaji S, Whitehead EJ, Akella R, Kim S (2012) Reducing features to improve code change-based bug prediction. IEEE Trans Softw Eng 39 (4):552–569CrossRef
Zurück zum Zitat Swanson EB (1976) The dimensions of maintenance. In: Proceedings of the 2nd international conference on Software engineering, pp 492–497 Swanson EB (1976) The dimensions of maintenance. In: Proceedings of the 2nd international conference on Software engineering, pp 492–497
Zurück zum Zitat Tizpaz-Niari S, Černỳ P, Trivedi A (2020) Detecting and understanding real-world differential performance bugs in machine learning libraries. In: Proceedings of the 29th ACM SIGSOFT international symposium on software testing and analysis, pp 189–199 Tizpaz-Niari S, Černỳ P, Trivedi A (2020) Detecting and understanding real-world differential performance bugs in machine learning libraries. In: Proceedings of the 29th ACM SIGSOFT international symposium on software testing and analysis, pp 189–199
Zurück zum Zitat Wang J, Li L, Zeller A (2020) Better code, better sharing: on the need of analyzing jupyter notebooks. In: Proceedings of the ACM/IEEE 42nd international conference on software engineering: new ideas and emerging results, pp 53–56 Wang J, Li L, Zeller A (2020) Better code, better sharing: on the need of analyzing jupyter notebooks. In: Proceedings of the ACM/IEEE 42nd international conference on software engineering: new ideas and emerging results, pp 53–56
Zurück zum Zitat Washizaki H, Uchida H, Khomh F, Gué héneuc Y-G (2019) Studying software engineering patterns for designing machine learning systems. In: 2019 10th International workshop on empirical software engineering in practice (IWESEP). IEEE, pp 49–495 Washizaki H, Uchida H, Khomh F, Gué héneuc Y-G (2019) Studying software engineering patterns for designing machine learning systems. In: 2019 10th International workshop on empirical software engineering in practice (IWESEP). IEEE, pp 49–495
Zurück zum Zitat Wu R, Zhang H, Kim S, Cheung S-C (2011) Relink: recovering links between bugs and changes. In: Proceedings of the 19th ACM SIGSOFT symposium and the 13th european conference on foundations of software engineering, pp 15–25 Wu R, Zhang H, Kim S, Cheung S-C (2011) Relink: recovering links between bugs and changes. In: Proceedings of the 19th ACM SIGSOFT symposium and the 13th european conference on foundations of software engineering, pp 15–25
Zurück zum Zitat Yan M, Fu Y, Zhang X, Yang D, Xu L, Kymer JD (2016) Automatically classifying software changes via discriminative topic model: supporting multi-category and cross-project. J Syst Softw 113:296–308CrossRef Yan M, Fu Y, Zhang X, Yang D, Xu L, Kymer JD (2016) Automatically classifying software changes via discriminative topic model: supporting multi-category and cross-project. J Syst Softw 113:296–308CrossRef
Zurück zum Zitat Zhang X, Chen Y, Gu Y, Zou W, Xie X, Jia X, Xuan J (2018) How do multiple pull requests change the same code: a study of competing pull requests in github. In: 2018 IEEE international conference on software maintenance and evolution (ICSME).IEEE, pp 228–239 Zhang X, Chen Y, Gu Y, Zou W, Xie X, Jia X, Xuan J (2018) How do multiple pull requests change the same code: a study of competing pull requests in github. In: 2018 IEEE international conference on software maintenance and evolution (ICSME).IEEE, pp 228–239
Zurück zum Zitat Zhang T, Gao C, Ma L, Lyu M, Kim M (2019) An empirical study of common challenges in developing deep learning applications. In: 2019 IEEE 30th international symposium on software reliability engineering (ISSRE). IEEE, pp 104–115 Zhang T, Gao C, Ma L, Lyu M, Kim M (2019) An empirical study of common challenges in developing deep learning applications. In: 2019 IEEE 30th international symposium on software reliability engineering (ISSRE). IEEE, pp 104–115
Zurück zum Zitat Zhao Y, Leung H, Yang Y, Zhou Y, Xu B (2017) Towards an understanding of change types in bug fixing code. Inf Softw Technol 86:37–53CrossRef Zhao Y, Leung H, Yang Y, Zhou Y, Xu B (2017) Towards an understanding of change types in bug fixing code. Inf Softw Technol 86:37–53CrossRef
Zurück zum Zitat Zhou S, Vasilescu B, Kä stner C (2020) How has forking changed in the last 20 years? a study of hard forks on github. In: 2020 IEEE/ACM 42nd international conference on software engineering (ICSE). IEEE, pp 445–456 Zhou S, Vasilescu B, Kä stner C (2020) How has forking changed in the last 20 years? a study of hard forks on github. In: 2020 IEEE/ACM 42nd international conference on software engineering (ICSE). IEEE, pp 445–456
Zurück zum Zitat Zhou S, Vasilescu B, Kastner C (2019) What the fork: a study of inefficient and efficient forking practices in social coding. In: Proceedings of the 2019 27th ACM joint meeting on european software engineering conference and symposium on the foundations of software engineering, pp 350–361 Zhou S, Vasilescu B, Kastner C (2019) What the fork: a study of inefficient and efficient forking practices in social coding. In: Proceedings of the 2019 27th ACM joint meeting on european software engineering conference and symposium on the foundations of software engineering, pp 350–361
Metadaten
Titel
Towards a change taxonomy for machine learning pipelines
Empirical study of ML pipelines and forks related to academic publications
verfasst von
Aaditya Bhatia
Ellis E. Eghan
Manel Grichi
William G. Cavanagh
Zhen Ming (Jack) Jiang
Bram Adams
Publikationsdatum
01.06.2023
Verlag
Springer US
Erschienen in
Empirical Software Engineering / Ausgabe 3/2023
Print ISSN: 1382-3256
Elektronische ISSN: 1573-7616
DOI
https://doi.org/10.1007/s10664-022-10282-8

Weitere Artikel der Ausgabe 3/2023

Empirical Software Engineering 3/2023 Zur Ausgabe

Premium Partner