Published in: Empirical Software Engineering 7/2022

01.12.2022

A large-scale empirical study of commit message generation: models, datasets and evaluation

Authors: Wei Tao, Yanlin Wang, Ensheng Shi, Lun Du, Shi Han, Hongyu Zhang, Dongmei Zhang, Wenqiang Zhang


Abstract

Commit messages are natural language descriptions of code changes, which are important for program understanding and maintenance. However, writing commit messages manually is time-consuming and laborious, especially when the code is updated frequently. Various approaches utilizing generation or retrieval techniques have been proposed to automatically generate commit messages. To achieve a better understanding of how the existing approaches perform in solving this problem, this paper conducts a systematic and in-depth analysis of the state-of-the-art models and datasets. We find that: (1) Different variants of the BLEU metric used in previous works affect the evaluation. (2) Most datasets are crawled only from Java repositories, while repositories in other programming languages are not sufficiently explored. (3) Dataset splitting strategies can influence the performance of existing models by a large margin. (4) For pre-trained models, fine-tuning with different multi-programming-language combinations can influence their performance. Based on these findings, we collect a large-scale, information-rich, Multi-language Commit Message Dataset (MCMD). Using MCMD, we conduct extensive experiments under different experiment settings, including splitting strategies and multi-programming-language combinations. Furthermore, we provide suggestions for comprehensively evaluating commit message generation models and discuss possible future research directions. We believe our work can help practitioners and researchers better evaluate and select models for automatic commit message generation. Our source code and data are available at https://anonymous.4open.science/r/CommitMessageEmpirical.
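Finding (1) above, that the choice of BLEU variant affects evaluation, can be illustrated with a minimal sentence-level BLEU-4 sketch (this is not the paper's evaluation code; the scoring function, smoothing choice, and example commit messages are illustrative assumptions). Without smoothing, a single missing 4-gram drives the score to zero, while an add-one-smoothed variant of the same metric assigns the same pair a substantial score:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def sentence_bleu(reference, hypothesis, max_n=4, smooth=False):
    """Toy sentence-level BLEU-4.

    smooth=False: any zero n-gram precision makes the score 0.
    smooth=True: add-one smoothing of n-gram precisions (in the
    spirit of Chen & Cherry 2014), so short near-misses still
    receive credit. The gap between the two is the kind of
    variant-dependent difference the study highlights.
    """
    log_prec_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = Counter(ngrams(hypothesis, n))
        ref_counts = Counter(ngrams(reference, n))
        overlap = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = max(sum(hyp_counts.values()), 1)
        p = (overlap + 1) / (total + 1) if smooth else overlap / total
        if p == 0:
            return 0.0
        log_prec_sum += math.log(p)
    # Brevity penalty for hypotheses shorter than the reference.
    bp = min(1.0, math.exp(1 - len(reference) / max(len(hypothesis), 1)))
    return bp * math.exp(log_prec_sum / max_n)

ref = "fix null pointer exception in parser".split()
hyp = "fix null pointer error in parser".split()
print(sentence_bleu(ref, hyp, smooth=False))  # 0.0 (no 4-gram match)
print(sentence_bleu(ref, hyp, smooth=True))   # ~0.49
```

Since commit messages are typically short (often under ten tokens), exact 4-gram matches are rare, which is why the smoothing variant chosen can change model rankings.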

Metadata
Title
A large-scale empirical study of commit message generation: models, datasets and evaluation
Authors
Wei Tao
Yanlin Wang
Ensheng Shi
Lun Du
Shi Han
Hongyu Zhang
Dongmei Zhang
Wenqiang Zhang
Publication date
01.12.2022
Publisher
Springer US
Published in
Empirical Software Engineering / Issue 7/2022
Print ISSN: 1382-3256
Electronic ISSN: 1573-7616
DOI
https://doi.org/10.1007/s10664-022-10219-1
