Published in: Empirical Software Engineering 6/2021

01.11.2021

Using a balanced scorecard to identify opportunities to improve code review effectiveness: an industrial experience report

Written by: Masum Hasan, Anindya Iqbal, Mohammad Rafid Ul Islam, A.J.M. Imtiajur Rahman, Amiangshu Bosu


Abstract

Peer code review is a widely adopted software engineering practice for ensuring code quality and software reliability in both commercial and open-source software projects. Because code reviews carry a large effort overhead, project managers often wonder whether their code reviews are effective and whether there are opportunities for improvement. Since project managers at Samsung Research Bangladesh (SRBD) were also intrigued by these questions, this research developed, deployed, and evaluated a production-ready solution using the Balanced Scorecard (BSC) strategy that SRBD managers can use in their day-to-day management to monitor the code review effectiveness of an individual developer, a particular project, or the entire organization. Following the four-step framework of the BSC strategy, we: 1) defined the operational goals of this research, 2) defined a set of metrics to measure the effectiveness of code reviews, 3) developed an automated mechanism to measure those metrics, and 4) developed and evaluated a monitoring application to inform the key stakeholders. Our automated model to identify useful code reviews achieves 7.88% and 14.39% improvements in accuracy and minority class F1 score, respectively, over the models proposed in prior studies. It also outperforms the human evaluators from SRBD, whom the model replaces, by margins of 25.32% and 23.84% in accuracy and minority class F1 score, respectively. In our post-deployment survey, SRBD developers and managers indicated that they found our solution useful and that it provided them with important insights to support their decision making.
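
The two evaluation measures named in the abstract, accuracy and minority class F1 score, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the label encoding (1 = useful, 0 = not useful), and the ten example labels are all hypothetical and chosen only to show how the measures are computed. The sketch treats whichever label occurs least often in the ground truth as the minority class.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def minority_class_f1(y_true, y_pred):
    """F1 score computed only for the least frequent label in y_true."""
    # Pick the label with the smallest count in the ground truth.
    minority = min(Counter(y_true).items(), key=lambda kv: kv[1])[0]
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == minority and p == minority for t, p in pairs)
    fp = sum(t != minority and p == minority for t, p in pairs)
    fn = sum(t == minority and p != minority for t, p in pairs)
    if tp == 0:
        return 0.0  # no correct minority predictions, so F1 is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical, made-up labels: 1 = useful comment, 0 = not useful.
y_true = [1, 1, 1, 1, 1, 1, 0, 0, 1, 1]
y_pred = [1, 1, 1, 1, 1, 0, 0, 1, 1, 1]

print(accuracy(y_true, y_pred))           # 0.8
print(minority_class_f1(y_true, y_pred))  # 0.5
```

In this made-up example the classifier is right on 8 of 10 comments, yet it recovers only one of the two minority-class comments and raises one false alarm, so the minority class F1 is much lower than the accuracy. That gap is why reporting both measures is informative when the classes are imbalanced.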

Footnotes
1. We are unable to make the dataset publicly available due to the restrictions imposed by our NDA with SRBD.
2. On StackOverflow, each accepted answer earns 15 points, each upvote earns 10 points, and each downvote earns -2 points.
3. The numbers represent the number of interviewees who consider this type of comment Useful or Not Useful.
4. Point-biserial correlation.
6. Numbers in parentheses indicate how many CRA users in our evaluation survey mentioned this particular insight. One user may have mentioned multiple insights.
 
Metadata
Title
Using a balanced scorecard to identify opportunities to improve code review effectiveness: an industrial experience report
Written by
Masum Hasan
Anindya Iqbal
Mohammad Rafid Ul Islam
A.J.M. Imtiajur Rahman
Amiangshu Bosu
Publication date
01.11.2021
Publisher
Springer US
Published in
Empirical Software Engineering / Issue 6/2021
Print ISSN: 1382-3256
Electronic ISSN: 1573-7616
DOI
https://doi.org/10.1007/s10664-021-10038-w