nach oben

Empirical Software Engineering

Erschienen in:

01.11.2021

Using a balanced scorecard to identify opportunities to improve code review effectiveness: an industrial experience report

verfasst von: Masum Hasan, Anindya Iqbal, Mohammad Rafid Ul Islam, A.J.M. Imtiajur Rahman, Amiangshu Bosu

Erschienen in: Empirical Software Engineering | Ausgabe 6/2021

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config

KI-gestützte Suche

Aus

Abstract

Peer code review is a widely adopted software engineering practice to ensure code quality and ensure software reliability in both the commercial and open-source software projects. Due to the large effort overhead associated with practicing code reviews, project managers often wonder, if their code reviews are effective and if there are improvement opportunities in that respect. Since project managers at Samsung Research Bangladesh (SRBD) were also intrigued by these questions, this research developed, deployed, and evaluated a production-ready solution using the Balanced SCorecard (BSC) strategy that SRBD managers can use in their day-to-day management to monitor individual developer’s, a particular project’s or the entire organization’s code review effectiveness. Following the four-step framework of the BSC strategy, we– 1) defined the operation goals of this research, 2) defined a set of metrics to measure the effectiveness of code reviews, 3) developed an automated mechanism to measure those metrics, and 4) developed and evaluated a monitoring application to inform the key stakeholders. Our automated model to identify useful code reviews achieves 7.88% and 14.39% improvement in terms of accuracy and minority class F₁ score respectively over the models proposed in prior studies. It also outperforms human evaluators from SRBD, that the model replaces, by a margin of 25.32% and 23.84% respectively in terms of accuracy and minority class F₁ score. In our post-deployment survey, SRBD developers and managers indicated that they found our solution as useful and it provided them with important insights to help their decision makings.

Vorheriger Artikel Fixing vulnerabilities potentially hinders maintainability

Nächster Artikel A study of how Docker Compose is used to compose multi-component systems

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

We are unable to make the dataset publicly available due to the restrictions imposed by our NDA with SRBD.

On StackOverflow, each accepted answer gets 15 points, upvote gets 10 points, and downvote gets -2 points

The numbers represent the number of interviewees that consider this type of comment as Useful or Not Useful

Point biserial correlation

https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.KFold.html

Numbers in parentheses indicate how many CRA users of our evaluation survey mentioned this particular insight. One user may have mentioned multiple insights.

Ahmed T, Bosu A, Iqbal A, Rahimi S (2017) SentiCR: a customized sentiment analysis tool for code review interactions. In: 32nd IEEE/ACM international conference on automated software engineering (NIER track), ASE ’17

Bacchelli A, Bird C (2013) Expectations, outcomes, and challenges of modern code review. In: Proceedings of the 2013 international conference on software engineering, pp 712–721. IEEE Press

Barnett M, Bird C, Brunet J, Lahiri SK (2015) Helping developers help themselves: Automatic decomposition of code review changesets. In: Proceedings of the 37th international conference on software engineering-volume 1. IEEE Press, pp 134–144

Basili VR, Shull F, Lanubile F (1999) Building knowledge through families of experiments. IEEE Trans Softw Eng 25(4):456–473CrossRef

Beller M, Bacchelli A, Zaidman A, Juergens E (2014) Modern code reviews in open-source projects: Which problems do they fix?. In: Proceedings of the 11th working conference on mining software repositories. pp 202–211

Bosu A, Carver JC (2013) Impact of peer code review on peer impression formation: A survey. In: 2013 ACM / IEEE international symposium on empirical software engineering and measurement. pp 133–142. https://doi.org/10.1109/ESEM.2013.23

Bosu A, Carver JC, Bird C, Orbeck J, Chockley C (2017) Process aspects and social dynamics of contemporary code review: Insights from open source development and industrial practice at microsoft. IEEE Trans Softw Eng 43(1):56–75. https://doi.org/10.1109/TSE.2016.2576451CrossRef

Bosu A, Greiler M, Bird C (2015) Characteristics of useful code reviews: an empirical study at Microsoft. In: 2015 IEEE/ACM 12th working conference on mining software repositories. IEEE, Florence, pp 146–156. https://doi.org/10.1109/MSR.2015.21. http://ieeexplore.ieee.org/document/7180075/

Buitinck L, Louppe G, Blondel M, Pedregosa F, Mueller A, Grisel O, Niculae V, Prettenhofer P, Gramfort A, Grobler J, Layton R, VanderPlas J, Joly A, Holt B, Varoquaux G (2013) API design for machine learning software: experiences from the scikit-learn project. In: ECML PKDD workshop: languages for data mining and machine learning. pp 108–122

Calefato F, Lanubile F, Maiorano F, Novielli N (2018) Sentiment polarity detection for software development. Empir Softw Eng 23(3):1352–1382CrossRef

Camilo F, Meneely A, Nagappan M (2015) Do bugs foreshadow vulnerabilities?: a study of the chromium project. In: Proceedings of the 12th working conference on mining software repositories. IEEE Press, pp 269–279

Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 16:321–357. https://doi.org/10.1613/jair.953. https://jair.org/index.php/jair/article/view/10302CrossRef

Chen T, Guestrin C (2016) Xgboost: A scalable tree boosting system. In: Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining. ACM, pp 785–794

Chouchen M, Ouni A, Kula RG, Wang D, Thongtanunam P, Mkaouer MW, Matsumoto K (2021) Anti-patterns in modern code review: Symptoms and prevalence. In: 2021 IEEE international conference on software analysis, evolution and reengineering (SANER), pp 531–535, DOI https://doi.org/10.1109/SANER50967.2021.00060, (to appear in print)

Chouchen, M., Ouni A., Mkaouer M.W., Kula R.G., Inoue K. (2021) Whoreview: A multi-objective search-based approach for code reviewers recommendation in modern code review. Appl Soft Comput 100:106908. https://doi.org/10.1016/j.asoc.2020.106908. https://www.sciencedirect.com/science/article/pii/S1568494620308462CrossRef

Cohen J, Brown E, DuRette B, Teleki S (2006) Best kept secrets of peer code review. Smart Bear Somerville

Czerwonka J, Greiler M, Tilford J (2015) Code reviews do not find bugs: how the current code review best practice slows us down. In: Proceedings of the 37th international conference on software engineering-volume 2. IEEE Press, pp 27–28

di Biase M, Bruntink M, Bacchelli A (2016) A security perspective on code review: The case of chromium. In: 2016 IEEE 16th international working conference on source code analysis and manipulation (SCAM). IEEE, pp 21–30

Ebert F, Castor F, Novielli N, Serebrenik A (2019) Confusion in code reviews: reasons, impacts, and coping strategies. In: 2019 IEEE 26th international conference on software analysis, evolution and reengineering (SANER), pp 49–60. https://doi.org/10.1109/SANER.2019.8668024

Fagan ME (1976) Design and code inspections to reduce errors in program development. IBM Syst J 15(3):182–211. https://doi.org/10.1147/sj.153.0182.+CrossRef

Flesch R (2007) Flesch–kincaid readability test. Retrieved October 26, 2007

Flyvbjerg B (2006) Five misunderstandings about case-study research. Qual Inq 12(2):219–245CrossRef

Fracz W, Dajda J (2018) Developers’ game: A preliminary study concerning a tool for automated developers assessment. In: 2018 IEEE international conference on software maintenance and evolution (ICSME). pp 695–699, DOI https://doi.org/10.1109/ICSME.2018.00079, (to appear in print)

Gerrit code review - rest api (2019) https://gerrit-review.googlesource.com/Documentation/rest-api.html. (Accessed on 09/27/2019)

Hatton L (2008) Testing the value of checklists in code inspections. IEEE Softw 25(4):82–88CrossRef

Hirao T, Ihara A, Ueda Y, Phannachitta P, Matsumoto K (2016) The impact of a low level of agreement among reviewers in a code review process. In: IFIP International conference on open source systems. Springer, pp 97–110

Ho TK (1995) Random decision forests. In: Proceedings of 3rd international conference on document analysis and recognition, vol 1. IEEE, pp 278–282

Hofner G, Mani V, Nambiar R, Apte M (2011) Fostering a high-performance culture in offshore software engineering teams using balanced scorecards and project scorecards. In: 2011 IEEE Sixth international conference on global software engineering. IEEE, pp 35–39

Hopfield JJ (1988) Artificial neural networks. IEEE Circ Devices Mag 4(5):3–10CrossRef

Hosmer DW Jr, Lemeshow S, Sturdivant RX (2013) Applied logistic regression, vol 398, Wiley, Hoboken

Huq F, Hasan M, Pantho MAH, Mahbub S, Iqbal A, Ahmed T (2020) Review4repair: Code review aided automaticprogram repairing. arXiv:2010.01544

Islam MR, Zibran MF (2017) Leveraging automated sentiment analysis in software engineering. In: 2017 IEEE/ACM 14th international conference on mining software repositories (MSR). IEEE, pp 203–214

Jin C, De-Lin L, Fen-Xiang M (2009) An improved id3 decision tree algorithm. In: 2009 4th international conference on computer science & education. IEEE, pp 127–130

Kaplan RS, Norton DP et al (1992) The balanced scorecard: measures that drive performance

Khomh F, Dhaliwal T, Zou Y, Adams B (2012) Do faster releases improve software quality?: an empirical case study of mozilla firefox. In: Proceedings of the 9th IEEE working conference on mining software repositories. IEEE Press, pp 179–188

Kononenko O, Baysal O, Godfrey MW (2016) Code review quality: How developers see it. In: Proceedings of the 38th international conference on software engineering, ICSE ’16. https://doi.org/10.1145/2884781.2884840. ACM, New York, pp 1028–1038

Kononenko O, Baysal O, Guerrouj L, Cao Y, Godfrey MW (2015) Investigating code review quality: Do people and participation matter?. In: 2015 IEEE International conference on software maintenance and evolution (ICSME), pp 111–120

Mair S (2002) A balanced scorecard for a small software group. IEEE Softw 19(6):21–27CrossRef

Marlow J, Dabbish L, Herbsleb J (2013) Impression formation in online peer production: activity traces and personal profiles in github. In: Proceedings of the 2013 conference on computer supported cooperative work. pp 117–128

Marr B, Neely A (2003) Automating the balanced scorecard–selection criteria to identify appropriate software applications. Measuring Business Excellence

McCarney R, Warner J, Iliffe S, Van Haselen R, Griffin M, Fisher P (2007) The hawthorne effect: a randomised, controlled trial. BMC Med Res Methodol 7(1):30CrossRef

McIntosh S, Kamei Y, Adams B, Hassan AE (2014) The impact of code review coverage and code review participation on software quality: A case study of the qt, vtk, and itk projects. In: Proceedings of the 11th working conference on mining software repositories. pp 192–201

Mäntylä MV, Lassenius C (2009) What types of defects are really discovered in code reviews? IEEE Trans Softw Eng 35(3):430–448. https://doi.org/10.1109/TSE.2008.71CrossRef

Mockus A, Fielding RT, Herbsleb J (2000) A case study of open source software development: the apache server. In: Proceedings of the 22nd international conference on Software engineering. ACM, pp 263–272

Natural language toolkit — nltk 3.5 documentation (2020) https://www.nltk.org/. (Accessed on 12/06/2020)

Naumenko D (2018) Java diff utils. https://github.com/dnaumenko/java-diff-utils

Novielli N, Girardi D, Lanubile F (2018) A benchmark study on sentiment analysis for software engineering research. In: 2018 IEEE/ACM 15th international conference on mining software repositories (MSR). IEEE, pp 364–375

pandas.qcut — pandas 1.1.5 documentation (2020) https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.qcut.html. (Accessed on 12/24/2020)

Papalexandris A, Ioannou G, Prastacos GP (2004) Implementing the balanced scorecard in greece: a software firm’s experience. Long Range Plann 37 (4):351–366CrossRef

Rahman MM, Roy CK, Collins JA (2016) Correct: code reviewer recommendation in github based on cross-project and technology experience. In: Proceedings of the 38th international conference on software engineering companion. pp 222–231

Rahman MM, Roy CK, Kula RG (2017) Predicting usefulness of code review comments using textual features and developer experience. In: Proceedings of the 14th international conference on mining software repositories, MSR ’17. IEEE Press, pp 215–226

Rigby PC, Bird C (2013) Convergent contemporary software peer review practices. In: Proceedings of the 2013 9th joint meeting on foundations of software engineering. ACM, pp 202–212

Rigby PC, German DM (2006) A preliminary examination of code review processes in open source projects. Tech. rep., Technical Report DCS-305-IR University of Victoria

Sadowski C, Söderberg E, Church L, Sipko M, Bacchelli A (2018) Modern code review: a case study at google. In: Proceedings of the 40th international conference on software engineering: software engineering in practice. ACM, pp 181–190

Salton G, Buckley C (1988) Term-weighting approaches in automatic text retrieval. Inf Process Manage 24(5):513–523. https://doi.org/10.1016/0306-4573(88)90021-0CrossRef

Shull FJ, Carver JC, Vegas S, Juristo N (2008) The role of replications in empirical software engineering. Empir Soft Eng 13(2):211–218CrossRef

StackOverflow (2021) What is reputation? how do i earn (and losek) it?. https://stackoverflow.com/help/whats-reputation

Suykens JA, Vandewalle J (1999) Least squares support vector machine classifiers. Neural Process Lett 9(3):293–300CrossRef

textstat - pypi (2020) https://pypi.org/project/textstat/. (Accessed on 12/06/2020)

Thongtanunam P, Hassan AE (2020) Review dynamics and their impact on software quality. IEEE Trans Softw Eng :1–1. https://doi.org/10.1109/TSE.2020.2964660

Thongtanunam P, McIntosh S, Hassan AE, Iida H (2017) Review participation in modern code review. Empir Softw Eng 22(2):768–817CrossRef

Thongtanunam P, Tantithamthavorn C, Kula RG, Yoshida N, Iida H, Matsumoto K (2015) Who should review my code? a file location-based code-reviewer recommendation approach for modern code review. In: 2015 IEEE 22nd international conference on software analysis, evolution, and reengineering (SANER).IEEE, pp 141–150

Toloşi L, Lengauer T (2011) Classification with correlated features: unreliability of feature ranking and solutions. Bioinformatics 27(14):1986–1994CrossRef

Titel: Using a balanced scorecard to identify opportunities to improve code review effectiveness: an industrial experience report
verfasst von: Masum Hasan
Anindya Iqbal
Mohammad Rafid Ul Islam
A.J.M. Imtiajur Rahman
Amiangshu Bosu
Publikationsdatum: 01.11.2021
Verlag: Springer US
Erschienen in: Empirical Software Engineering / Ausgabe 6/2021
Print ISSN: 1382-3256
Elektronische ISSN: 1573-7616
DOI: https://doi.org/10.1007/s10664-021-10038-w

Springer Professional

Abstract

Bitte loggen Sie sich ein, um Zugang zu Ihrer Lizenz zu erhalten.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft"

Springer Professional "Technik"

Springer Professional "Wirtschaft+Technik"

Weitere Artikel der Ausgabe 6/2021

The forgotten role of search queries in IR-based bug localization: an empirical study

Rotten green tests in Java, Pharo and Python

Why and what happened? Aiding bug comprehension with automated category and causal link identification

An empirical study of IoT topics in IoT developer discussions on Stack Overflow

Understanding developers’ privacy and security mindsets via climate theory

An empirical study of Q&A websites for game developers

Premium Partner