research-article
Open Access

Destructive Criticism in Software Code Review Impacts Inclusion

Published: 11 November 2022

Abstract

The software industry lacks gender diversity. Recent research has suggested that a toxic working culture is to blame. Studies have found that communications in software repositories directed towards women are more negative in general. In this study, we use a destructive criticism lens to examine gender differences in software code review feedback. Software code review is a practice in which code is peer reviewed and negative feedback is often delivered. We explore differences in the perception, frequency, and impact of destructive criticism across genders. We surveyed 93 software practitioners, eliciting their perceived reactions to hypothetical scenarios (vignettes) in which participants are asked to imagine receiving either constructive or destructive criticism. In addition, the survey collected general opinions on feedback obtained during software code review, as well as the frequency with which participants give and receive destructive criticism.

We found that opinions on destructive criticism vary. Women perceive destructive criticism as less appropriate and are less motivated to continue working with the developer after receiving destructive criticism. Destructive criticism is fairly common: more than half of respondents had received nonspecific negative feedback, and nearly a quarter had received inconsiderate negative feedback, in the past year. Our results suggest that destructive criticism in code review could be a contributing factor to the lack of gender diversity observed in the software industry.

