1 Introduction
Hence, while the existing categorizations contribute to the understanding of the SATD phenomenon, we think that a different categorization is required as a basis for the design of tools that can help support SATD resolution. By providing a more fine-grained classification of the problems experienced by contributors, we expect that more actionable insights can be obtained from SATD. Thus, we ask the following research question:
RQ1: What kind of problems do SATD annotations describe?
Several authors hypothesize that the expression of negative sentiment may be a proxy for the priority of a problem to be solved (Gachechiladze et al. 2017; Uddin and Khomh 2017; Lin et al. 2019). In other fields, such as marketing, negative sentiment has a clear meaning. For instance, customers give greater weight to negative information (Wright 1974), and negative reviews are more useful to customers’ decisions than positive ones (Casaló et al. 2015; Sparks and Browning 2011). However, to the best of our knowledge, nobody has studied how priority is expressed in different kinds of software development issues—and in particular TD-related issues—and whether developers use negative sentiment to indicate priority. This leads us to address the following research question:
RQ2: How do developers annotate SATD that they believe requires extra priority?
RQ3: Do developers believe that the expression of negative sentiment in SATD is an acceptable practice?
RQ4: How does the occurrence of negative sentiment vary across different kinds of SATD annotations?
RQ5: To what extent do SATD annotations belonging to different categories contain additional details?
- We seek a better understanding of the TD annotation practices of open-source developers; to that end, we design and discuss a survey in which we ask open-source developers to (i) provide us with insights about their TD annotation practices, and (ii) draft SATD comments for five different scenarios;
- We add two new research questions, RQ2 and RQ3, which we address considering the results of our survey;
- We extend two existing research questions (RQ4 and RQ5) with the results of the survey.
2 Study Design
2.1 Addressing RQ1: SATD Content Coding
2.1.1 Dataset
SATD type | Initial dataset | Without duplication | Sampled |
---|---|---|---|
Defect | 472 | 350 | 116 (11%) |
Design | 2703 | 2260 | 657 (63%) |
Documentation | 54 | 49 | 39 (4%) |
Implementation | 757 | 550 | 183 (18%) |
Test | 85 | 80 | 43 (4%) |
Total | 4071 | 3289 | 1038 |
2.1.2 Data Analysis
2.2 Addressing RQ2 and RQ3
Question | Response type |
---|---|
When writing source code, how often do you write source-code comments indicating delayed or intended work activities such as TODO, FIXME, hack, workaround, etc.? | Never, Rarely (Less than once a month), Sometimes (Monthly), Often (Weekly), Very often (Daily) |
When authoring comments that describe a problem, how often do you write negative source-code comments indicating delayed or intended work activities such as TODO, FIXME, hack, workaround, etc.? | Never, Rarely (Less than once a month), Sometimes (Monthly), Often (Weekly), Very often (Daily) |
How often do you come across negative source-code comments indicating delayed or intended work activities such as TODO, FIXME, hack, workaround, etc.? | Never, Rarely (Less than once a month), Sometimes (Monthly), Often (Weekly), Very often (Daily) |
Suppose you believe that an issue requires extra priority; how would you usually indicate this in a comment indicating delayed or intended work activities such as TODO, FIXME, hack, workaround, etc.? | Open-text |
While writing a comment describing an issue in the source code, I am more likely to write negative comments for issues that I believe are more important. | Strongly disagree, Disagree, Neutral, Agree, Strongly agree |
Writing negative comments to assign extra priority to issues in the source code is an acceptable practice. | Strongly disagree, Disagree, Neutral, Agree, Strongly agree |
Whenever I come across a source-code comment describing a problem that is particularly negative, I interpret this as a more important issue than a source-code comment describing a problem that is more neutral. | Strongly disagree, Disagree, Neutral, Agree, Strongly agree |
2.3 Addressing RQ4
SATD category | Vignette |
---|---|
Functional issue | You are working on an open-source mail client and you are working on a new feature. You observe that the auto-completion of e-mail addresses is broken: It should complete addresses using e-mail addresses from the address book and e-mail addresses used recently. However, it only uses addresses from the address book for the auto-complete. You do not have time to fix this immediately. |
Partially/not impl. func. | You are working on an open-source mail client. You observe that one method is not yet finished: If the method detects invalid input it should raise a dialog window, and this is not currently implemented. You do not have time to fix this immediately. |
Poor impl. choice | You are working on an open-source diagramming application. You observe that a code fragment is copied over and over again. You do not have time to refactor this immediately. |
Documentation | You are working on an open-source diagramming application. You observe that there is a method without any documentation, in violation of the agreed-upon coding guidelines. You do not have the time to read the method and write the documentation yourself. |
Wait | You are working on an open-source Java GUI application and you are implementing a new feature; however, to implement this feature you depend on an external API that is not yet available. |
2.3.1 Sentiment Labeling of SATD
- negative: the comment expresses negative sentiment about the underlying source code (e.g., “this method is a nightmare”). Specifically, we considered the following factors: terms highlighting urgency (such as “asap” and “urgent”), the presence of multiple exclamation and question marks, as well as the presence of keywords written in upper case, such as the term NOT in the comment “// the plot field is NOT tested”;
- non-negative: the comment expresses either positive or no sentiment about the code referenced in the comment (e.g., “TODO: Why is this a special case?”);
- mixed: the comment expresses both positive and negative sentiment (e.g., “This is a fairly specific hack for empty string, but it does the job”).
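As a minimal sketch, the surface cues listed above could be approximated automatically as follows. Note that in the study the sentiment labeling was performed manually; the urgency-term list and the tag whitelist below are illustrative assumptions, not the authors’ coding guide.

```python
import re

# Illustrative lists (assumptions, not the authors' coding guide)
URGENT_TERMS = {"asap", "urgent"}
COMMON_TAGS = {"TODO", "FIXME", "XXX", "API", "JDK"}

def looks_negative(comment: str) -> bool:
    """Flag a SATD comment showing the surface cues of negative sentiment."""
    lowered = comment.lower()
    # 1. Terms highlighting urgency ("asap", "urgent", ...)
    if any(term in lowered for term in URGENT_TERMS):
        return True
    # 2. Runs of multiple exclamation or question marks
    if re.search(r"[!?]{2,}", comment):
        return True
    # 3. Emphasis via fully upper-case words (e.g., "NOT"),
    #    ignoring common tags such as TODO/FIXME
    for token in re.findall(r"\b[A-Z]{2,}\b", comment):
        if token not in COMMON_TAGS:
            return True
    return False

print(looks_negative("// the plot field is NOT tested"))    # cue 3 fires
print(looks_negative("TODO: Why is this a special case?"))  # no cue fires
```

A heuristic like this would flag the paper’s example “// the plot field is NOT tested” while leaving the non-negative “TODO: Why is this a special case?” unflagged.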
2.3.2 Survey
2.4 Addressing RQ5: Identifying Additional Details in SATD
- for class names, we match all class names of a project, obtained from its git repository (all file versions from all branches), against the comments, using a case-insensitive, word-boundary match; for method references we use a simple regular expression (“\w+\(”, matching one or more alphanumeric characters followed directly by an opening parenthesis);
- for bug references, we use the approach of Fischer et al. (2003), matching JIRA-style references (e.g., “jruby-1234”) or GitHub-style references (e.g., “#1234”);
- for URLs, we match the two prefixes “http://” and “https://” against the SATD comments.
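A minimal sketch of these detectors, assuming a toy class-name list in place of the names mined from the git repository (all identifiers in the example comment are illustrative):

```python
import re

# Toy stand-in for the class names mined from a project's git history
CLASS_NAMES = {"NativeException", "ComponentMetamodel"}

METHOD_RE = re.compile(r"\w+\(")                    # word chars + "("
BUG_RE    = re.compile(r"\b[A-Za-z]+-\d+\b|#\d+")   # JIRA-style or GitHub-style
URL_RE    = re.compile(r"https?://")

def extract_details(comment: str) -> dict:
    """Collect the additional details a SATD comment may reference."""
    found_classes = [c for c in CLASS_NAMES
                     if re.search(r"\b%s\b" % re.escape(c), comment, re.IGNORECASE)]
    return {
        "classes": found_classes,
        "methods": METHOD_RE.findall(comment),
        "bugs":    BUG_RE.findall(comment),
        "urls":    bool(URL_RE.search(comment)),
    }

details = extract_details(
    "NativeException should provide a real allocator; "
    "Class.new() will fail. JRUBY-415")
print(details)
```

On the example comment, the sketch reports the class name NativeException, the method reference “new(”, and the JIRA-style bug reference “JRUBY-415”.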
2.5 Survey Preparation and Sampling
- We sent out emails to the mailing lists of open-source software projects. The list of projects is identical to the list used in the study of Zampetti et al. (2021). This list also includes the mailing lists of five out of the ten projects from the Maldonado et al. dataset (i.e., the ones for which we were able to access the mailing list) that we used for the other part of the study. We did not limit survey participation to the projects from the Maldonado dataset, to ensure larger participation in the survey. In total, we invited the developers of 93 open-source projects through the respective mailing lists, Discord, Slack, and Google Group channels.
- We posted the link to the survey to several Facebook and LinkedIn groups that target open-source developers.
- We posted the survey on the Twitter accounts of the authors.
- We asked personal contacts who we know contribute to open-source projects to fill out the survey.
3 Study Results
3.1 Survey Responses
3.2 RQ1: What Kind of Problems do SATD Annotations Describe?
Macro-category | Defect | Design | Doc. | Impl. | Test | Total |
---|---|---|---|---|---|---|
Poor implementation choices | 22 | 361 | 2 | 43 | 1 | 429 |
Partially implemented | 27 | 94 | 5 | 100 | 3 | 229 |
Functional issues | 48 | 68 | 0 | 18 | 1 | 135 |
Wait | 6 | 76 | 0 | 6 | 1 | 89 |
Documentation issues | 0 | 19 | 30 | 4 | 1 | 54 |
Testing issues | 0 | 1 | 0 | 0 | 35 | 36 |
Misalignment | 1 | 13 | 2 | 5 | 0 | 21 |
SATD comments outdated | 2 | 1 | 0 | 0 | 0 | 3 |
Deployment issues | 1 | 0 | 0 | 1 | 0 | 2 |
False positive | 9 | 24 | 0 | 6 | 1 | 40 |
Total | 116 | 657 | 39 | 183 | 43 | 1038 |
For instance, the comment “[…] substr() above and below with more efficient method” in jmeter indicates performance issues. […] “[…] NativeException is expected to be used from Ruby code, it should provide a real allocator to be used. Otherwise Class.new will fail, as will marshaling. JRUBY-415” in jruby. 11 SATD comments, instead, indicate the presence of misbehavior that is acceptable even though a better solution must be found, i.e., Fix to postpone: e.g., “this will generate false positives but we can live with that” in ant. […] “[…] waitFor() hangs on some Java implementations” in jEdit, or cases where the actual implementation inherits a bug from an external API being used, e.g., “Workaround for JDK bug 4071281 [...] in JDK 1.2” in jEdit. […] “[…] ComponentMetamodel is complete and merged” in hibernate. There are also seven SATD comments where developers admit the presence of TD in the code that cannot be addressed until an already-opened issue is fixed, e.g., “// TODO: This whole block can be deleted when issue 6266 is resolved” in argouml. Differently from the comments belonging to the Fix to postpone leaf in the Functional issues category, where the TD corresponds to a functional issue that developers do not have to rush to fix, in this case the functional issue is simply the event developers are waiting for before removing TD from the code. An interesting phenomenon related to waiting is an SATD comment requiring other SATD comments to be fixed (2), e.g., “TODO: simply remove this override if we fix the above todos” in hibernate. We found four comments in which developers need to wait for a proper API to be found, e.g., “This really should be Long.decode, but there isn’t one. As a result, hex and octal literals ending in ’l’ or ’L’ don’t work.” in jEdit. Differently from the comments belonging to the Poor API usage leaf under the Poor implementation choices category, where the TD corresponds to an inappropriate API usage, here we group comments where developers admit the presence of a workaround that must be removed once an appropriate API is found, i.e., the external event developers are waiting for. […] “[…] isAModel(obj) or isAProfile(obj) would clarify what is going on here” in argouml.
3.3 RQ2: How do Developers Annotate SATD that they Believe Requires Extra Priority?
Card sorting code | Occurrence |
---|---|
Should be discussed elsewhere (issue tracker, code review, mail, PM, backlog, tests) | 19 |
Tag | 14 |
Should not be indicated in the source code (alternative reporting mechanism is not indicated) | 4 |
Rationale | 2 |
Code should report an error | 2 |
Not-ready work should not be merged | 1 |
Tags make the code not ready to merge during code reviews | 1 |
Tag followed by the name of the person who has to address it | 1 |
Tag followed by the bug ID detailing the issue in the issue tracking system | 1 |
Use specific keywords in the comment like issue, ASAP and high-priority | 1 |
3.4 RQ3: Do Developers Believe that the Expression of Negative Sentiment in SATD is an Acceptable Practice?
3.5 RQ4: How does the Occurrence of Negative Sentiment Vary Across Different Kinds of SATD Annotations?
Category | Negative | (%) | Non-negative | Mixed | Total |
---|---|---|---|---|---|
Poor implementation choices | 125 | (29%) | 294 | 7 | 426 |
Partially implemented | 29 | (13%) | 197 | 2 | 228 |
Functional issues | 66 | (49%) | 67 | 2 | 135 |
Wait | 41 | (46%) | 45 | 3 | 89 |
Documentation issues | 18 | (33%) | 36 | 0 | 54 |
Testing issues | 12 | (33%) | 24 | 0 | 36 |
Misalignment | 6 | (29%) | 15 | 0 | 21 |
SATD comments outdated | 1 | (33%) | 2 | 0 | 3 |
Deployment issues | 1 | (50%) | 1 | 0 | 2 |
Total | 299 | (30%) | 681 | 14 | 994 |
Category | Negative | (%) | Non-negative | Mixed | No-comment | Total |
---|---|---|---|---|---|---|
Poor implementation choices | 2 | (7%) | 27 | 0 | 17 | 46 |
Partially implemented | 4 | (11%) | 32 | 0 | 10 | 46 |
Functional issues | 7 | (23%) | 24 | 0 | 15 | 46 |
Wait | 1 | (4%) | 22 | 0 | 23 | 46 |
Documentation issues – A | 2 | (10%) | 18 | 0 | 16 | 36 |
Documentation issues – B | 0 | (0%) | 3 | 0 | 7 | 10 |
Total | 16 | (11%) | 126 | 0 | 88 | 230 (142 comments drafted) |
Category 1 | Category 2 | p-value | OR |
---|---|---|---|
Functional issues | Partially implemented | < 0.01 | 6.52 |
Functional issues | Documentation issues | 0.04 | 2.43 |
Functional issues | Poor implementation choices | < 0.01 | 2.30 |
Poor implementation choices | Partially implemented | < 0.01 | 2.85 |
Wait | Partially implemented | < 0.01 | 5.82 |
Wait | Poor implementation choices | 0.02 | 2.05 |
Testing issues | Partially implemented | 0.02 | 3.41 |
Documentation issues | Partially implemented | 0.03 | 2.67 |
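Assuming a Fisher’s exact test on 2×2 tables of negative vs. non-negative counts taken from the RQ4 table, the first comparison above can be reproduced as follows. The reported OR of 6.52 may be a conditional MLE estimate (as returned by R’s fisher.test), whereas the sample odds ratio computed below is about 6.69; the paper does not name its exact implementation, so this is a sketch, not the authors’ script.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test and sample odds ratio for [[a, b], [c, d]]."""
    n, r1, c1 = a + b + c + d, a + b, a + c
    def p_cell(x):  # hypergeometric probability that the top-left cell equals x
        return comb(r1, x) * comb(n - r1, c1 - x) / comb(n, c1)
    p_obs = p_cell(a)
    lo, hi = max(0, c1 - (n - r1)), min(r1, c1)
    # Two-sided p: sum over all tables at most as probable as the observed one
    p_value = sum(p_cell(x) for x in range(lo, hi + 1)
                  if p_cell(x) <= p_obs * (1 + 1e-9))
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, p_value

# Negative vs. non-negative counts from the RQ4 table:
# Functional issues (66, 67) vs. Partially implemented (29, 197)
odds_ratio, p = fisher_exact_2x2(66, 67, 29, 197)
print(round(odds_ratio, 2), p < 0.01)  # sample OR ~6.69, p well below 0.01
```

The same call with the counts of any other category pair reproduces the remaining rows of the comparison table up to the choice of odds-ratio estimator.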
3.6 RQ5: To What Extent do SATD Annotations Belonging to Different Categories Contain Additional Details?
Category | Component | Name | Bug id | URL | Date |
---|---|---|---|---|---|
Functional issues | 47 (35%) | 12 (9%) | 11 (8%) | 1 (1%) | 9 (7%) |
Poor implementation choices | 152 (35%) | 48 (11%) | 5 (1%) | 0 | 16 (4%) |
Wait | 21 (24%) | 4 (5%) | 9 (10%) | 2 (2%) | 1 (1%) |
Deployment issues | 0 | 0 | 0 | 0 | 0 |
SATD comments outdated | 1 (33%) | 0 | 1 (33%) | 0 | 0 |
Partially implemented | 50 (22%) | 22 (10%) | 0 | 0 | 1 (< 1%) |
Testing issues | 7 (19%) | 3 (8%) | 0 | 0 | 0 |
Documentation issues | 19 (35%) | 11 (20%) | 0 | 0 | 1 (2%) |
Misalignment | 7 (33%) | 4 (19%) | 0 | 0 | 0 |
Tot. (unique) | 304 (30%) | 104 (10%) | 26 (3%) | 3 (0.3%) | 28 (3%) |
Category | Name | Bug id | Date | Total comments drafted |
---|---|---|---|---|
Functional issues | 2 (6.25%) | 4 (12.50%) | 0 (0.00%) | 31 |
Poor implementation choices | 1 (3.22%) | 5 (16.12%) | 0 (0.00%) | 29 |
Wait | 0 (0.00%) | 3 (12.00%) | 1 (4.00%) | 23 |
Partially implemented | 0 (0.00%) | 4 (11.11%) | 0 (0.00%) | 36 |
Documentation issues – A | 0 (0.00%) | 3 (15.00%) | 0 (0.00%) | 20 |
Documentation issues – B | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 3 |
4 Discussion
5 Related Work
5.1 Technical Debt and Self-Admitted Technical Debt
Bavota and Russo (2016) | Our taxonomy | |||
---|---|---|---|---|
1st Level | 2nd Level | 3rd Level | Category | Sub-category |
Code | Low Internal | Poor Impl. Choices | Poor impl. solutions | |
Quality | Poor API usage | |||
Code review needed | ||||
Maintainability issues | ||||
Performance issues | ||||
Usability | ||||
Won’t improve the code | ||||
Partially/Not Impl. Func. | ||||
Workaround | Wait | Temporary patch | ||
Design | Code Smells | Poor Impl. Choices | Maint. Issues | |
Poor Impl. Solutions | ||||
Design Patterns | Poor Impl. Choices | Maintainability Issues | ||
Poor Impl. Sol. | ||||
Doc. | Incons. Comm. | Doc. Issues | Inconsistent Doc. | |
Addressed TD | SATD outdated | |||
Won’t fix | Func. Issues | Fix to postpone | ||
Poor Impl. Choices | Won’t improve the code | |||
Doc. Issues | Won’t modify doc. | |||
Licensing | ||||
Defect | Defects | Known defects to fix | Func. Issues | Bug to fix |
Partially fixed | Func. Issues | Temporary Patch | ||
defects | Partially/Not Impl. Func. | Work under specific cond. | ||
Low Ext. | Poor Impl. Choices | Usability | ||
Qual. | ||||
Test | Testing Issues | Improve tests | ||
Test case bugs | ||||
Disalign. prod/test code | ||||
Req. | Functional | Func. Issues | ||
Improv. to feat. | Partially/Not Impl. Func. | Work under specific cond. | ||
needed | Func. issue elsewhere | |||
Pre-cond. missing | ||||
Post-cond. unchecked | ||||
Incompl. except. handling | ||||
New feat. to be impl. | Partially/Not Impl. Func. | Work under specific cond. | ||
Func. issue elsewhere | ||||
Pre-cond. missing | ||||
Post-cond. unchecked | ||||
Incompl. except. handling | ||||
Non Functional | Performances | Poor Impl. Choices | Performance issues |