1 Introduction
- RQ1: To what extent do developers share links in the review discussion? It is not yet known how multi-repository projects use link sharing in their communities. Using a sample from the well-known OpenStack and Qt projects, we investigate the trend of link sharing, the common domains in each project, and the link target types.
- RQ2: Does the number of links shared in the review discussion correlate with review time? Prior studies (Baysal et al. 2016; Kononenko et al. 2018) analyzed the impact of technical and non-technical factors on the review process (e.g., review outcome, review time). However, little is known about whether the practice of sharing links correlates with review time. Link sharing may shorten the review time, as it provides the information required for a review, which might help reviewers conduct the review faster. To address this RQ, we conduct a statistical analysis using a non-linear regression model to analyze the correlation between link sharing and review time.
- RQ3: What are the common intentions of links shared in the review discussion? Previous work (Pascarella et al. 2018) has identified different types of information that reviewers need when conducting a review. Yet, little is known about the extent to which link sharing can meet such information needs. Hence, we investigate the intention behind link sharing in order to better understand the role and usefulness of link sharing during reviews.
2 Motivating Example
3 Case Study Design
3.1 Studied Projects
3.2 Data Preparation
Statistic | OpenStack | Qt |
---|---|---|
Studied Period | 11/2011-07/2019 | 05/2011-07/2019 |
# Reviews (#Merged/#Abandoned) | 58,212 (45,439/12,773) | 40,758 (35,284/5,474) |
# Reviewers | 4,568 | 1,123 |
# Reviews with Links | 14,655 (25.2%) | 4,613 (11.3%) |
# Unique Links | 26,746 | 7,518 |
# Total Links | 31,698 | 7,988 |
# Links per Review (1st Qu./Median/3rd Qu.) | 1/1/2 | 1/1/2 |
Percent of Links Shared by Reviewers | 62.3% | 44.0% |
Percent of Links Shared by Authors | 37.6% | 56.0% |
3.3 RQ1 Analysis
- (I) Representative dataset construction. As the full set of links is too large for their targets to be examined manually, we draw a statistically representative sample. The required sample size was calculated so that our conclusions about the ratio of links with a specific characteristic would generalize to all links in the same bucket with a confidence level of 95% and a confidence interval of 5. The calculation of statistically significant sample sizes based on population size, confidence interval, and confidence level is well established (Krejcie and Morgan 1970). We randomly sample 379 internal links and 327 external links from the unique links of the OpenStack project, and 363 internal links and 309 external links from those of the Qt project. To remove the threat of links being inaccessible (404), we verify each link until the sample size is reached. To do so, we first randomly select 500 internal candidate links and 500 external candidate links for each studied project. We then automatically verify the candidates and filter out links that are inaccessible. In the end, we filtered out 70 inaccessible internal links and 138 inaccessible external links for the OpenStack project, and 75 inaccessible internal links and 147 inaccessible external links for the Qt project.
- (II) Manual coding. To classify the type of link targets, we perform two iterations of manual coding. In the first iteration, the first two authors independently coded 50 internal and external links in the sample. The initial codes were based on the coding scheme of Hata et al. (2019), which provides 14 types of link targets in source code comments. However, we found that their codes did not cover all link targets in our datasets. Hence, the following five codes emerged from our manual analysis in the first iteration:
- Communication channel: links that target mailing lists or chat rooms.
- GitHub activity: links that target pull requests, commits, and issues.
- Media: links that target pictures and videos.
- Memo: links that target personally recorded documentation.
- Review: links that target code reviews.
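The sample-size calculation in step (I) can be illustrated with a minimal sketch of the Krejcie and Morgan (1970) formula, assuming the usual settings for a 95% confidence level and a confidence interval of 5 (chi-square critical value 3.841, population proportion 0.5, margin 0.05). The function name, the rounding-up step, and the population sizes in the usage lines are illustrative assumptions; the paper does not report the sizes of the internal and external link populations.

```python
import math


def required_sample_size(population, chi2=3.841, proportion=0.5, margin=0.05):
    """Sample size for a finite population (Krejcie and Morgan 1970).

    chi2       -- chi-square critical value for 1 d.f. at 95% confidence
    proportion -- assumed population proportion (0.5 maximizes the sample size)
    margin     -- confidence interval expressed as a proportion (5 -> 0.05)
    """
    if population <= 0:
        raise ValueError("population must be positive")
    variance_term = chi2 * proportion * (1 - proportion)
    numerator = variance_term * population
    denominator = margin ** 2 * (population - 1) + variance_term
    # Rounding up is an assumption here; it never undershoots the requirement.
    return math.ceil(numerator / denominator)


# Hypothetical population sizes, for illustration only.
for size in (1000, 10000):
    print(size, required_sample_size(size))
```

The required sample grows slowly with the population, which is why a few hundred links per bucket can represent tens of thousands of unique links.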
3.4 RQ2 Analysis
Confounding variables | Description |
---|---|
Add | The number of added lines by a patch. |
Delete | The number of deleted lines by a patch. |
Patch size | The total number of added and deleted lines by a patch. |
Purpose | The purpose of a patch, i.e., bug, document, feature. |
# Files | The number of files that were changed by a patch. |
# Revisions | The number of review iterations. |
Patch author experience | The number of prior patches that were submitted by the patch author. |
# Comments | The number of messages posted in a review discussion by reviewers and the patch authors, excluding messages for change updates and the number of inline comments. |
# Author comments | The number of messages posted in a review discussion by the patch author, excluding messages for change updates and the number of inline comments. |
# Reviewer comments | The number of messages posted in a review discussion by reviewers, excluding messages for change updates and the number of inline comments. |
# Reviewers | The number of developers who posted a comment to a review discussion. |

Link sharing variables | Description |
---|---|
# External links | The number of external links shared in the general discussion. |
# Internal links | The number of internal links shared in the general discussion. |
# Total links | The number of internal and external links shared in the general discussion. |
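
As part of the RQ2 model construction, review time is checked for skew and log-transformed when it is skewed. The following is a minimal sketch of that idea using only the Python standard library, not the R code used in the study. It computes moment-based skewness and kurtosis and applies a log transformation. The helper names, the use of log(1 + x) to tolerate zero values, and the review times in the usage lines are all assumptions made for illustration, and the significance test behind the p-value threshold is not reproduced here.

```python
import math


def _central_moment(values, order):
    """Return the central moment of the given order (divides by n)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)


def skewness(values):
    """Moment-based skewness: m3 / m2^(3/2)."""
    m2 = _central_moment(values, 2)
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    return _central_moment(values, 3) / m2 ** 1.5


def kurtosis(values):
    """Moment-based kurtosis: m4 / m2^2 (about 3 for a normal distribution)."""
    m2 = _central_moment(values, 2)
    if m2 == 0:
        raise ValueError("kurtosis is undefined for constant data")
    return _central_moment(values, 4) / m2 ** 2


def log_transform(values):
    """Apply log(1 + x); the +1 is an assumption to tolerate zero values."""
    return [math.log1p(v) for v in values]


# Hypothetical review times in hours, for illustration only.
review_hours = [1, 2, 2, 3, 4, 5, 8, 13, 40, 200]
print(skewness(review_hours), skewness(log_transform(review_hours)))
```

On right-skewed data such as the hypothetical sample above, the transformed values have a smaller skewness, which is the effect the log transformation is meant to achieve before fitting an OLS model.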
We use the skewness and kurtosis functions of the moments R package to check whether or not our modeled dataset is skewed. If the distribution of review time is skewed (i.e., p-value < 0.05), similar to prior work (McIntosh et al. 2016), we use a log transformation to lessen the skew in order to better fit the assumptions of the OLS technique.
We use the redun function of the rms R package to detect redundant variables and remove them from our models.
We use the spearman2 function of the rms R package to calculate the Spearman multiple ρ² between the explanatory and response variables. A larger Spearman multiple ρ² indicates a higher potential for a nonlinear relationship. Thus, variables with larger ρ² values are allocated more degrees of freedom than variables with smaller ρ² values. To avoid over-fitting, we allocate only three to five degrees of freedom to variables with high ρ² values and one degree of freedom (i.e., a linear relationship) to variables with low ρ² values.
We use the rcs function of the rms R package to assign the allocated degrees of freedom to each explanatory variable. Then, we use the ols function of the rms R package to construct the model.
We use the anova function of the rms R package to report both the Wald χ² value and its corresponding p-value.
We use the Predict function of the rms package to plot the estimated review time while varying the value of a particular explanatory variable and holding the other explanatory variables at their median values.
3.5 RQ3 Analysis
Category | Description | Taxonomy of information needs (Pascarella et al. 2018) |
---|---|---|
Providing Context | The link is shared to provide additional information related to the implementation. | Context–Reviewers ask about information aimed at clarifying the context of a given implementation. |
Elaborating | The link is shared to complete the information or references related to the patch. | Rationale–Reviewers ask questions to get a rationale for why the patch was implemented in a certain way. |
Clarifying | The link is shared to clarify some doubts about the review process or to correct the reviewer’s understanding of the patch. | Correct Understanding–Reviewers ask questions to confirm the reviewer’s interpretation/understanding or to clarify doubts. |
Explaining Necessity | The link is shared to inform about more suitable solutions or to explain the reasons why the patch is no longer needed. | Necessity–Reviewers need to know whether the patch (or a part of it) is necessary. |
Proposing Improvement | The link is shared to point out an alternative solution or to suggest an improvement. | Suitability of An Alternative Solution–Reviewers pose a question to discuss options and alternative solutions to the implementation of the patch. |
Suggesting Experts | The link is shared to point to an expert (another developer) who should be involved. | Specialized Expertise–Reviewers ask other reviewers to contribute with their specialized expertise. |
Informing Splitted Patches | The link is shared to inform that the patch has been split. | Splittable–Reviewers ask questions to seek the possibility of splitting the patch into multiple, separated patches. |
4 Case Study Results
4.1 RQ1: To what extent do developers share links in the review discussion?
Rank | Internal: OpenStack | Internal: Qt | External: OpenStack | External: Qt |
---|---|---|---|---|
Top 1 | review.openstack.org (51%) | codereview.qt-project.org (79%) | github.com (15%) | paste.kde.org (14%) |
Top 2 | github.com/openstack (13%) | bugreports.qt.io (6%) | docs.python.org (4%) | github.com (6%) |
Top 3 | bugs.launchpad.net (7%) | testresults.qt.io (4%) | gist.github.com (4%) | msdn.microsoft.com (5%) |
Top 4 | logs.openstack.org (6%) | doc.qt.io (2%) | bugzilla.redhat.com (3%) | pastebin.kde.org (3%) |
Top 5 | wiki.openstack.org (4%) | wiki.qt.io (2%) | stackoverflow.com (2%) | gcc.gnu.org (3%) |
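
A domain ranking like the one above could be derived from review comments roughly as follows. This is a hedged sketch rather than the extraction pipeline used in the study: the URL pattern, the function names, and the sample comments are assumptions made for illustration. It also groups by host name only, so a path-scoped entry such as github.com/openstack would need an additional rule.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Simplified URL pattern (an assumption); real comments may need a stricter one.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+")


def extract_links(comment):
    """Return the URLs found in one comment, without trailing punctuation."""
    return [url.rstrip(".,;:") for url in URL_PATTERN.findall(comment)]


def domain_shares(comments):
    """Return (domain, count, percent) tuples, most frequent domain first."""
    domains = Counter()
    for comment in comments:
        for url in extract_links(comment):
            host = urlparse(url).netloc.lower()
            if host:
                domains[host] += 1
    total = sum(domains.values())
    return [(d, n, 100 * n / total) for d, n in domains.most_common()]


# Hypothetical review comments, for illustration only.
sample_comments = [
    "See https://review.openstack.org/#/c/123/ and https://github.com/openstack/nova.",
    "Related change: https://review.openstack.org/#/c/456/",
]
for domain, count, percent in domain_shares(sample_comments):
    print(f"{domain}: {count} ({percent:.0f}%)")
```

Splitting the resulting domains into internal and external groups would additionally require a project-specific list of internal hosts, which is not modeled here.
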
Link target type | Internal: OpenStack | Internal: Qt | External: OpenStack | External: Qt |
---|---|---|---|---|
Licence | – | – | 0.3% | – |
Software homepage | 1.1% | 0.6% | 9.2% | 3.6% |
Specification | 2.6% | – | 2.8% | 0.6% |
Organization homepage | – | – | 0.6% | 0.3% |
Tutorial or article | 7.1% | 5.8% | 18.7% | 14.6% |
API documentation | – | 2.5% | 15.3% | 16.5% |
Blog post | – | – | 2.8% | 1.9% |
Bug report | 9.2% | 9.9% | 8.3% | 8.4% |
Research paper | – | – | 0.3% | – |
Code | 13.5% | 4.1% | 13.5% | 10.4% |
Forum thread | – | 0.8% | 0.3% | 0.6% |
Book content | – | – | 0.6% | 1.0% |
Q&A thread | – | – | 1.2% | 0.6% |
Stack Overflow | – | – | 3.1% | 1.9% |
Communication channel | 2.6% | 0.3% | 4.9% | 3.6% |
GitHub activity | 0.3% | – | 4.9% | 3.6% |
Media | – | – | 2.1% | 5.5% |
Memo | 5.8% | 1.1% | 5.8% | 6.5% |
Review | 55.9% | 73.6% | – | 0.3% |
Others | 1.8% | 1.4% | 5.5% | 20.1% |
4.2 RQ2: Does the number of links shared in the review discussion correlate with review time?
Model statistic | OpenStack | Qt |
---|---|---|
Adjusted R² | 0.3737 | 0.4580 |
Optimism-reduced adjusted R² | 0.3733 | 0.4573 |
Overall Wald χ² | 34,649 | 34,358 |
Budgeted Degrees of Freedom | 3,873 | 2,711 |
Spent Degrees of Freedom | 24 | 24 |

Confounding variables | Statistic | OpenStack: Overall | OpenStack: Nonlinear | Qt: Overall | Qt: Nonlinear |
---|---|---|---|---|---|
Patch size | D.F. | ‡ | | ‡ | |
Add | D.F. | 2 | 1 | 2 | 1 |
Add | χ² | 154∗∗∗ | 153∗∗∗ | 337∗∗∗ | 337∗∗∗ |
Delete | D.F. | 1 | – | 1 | – |
Delete | χ² | 1◦ | | 0.53◦ | |
Purpose | D.F. | 2 | – | 2 | – |
Purpose | χ² | 276∗∗∗ | | 131∗∗∗ | |
# Files | D.F. | 2 | – | 1 | – |
# Files | χ² | 31∗∗∗ | | 0.34◦ | |
Patch author Exp. | D.F. | 1 | – | 1 | – |
Patch author Exp. | χ² | 220∗∗∗ | | 98∗∗∗ | |
# Comments | D.F. | ‡ | | ‡ | |
# Author comments | D.F. | 3 | 2 | 3 | 2 |
# Author comments | χ² | 1936∗∗∗ | 1679∗∗∗ | 2216∗∗∗ | 1549∗∗∗ |
# Reviewer comments | D.F. | 4 | 3 | 3 | 2 |
# Reviewer comments | χ² | 1360∗∗∗ | 1066∗∗∗ | 4908∗∗∗ | 4036∗∗∗ |
# Reviewers | D.F. | ‡ | | ‡ | |
# Revisions | D.F. | 3 | 2 | 3 | 2 |
# Revisions | χ² | 3237∗∗∗ | 1847∗∗∗ | 2038∗∗∗ | 1687∗∗∗ |

Link sharing variables | Statistic | OpenStack: Overall | OpenStack: Nonlinear | Qt: Overall | Qt: Nonlinear |
---|---|---|---|---|---|
# External links | D.F. | 1 | – | 1 | – |
# External links | χ² | 3◦ | | 0.22◦ | |
# Internal links | D.F. | 1 | – | 1 | – |
# Internal links | χ² | 119∗∗∗ | | 78∗∗∗ | |
# Total links | D.F. | ‡ | | ‡ | |
4.3 RQ3: What are the common intentions of links shared in the review discussion?
Intention | OpenStack | Qt |
---|---|---|
Providing Context | 40% | 40% |
Explaining Necessity | 23% | 18% |
Elaborating | 19% | 16% |
5 Discussions
5.1 Developer Feedback
Finding | 1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|---|
Finding of RQ1–“Developers often share internal links to reference reviews, bug reports and source code, while external links often reference tutorials and API documentation.” | 0 | 3 | 2 | 31 | 17 |
Finding of RQ2–“A review that has an internal link shared during its review discussion is likely to take reviewing time longer than other reviews.” | 3 | 24 | 15 | 11 | 0 |
Intention | Respondent Count |
---|---|
Providing Context | 48 |
Explaining Necessity | 40 |
Elaborating | 35 |
Clarifying | 33 |
Proposing Improvement | 26 |
Informing Splitted Patches | 17 |
Suggesting Experts | 6 |
Others | 3 |