15.07.2019  Focus  Ausgabe 8/2020 Open Access
Corereviewer recommendation based on Pull Request topic model and collaborator social network
 Zeitschrift:
 Soft Computing > Ausgabe 8/2020
Wichtige Hinweise
Communicated by B. B. Gupta.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
1 Introduction
GitHub, a popular opensource community (Begel et al.
2013; Liao et al.
2018), has attracted the participations of tens of millions of developers and millions of opensource projects. Pull Request (PR) is a major contributor to external developers of opensource projects (Gousios et al.
2014). When an external developer has implemented some new features or fixed some bugs, he or she can contribute his or her code via submitting a PR. Then, the core developers decide how to deal with the PR after taking all the opinions of reviewers into consideration.
With the rise of big data (Kuang et al.
2018) and service computing (Liao et al.
2019; Li et al.
2018), GitHub is becoming more and more hot and popular. The recent studies showed that the popular projects receive nearly 100 PRs per day from external contributors (Liao et al.
2018). To improve the efficiency of PR reviewing, some researchers (Yu et al.
2014; Balachandran
2013; Thongtanunam et al.
2014) have proposed some proposals to recommend appropriate reviewers for new PR. However, the reviewers they recommend include any developers, whatever the core developer or external developer he or she is. If the recommender is an external developer, he or she just can review the PR, but he or she cannot decide to refuse or merge the PR. The PR still needs waiting for a core developer to do the final decision. The delay might be a long time. However, if the recommender is the core developer, the delay can be reduced greatly.
Anzeige
In this paper, we propose a NTCRA (Social Network and Topic Modelbased CoreReviewer Recommendation Algorithm) algorithm to match the appropriate collaborators as the reviewer for new PR. However, the active reviewers are always recommended frequently in the majority reviewer recommendation methods. To reduce the recommendation frequency of the active collaborators, we calculate the influence of each collaborator by the collaborator–PR network. Then, we use the influence as the weight to calculate the recommendation score between each collaborator and the new PR.
The structure of this paper is organized as follows. We present the existing research results in Sect.
2. Section
3 introduces the algorithm methodology and details of NTCRA. We explain the results of experiment and its validation in Sects.
4 and
5. Section
6 introduces the possible problems and draws conclusions.
2 Related work
2.1 Influential people mining
Influential people mining is a popular issue in the social network. Currently, most of the influence calculation methods are based on PageRank, HITS, and their methods of variation. Hassan Sayyadi (
2009) constructed an authorpaper heterogeneous network, and proposed a method to rank the nodes which combines HITS and PageRank. Hassan Sayyadi used HITS to calculate the importance of paper, and used the PageRank to calculate the importance of author. Thung et al. (
2013) proposed a network of developers and projects, and applied the PageRank algorithm to calculate the weight of the nodes. Fan et al. (
2018) proposed a kind of discovery algorithm based on local core members to solve the problem of community detection in social networks, and get the importance of each node in the process. Yang et al. (
2012) studied the graph structure and random walk model and proposed the SocialRank algorithm to calculate the individual influence. Li et al. (
2015) proposed a measure called CommRank, which calculates the influence of communities in the social network. And based on the CommRank, they improved algorithm to deal with the problem of maximizing influence. We means that most of the methods described above are based on HITS, PageRank or Katz models. These methods have better results than traditional methods based on degree and closeness.
2.2 Reviewer recommendation
As an essential part of GitHub, more and more researchers have focused on improving the efficiency of PR reviewing. Zhang et al. (
2014a,
b) conducted an exploratory study of @mention in PRbased software development, and found that @mention is beneficial to speed up the review of PR. However, due to the large number of people in the opensource community, developers are not able to find the suitable reviewers quickly and accurately. To solve this problem, some researchers have proposed some reviewer recommenders. For example, Balachandran et al. (
2013) developed a Review Bot to find developers who frequently submit code changes as reviewers. Later, Thongtanunam et al. (
2014) proposed a method of using the file path to find reviewers. Yu et al. (
2014) proposed a reviewer recommendation algorithm based on the review network. Lipcak et al. (
2018) conducted big data experiments on various methods above and found that the results of these methods for largescale projects are not very satisfactory, but they have a good effect on mediumscale projects. Xia et al. (
2017) proposed a recommendation algorithm that considers implicit relations and neighborhood models. This method can extract the possible implicit information in the PR comment records, and then are collided and filtered through the neighbor model to obtain the final recommender. Yang et al. (
2018) proposed a twolayer reviewer recommendation model that matches by combining recommendation scores with reviewer types. This method sets up different types of reviewers based on the common recommendation methods, and distinguishes and recommends the reviewers from the perspective of technology and management.
Anzeige
All these recommendation methods include text analysis, deep learning (Chang et al.
2017; Gong et al.
2017; Li et al.
2017), collaborative filtering, and multiple network (Deng et al.
2019), but those existing methods are lack of considerations of the multidimensional features (Zhang et al.
2018) of prediction (Kuang et al.
2018,
2019) and recommendation. Meanwhile, their approaches focus on recommending reviewers for new PRs, but the recommended reviewers include all developers, not only the core developers. In opensource ecosystem like GitHub (Liao et al.
2018,
2019a,
b), if the recommended reviewer is not a core developer, the reviewer cannot merge or refuse the PR when he or she thinks the PR meets or does not meet the demand of the project directly. The PR still needs waiting core developer to do the merging or refusing operation. However, if the reviewer is a core developer, he or she can merge or refuse the PR directly in that case. And all these approaches face the repeating recommendation problems of active reviewers which will bring a heavy workload to the active reviewer. A large amount of research content above indicate that there is still a huge room for improvement of PR review.
3 Methodology
In GitHub, the core developers are the managers of the project. They can submit the revisions to the main repository directly, and they can refuse or merge the PR which the external developers submit. These core developers are called collaborators of the project in GitHub. Usually, a collaborator gets higher expertise of the project than the external developers. Hence, if they are recommended as the reviewers to PRs, he or she would make a decision to merge/refuse the PR quickly. In this case, the collaborators are matched as a reviewer of that PR. As the rule, a PR just can be merged or refused by the collaborators of the project. In other words, if a reviewer is not a collaborator, he/she will not have the permission to merge or refuse the PRs. If the reviewer is a collaborator, he/she can make the final decision immediately after reviewing the PRs.
In order to better introduce our methods and related nouns, we give the following definitions.
Definition
If the reviewer is a collaborator, he/she will be called core reviewer of the PR.
Usually, the title and description of one PR reflect the theme of the PR. We propose a NTCRA algorithm to calculate the relationship between the theme and reviewer (only include collaborators). Generally, a higher influence a person gets, the more possibility he will be a collaborator. That means, he will get more work to do and sometimes the situation will decrease the PR integrating efficiency. To reduce the frequency of active collaborator recommendation, we construct a collaborator–PR heterogeneous network in NTCRA to calculate the influence of each collaborator. In NTCRA, if a developer reviews that PR, we assume that he or she is interested in the theme of that PR. The topics which represent the theme of the PR will be extracted from PRs via LDA model (Blei et al.
2003; Kuang et al.
2018). And the topicdocument distributions will be used to calculate the relationship between topics and collaborators. Meanwhile, to reduce the cost of topicdocument distributions calculation of new PRs, we use the distributions calculation of new PRs, and obtain the topic distributions of new PR from the wordtopic distributions which generated by topic extracted processing previously. According to the influence of collaborator calculated by collaborator–PR network, we can work out at the matching scores for each collaborator from the relationship between topics and collaborators. The collaborator who gets the highest score will be recommended as core reviewer to the PR. The overview of the proposed algorithm NTCRA is shown in Fig.
1. We will expand more details of the NTCRA algorithm in the following sections.
×
3.1 Collaborator–PR network construction
We construct the
Collaborator–
PR Network individually in every project. In a given project, the reviewing relationship between PRs and collaborators is many to many. As shown in Fig.
2, there are many Pull Requests in Project
P. A collaborator can review several PRs, and a PR can be reviewed by several collaborators more than once (posting one comment represents reviewing once). For example,
Collaborator C
_{1} reviewed
Pull Request PR
_{1} W
_{1} times, and reviewed
Pull Request PR
_{2} W
_{2} times.
Pull Request PR
_{2} is reviewed by
C
_{1},
C
_{2},
C
_{3}, respectively.
×
The
Collaborator–
PR network is defined as a heterogeneous network, and it is a weighted undirected graph which includes two types of nodes and two types of edges. The two types of nodes are collaborators and PRs, the two types of edges are review edges and common interest edges, shown as in Fig.
3. If
collaborator c
_{i} reviewed PR
_{j} at least once, there is an edge
w
_{ij} between
c
_{i} and PR
_{j}. The weight
w
_{ij} is defined as the reviewing times of PR
_{j}. If collaborator
c
_{i} and
c
_{j} reviewed a PR together, there is a common interest shared between them. Hence, there is an edge between collaborator
c
_{i} and
c
_{j}, and the weight is defined as the number of PRs reviewed by them together. (All of them have posted comments to the PR.)
×
3.2 Collaborator influence calculation
In order to better analyze the corresponding rules of the influence network, we partition the collaborator–PR network into two networks. The network on the left contains collaborators and PRs, and they are connected with each other through review edges. This network is a bipartite network which can map onto the HITS network; we define collaborators as hub nodes and PRs as authority nodes. Thus, we can use the HITS to transfer authority scores between collaborators and PRs. The network on the right contains the collaborator nodes and common interest edges, so we can use PageRank algorithm to calculate authority scores between collaborators. Figure
4 shows the example of partition.
×
Traditional approaches consider the network structure as nonweighted; the propagation process is in uniform distribution. In fact, the relationship between different collaborators and PRs is different. Hence, the propagation should be different too. In this paper, we propose an asymmetric strategy to pass authority scores. We use propagated matrices to show the propagated process; the propagation matrices are calculated as follows:
$$ Mcp(i,j) = \frac{{Mr(i,j) \times w_{ij} }}{{\sum\nolimits_{i = 1}^{C} {w_{ij} } }} $$
(1)
$$ {\text{M}}pc(i,j) = \frac{{Mr(i,j)^{T} \times w_{ji} }}{{\sum\nolimits_{j = 1}^{PR} {w_{ij} } }} $$
(2)
$$ {\text{Mcc}}(i,j) = \frac{{Mc(i,j) \times e_{ij} }}{{\sum\nolimits_{i = 1}^{C} {e_{ij} } }} $$
(3)
We use vectors to represent the ranking scores, where
R(
C) represents the ranking scores of collaborators and
R(PR) represents the ranking scores of PRs. We use 1/n to initial the PR ranking scores, and use 1/m to initiate the collaborator ranking score, where n and m correspond to the number of PRs and collaborators, respectively. We update the ranking scores by the iteration steps, until the error between two iterations less than the error value set previously. The iteration steps are defined as follows:
where
Mpc, Mcp, Mcc are the propagation matrices,
k is the times of iterations,
α and β are the parameters using to adjust the weight of the two network,
α +
β=1.
$$ R_{k} ({\text{PR}}) = Mcp \times R_{k  1} (C) $$
(4)
$$ R_{k} (C) = \alpha \times Mpc \times R_{k  1} ({\text{PR}}) + \beta \times Mcc \times R_{k  1} (C) $$
(5)
3.3 PR topiccollaborator relation matrix construction
We construct the corresponding
relation matrix individually in every project. In this paper, PRs were extracted by GitHub API, and each PR includes title and descriptions. A vector is used to represent each PR which is described by the probability of topics extracted by LDA and labeled with a set of collaborators who reviewed that PR. For applying the LDA on the corpus, we preprocess the text of each PR. We remove stop words from public data set of Google, and restore the rest of the words to a unified tense. In a PR, the probability of a topic indicates the importance of the topic in the PR. The bigger the probability is, the more importance the topic in the PR is. Meanwhile, the probability also can represent the relationship between topics and reviewers. In GitHub, a developer review a PR based on his interest. Generally, the higher the probability is, the stronger the relationship between topic and reviewer is. We calculate multiple topic probabilities (Bian et al.
2014) with PRs that reviewed by the same collaborator, since each collaborator has reviewed a lot of PRs.
Usually, the importance of the topic is different in different PRs, and the importance of the topic is related to the length of the text of PR. The larger the number of text in PR, the greater the importance of the topic should be. Therefore, we define the topicimportance of PR as follows,
where
t
_{i} represents the
ith topic;
D indicates the PR set;
N
_{d} is the number of words in dth PR;
P(
t
_{i}
d) represents topic distribution for PR extracted by LDA algorithm.
$$ {\text{import}}(t_{i} ) = \frac{{\sum\nolimits_{d}^{D} {N_{d} P(t_{i} d)} }}{{\sum\nolimits_{d}^{D} {N_{d} } }} $$
(6)
However, a collaborator just reviewed some PRs which he or she is interested in. The relationship between a collaborator and topics should be calculated by the part of PRs he or she reviewed. Therefore, we define the relationship between collaborators and topics as follows:
$$ R(c_{i} ,t_{i} ) = \frac{{\sum\nolimits_{d}^{D} {\lambda_{di} N_{d} P(t_{i} d)} }}{{\sum\nolimits_{d}^{D} {\lambda_{di} N_{d} } }} $$
(7)
Given that the number of PRs reviewed by each collaborator is different, the relationship matrix of topics and collaborators will change due to the activity of the collaborators. More active collaborators have higher topic scores, and less active collaborators have less obvious topical characteristics. To solve this problem, we weight topic score of each collaborator. Therefore, the weight is defined as follows:
where
K is the number of topics in PRs, and
t
_{k} represents the
kth topic.
$$ {\text{matrix}}(c_{i} ,t_{j} ) = \frac{{R(c_{i} ,t_{j} )}}{{\sum\nolimits_{k}^{K} {R(c_{i} ,t_{k} )} }} $$
(8)
3.4 Topicdistribution calculation of new PR
Based on the results calculated in the above steps, the final two steps for recommending the appropriate reviewers for the new PR are to calculate the topic distribution of the new PR and recommend the reviewers for new PR based on the topic and influence score.
Considering the characteristics of the LDA method for extracting text topics, we have two different methods for calculating the topic distribution of the new PR. The first method is to put the text of the new PR into the training set to extract the topic probability, and the other method is to calculate the topic probability of the new PR by using the topicword distribution extracted by the training set. We assume to have a training data set with 100 PRs and a test data set with 10 PRs. If we use the first method, then we need to run the LDA method and the recommended algorithm for a total of (100 + 1) * 10 times, since each new PR needs to run the LDA method to extract new topics, and the distribution of reviewers and topics also needs to be updated synchronously. If we use the second method, we only need to run the program 100 + 10 times. Obviously, the first method will greatly increase the system time complexity with the number of new PRs. The second method only needs to run the LDA method to extract topic information once, and the accuracy of the topic is only slightly reduced. Based on the comparisons of the above two methods, we take the second way to calculate the topic distribution of new PR.
The second method calculates the subject probability of the new PR using the topicword distribution generated during the LDA extracting topic process. Therefore,
\( P(t_{i} d) \) is as follows:
where
c(
w,d) indicates the number of
w
th word in
d
th document, and
P(
wt
_{i}) represents the
i
th topicword distribution.
$$ P(t_{i} d) = \frac{{\sum\nolimits_{w \in V} {c(w,d)P(wt_{i} )} }}{{\sum\nolimits_{k}^{K} {\sum\nolimits_{v \in V} {c(v,d)P(vt_{k} )} } }} $$
(9)
3.5 Recommendation reviewer for new PR
Based on the relationship between the topic and the collaborator, the topic probability of the new PR, and the influence score of the collaborator, we can integrate the above steps to recommend the appropriate reviewer for the new PR. If the topic distribution contains multiple maximum values, it means that this PR may be related to multiple collaborators, all of them can be recommended as the reviewers of the PR. As shown in Algorithm 1, NTCRA first obtains the maximum value of the new PR topic. Then, NTCRA matches the collaborator’s topic distribution to get the matching score for the new PR and all collaborators. Finally, NTCRA finds the biggest scorer, which is the best core reviewer for the new PR. In case that there are multiple collaborators getting the maximum score, all of them will be recommended for the PR as the candidates of core reviewers.
×
4 Experiments
4.1 Datasets
We have obtained data from three popular projects from GitHub. The details of the datasets are shown in Table
1. We introduced the PR in the experiment closed on august 1, 2016, and downloaded it via the api provided by GitHub. The api provided by GitHub contains various information data of PR. Usually, we think that the number of developers of a project can roughly reflect the scale of a project. In the following experiments, we have used contributor with similar capabilities to developer as indicator. According to the number of contributors, we selected projects of different sizes as experimental data, as shown in the following Table
1. The scale of a project is defined as follows:
Table 1
Detail of datasets
Project

PR

Collaborator

Contributor


fastlane

2779

17

613

mopidy

588

8

93

coala

952

32

209

Small: The number of contributors less than 100.
Medium base: The number of contributors more than 100, but less than 500.
Large: The number of contributors more than 500.
4.2 Experiments design
We have proposed four key questions to be solved as follows:
Q1
How about the quality of the influence of collaborator calculation method?
Q2
What is the relationship between topic and collaborator? Is the relationship between topic and collaborator many to many or one to many or others?
Q3
What is the performance of the new PR topic probability calculation method? Can the proposed topicdistribution calculation method be implemented smoothly?
Q4
What about the performance of the proposed method? Is the influence of collaborator performs better than the expertise? And how is the performance changed among the number of topics extracted by the topic model?
For
Q1, we will compare the influence rank with the expertise rank of collaborators. The expertise of a collaborator is defined as the number of PRs he or she reviewed. Meanwhile, the influence in PR reviewing will be calculated by the proposed method. Through the comparison, we will get how the influence and expertise distribution look like, and get the one which is more suitable for the recommendation.
For
Q2, we will apply LDA to extract the topics for PRs, and construct the
relation matrix. Then, we will apply punch card to visualize the
relation matrix, and recognize the relation pattern between topics and collaborators.
For
Q3, the divergence of topic distribution by LDA and topicdistribution calculation method of new PRs is measured by Jensen–Shannon divergence (Lin
1991). It is an improved version of the Kullback–Leibler divergence, calculated as
where
p and q is the topic distributions of different texts.
$$ D_{js} (p,q) = \frac{1}{2}\left( {D_{kl} \left( {p,\frac{p + q}{2}} \right) + D_{kl} \left( {q,\frac{p + q}{2}} \right)} \right) $$
(10)
For
Q4, we use the precision and recall to measure the performance of the proposed method. To compare the performance between influence and expertise, we will apply influence and expertise to calculate the precision and recall, respectively. In order to more intuitively and accurately analyze the performance differences between different topics, we chose to experiment with the number of relatively representative topics.
In the topic extraction process, we used JGibbsLDA
^{1} implemented by Gibbs sampling. We used the default hyperparameters beta and alpha, and the iteration parameter was set to 1000. Since the LDA topic extraction method is a probability model, it is possible that the returned results are not exactly the same each time. But the distribution of topics is generally consistent, and we value the relationship between different topics and collaborators, rather than the specific content of each topic. For example, we run the LDA method twice and get two topic distributions. It is very likely that the results of the two executions are different. But we can always find the corresponding relationship in the two result sets. And what we need is the relationship between the collaborators and different topics, so although the results are different in different implementations, there is no significant impact on the final recommendation results.
4.3 Evaluation method
We use accuracy and recall to measure the performance of the proposed algorithm in real projects, which have been widely used in the field of recommended systems. According to our definition of core reviewer in section III, the core reviewer of a PR may be more than one. Hence, the precision and recall should be calculated as
where
act_coreRev is the set of actual core reviewers; The recom
_coreRev is the set of core reviewers of the PR recommended by the proposed method.
$$ {\text{Precision}} = \frac{{{\text{act}}\_{\text{coreRev}} \cap {\text{recom}}\_{\text{coreRev}}}}{{{\text{recom}}\_{\text{coreRev}}}} $$
(11)
$$ {\text{Recall}} = \frac{{{\text{act}}\_{\text{coreRev}} \cap {\text{recom}}\_{\text{coreRev}}}}{{  {\text{act}}\_{\text{coreRev}}}} $$
(12)
4.4 Model complexity
In order to better explain the construction and implementation of the model, we have analyzed the algorithm complexity and time performance of the model. In NTCRA, when we recommend a reviewer for a new PR, we obtain the appropriate reviewer by calculating the product of the reviewer’s topic distribution of the subject probability of the new PR and the influence factor of the reviewer. Therefore, the complexity of the algorithm is mainly affected by the number of reviewers, the number of keywords in the new PR, the number of topics, the number of keywords in the training text, and the number of PR in the training data, which can be expressed as
\( O\left( {wt^{q} r^{r + q} n} \right) \), where w represents the number of keywords in the training text, t represents the number of topics, q represents the number of PRs in the training data, r represents the number of reviewers, and n represents the number of keywords in the new PR. When NTCRA recommends a reviewer for a new PR, the algorithm obtains the appropriate
K reviewers by calculating the match value of each reviewer and the PR. The main time for the program to run is to calculate the matching value of the new PR and each reviewer and its ordering. The average matching time of one reviewer is 36.4 ms (ms).
5 Results
5.1 Collaborator influence
We have constructed the
Collaborator–
PR network and calculated the influence of each collaborator of the project
coala. When calculating the influence of each collaborator, we have tried multiple group values of
α, β. We find that the influence calculation algorithm performs better when setting
α = 0.7,
β = 0.3. As our expectation, the
Collaborator–
PR network should focus on the review network, supplemented by the common interest network. The common interest network plays a role of balance between active collaborators and less active collaborators. Hence, the result is correct. To verify the performance of the influence calculation algorithm, we conducted a comparison experiment between influence and expertise. The expertise is calculated by the number of reviewed PRs. The more PRs who reviewed, the higher expertise who will get. Figure
5 shows the results of comparison between influence and expertise. Compared with the expertise, the influence between collaborators performs more balance than the expertise. As shown in Fig.
5b, we can see that the difference of expertise between active collaborators and less active collaborators is extra huge. The expertise of less active collaborator is covered by the active collaborators entirely. As shown in Fig.
5a, there is still a big gap between active collaborators and less active collaborators. But the gap nerve as big as the expertise, it has been reduced extremely. The reduction in gap between active collaborators and less active collaborators will benefit the corereviewer recommendation very much.
×
5.2 The structure of relation matrix
For
Q2, we constructed a
relationship matrix between collaborators and topics according the proposed method, and showed the results of the relationship matrix on a punch card. As shown in Fig.
6, the punch card shows the
relationship matrix of collaborators and topics in the
coala project, and the size of punch reflects the strength of the relationship between the collaborators and the topics. The larger the relationship value between the collaborator and the topic, the larger the corresponding node on the punch card. For the convenience of observation, we just extract 15 topics. The Yaxis denotes the collaborators, the Xaxis denotes the topics. From Fig.
6, we can find that the most collaborators always relate to several topics mostly. And the closest topics of different collaborators always are different. According to Fig.
5, we can find punch card of the most active and less active collaborator present very similar. All the size of the punches present nearly. That means that they have the same relationship with all topics. In this case, the influence of collaborators will help recognize the different between them.
×
5.3 Topic distributions
For
Q3, we have randomly selected seven PRs, and worked out at the topic distribution by LDA method and the proposed topicdistribution calculation method of new PRs, respectively. To compare the two distributions, we use the stacked histograms to visualize the results. As shown in Fig.
7, the
Xaxis denotes the documents, the
Yaxis denotes the probability of topics. Each document includes two distributions, the distribution on the left side denotes the distribution calculated by LDA, on the right side one denotes the distribution calculated by the proposed method. The probability of topics from bottom to top responds topic0–topic14, respectively. From Fig.
7, we can find the maximum entry of the topic distributions between the two methods that tend to be consistent. Since we recommend core reviewer to the new PR based on the topic with the largest value, differences between topics of lower probability do not affect the final recommendation. Moreover, we have selected 1000 comments to separately calculate the topic distribution obtained by LDA and the proposed method, and used Jensen–Shannon divergence to measure the difference between the two results. Jensen–Shannon divergence has a value range of 0 to 1. The smaller the difference between the two distributions, the smaller the value of the Jensen–Shannon divergence. We calculated the Jensen–Shannon divergence in the above 1000 comment data, the average difference is only 0.050. The Jensen–Shannon divergence close to 0 means that the difference between the two distributions is small. Hence, the results show that the proposed method for calculating the topic distribution is effective and feasible.
×
5.4 Performance
To verify the performance and compare the performance between influence and expertise, we have conducted a comparison experiment between influence and expertise. We apply the proposed method to calculate the performance with influence and expertise on three projects which is shown in Table
1, respectively. Here, the number of topics extracted by LDA is 20. The results are shown in Fig.
8. From the comparison figure, we can find that the influence performs better than the expertise. Even though the recall does not improve in each project, but the precision is improved in each project. Especially in project
coala, the precision improved significantly. We also can find that the recall in project
fastlane and
mopidy is lower than the project
coala. We check the datasets, and find that there always have more than one collaborators to review the same PR, but the proposed method recommends the collaborator who get the highest score as the core reviewer which means there is just one collaborator matched to the PR. So, the recall is just around 50%.
×
To explore the influence of number of topics extracted from PRs to the proposed method, we apply the proposed method extracts topics with
K(number of topics) = 10, 15, 20, 25, 30, 40, and calculate the precision and recall in each setting, respectively. Figure
9 shows the results. As shown in Fig.
9, in the process of increasing the number of topics from 10 to 40, the recommended accuracy and recall curves of the three projects do not change notably. We also can find the average precision of these three projects is greater than 70%.
×
6 Discussion and conclusions
6.1 Discussion
Limitation of NTCRA The recall of the proposed method always gets a low score. The proposed method just recommends the collaborator who gets highest score as the core reviewer. Since the mechanism of the proposed method, the number of collaborators who get the highest score is just one in the most situations. However, in some projects, the PR is reviewed by more than one collaborator. Hence, the recall is always less than 50%.
Meanwhile, if there is not enough history data, the algorithm also will get a poor performance.
Frequent matching of active collaborators From the observation, the phenomenon of frequent matching active collaborators as core reviewer still exists, even though the phenomenon reduces a lot comparing with the matching by expertise.
6.2 Conclusions
In this paper, we have proposed an algorithm that recommends a suitable core reviewer for PR, which combines topic model with social network. The NTCRA algorithm uses the text information in PR and comments to build the relationship between collaborators and topics. Meanwhile, we use the review relation of collaborators to construct a collaborator–PR heterogeneous network. Based on the network, we apply a collaborator influence calculation algorithm to calculate the influence of each collaborator to the PRs. Finally, based on the relationship between topics and collaborators, the topic probability of new PR, and the influence score of collaborators, we can integrate the above steps to recommend the appropriate core reviewers for new PR. We have combined the three real opensource projects (
fastlane, coala and
mopidy) to study the performance of our algorithm, the influence of reviewer, and the topic distribution. After detailed verification, we have the following conclusions:
(1)
The proposed influence calculation algorithm is feasible. The algorithm plays an important role to balance the gap between active collaborators and less active collaborators.
(2)
Each collaborator always relates with some closest topics and for different collaborators, the closest topics are different. The most active collaborators may have same relationship to all topics.
(3)
The maximum topic calculated by the LDA method is basically the same as the proposed topicdistribution method. The Jensen–Shannon divergence between the two results is only 0.05, and the data indicate that their difference is small.
(4)
Overall, the recommended precision of the proposed algorithm is better than 70%, and the change in the number of topics has little effect on the final result.
Even though our algorithm can get an average precision greater than 70%, there still exist a lot of problems. In view of the current low recall rate and fewer core reviewers, we will conduct more detailed analysis and improvement in subsequent studies. We will consider extending the core reviewers to all reviewers in the community, as well as analyzing the reviewer’s characteristics and topic preferences to improve the recall rate of the algorithm.
Compliance with ethical standards
Conflict of interest
The authors declare that there is no conflict of interest regarding the publication of this paper.
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (
http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Footnotes