Introduction
Company-owned OSS projects: This term refers to OSS projects that software companies started and curated in a private environment but later decided to open-source. Therefore, a project that was previously restricted to the company’s employees can now potentially receive contributions from contributors who are not affiliated with the company in any way.
atom, electron, hubot, git-lfs, and linguist. We chose these projects because they were initially developed by (and are maintained at) GitHub; therefore, we could take advantage of GitHub features to understand whether a contributor is internal or external (more details in the “Method” section). Through a set of quantitative and qualitative analyses, this paper makes the following contributions:
- We provide evidence that there is a workforce of developers, external to the company that opened the code, who contribute to the project, creating a community that extends the boundaries of the company. The number of external developers can be up to 32 × greater than the number of internal ones.
- We show that, although the external community is engaged, external members have a hard time getting a contribution accepted. In 4 out of the 5 studied projects, most of the rejected pull requests were made by external developers. In terms of processing time, pull requests from externals take 11.37 days on average to be processed, whereas pull requests from internals take 2.61 days.
- We find that internal developers still play a crucial role in the project, playing the integrator role in two of the analyzed projects. However, external members are also acquiring this role. In project hubot, for instance, ∼ 80% of the team of integrators is composed of external developers.
Method
Studied projects
- atom, a cross-platform text editor. It has ∼ 34,300 commits, ∼ 3750 pull requests, 400 contributors, ∼ 43,000 stars, and ∼ 8400 forks. It is mostly written in JavaScript and CoffeeScript and has ∼ 7 years of historical records. GitHub started its development in 2011 and open-sourced it in May 2014.
- electron, a tool to build cross-platform desktop apps with JavaScript, HTML, and CSS. It has ∼ 18,000 commits, ∼ 3800 pull requests, 721 contributors, ∼ 56,000 stars, and ∼ 7200 forks. It is mostly written in C++ and has ∼ 5 years of historical records. GitHub started its development in March 2013 and open-sourced it in October 2015.
- hubot, a customizable life embetterment robot. It has ∼ 2000 commits, ∼ 700 pull requests, 253 contributors, ∼ 13,700 stars, and ∼ 3200 forks. It is mostly written in JavaScript and has ∼ 7 years of historical records. GitHub started its development in August 2011 and open-sourced it in October 2011.
- git-lfs, a git extension for versioning large files. It has ∼ 6300 commits, ∼ 1300 pull requests, 99 contributors, ∼ 5300 stars, and ∼ 900 forks. It is mostly written in Go and has ∼ 5 years of historical records. GitHub started its development in September 2013 and open-sourced it in April 2015.
- linguist, a library to detect blob languages. It has 5600 commits, ∼ 2400 pull requests, 684 source code contributors, ∼ 5400 stars, and ∼ 2000 forks. It is mostly written in Ruby and has ∼ 7 years of historical records. GitHub started its development in May 2011 and open-sourced it in October 2015.
linguist started as a stand-alone software project. linguist, on the other hand, started as a unification of code scattered around the whole software system. Such a pattern of open-sourcing software projects was already reported elsewhere [4].
Overall approach
Pull request collection
- open: waiting for code reviews and/or a final decision.
- closed: the code reviews were done, but the pull request was not accepted (the status in GitHub is closed/unmerged).
- merged: the code reviews were done, and the pull request was accepted (the status in GitHub is closed/merged).

- The time taken to process a pull request
- The number of comments during the code reviews per pull request
- The number of commits per pull request
- The number of changes (e.g., additions/deletions) per pull request
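The three statuses and the processing-time metric listed above can be illustrated with a minimal sketch. It assumes pull request records shaped like GitHub REST API responses (fields `state`, `created_at`, `closed_at`, and `merged_at`); the helper names and the sample records are hypothetical, and this is not necessarily how the study's own scripts were written.

```python
from datetime import datetime, timezone

def parse_ts(value):
    """Parse a UTC timestamp such as '2015-04-08T17:00:00Z'."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)

def classify_status(pr):
    """Map a pull request record to 'open', 'closed' (unmerged), or 'merged'."""
    if pr["state"] == "open":
        return "open"
    # Assumption: a closed pull request with a merge timestamp was accepted.
    return "merged" if pr.get("merged_at") else "closed"

def processing_days(pr):
    """Days between creation and the final decision; None while still open."""
    if pr["state"] == "open":
        return None
    end = pr.get("merged_at") or pr["closed_at"]
    return (parse_ts(end) - parse_ts(pr["created_at"])).total_seconds() / 86400

# Hypothetical records, for illustration only.
sample = [
    {"state": "open", "created_at": "2017-01-01T00:00:00Z",
     "closed_at": None, "merged_at": None},
    {"state": "closed", "created_at": "2017-01-01T00:00:00Z",
     "closed_at": "2017-01-03T12:00:00Z", "merged_at": "2017-01-03T12:00:00Z"},
    {"state": "closed", "created_at": "2017-01-01T00:00:00Z",
     "closed_at": "2017-01-11T00:00:00Z", "merged_at": None},
]

for pr in sample:
    print(classify_status(pr), processing_days(pr))
```

Keeping the classification and the duration in separate functions makes it easy to group processing times by status or by author type afterwards.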
Internal and external classification
site_admin flag true for another user. If enabled, this flag promotes an ordinary user to be a site administrator. According to GitHub official documentation, a site administrator can “manage high-level application and VM settings, all users and organization account settings, and repository data.” Therefore, for each pull request investigated, we verified whether the author has the site_admin flag enabled. If so, we marked the author as internal; otherwise, as external.
site_admin flag enabled), we analyzed the public profiles (e.g., GitHub affiliation, LinkedIn information, personal web page, among other sources) of the top 10 contributors (either internal or external). From the 48 profiles analyzed (2 members appeared in 2 different projects), we found 12 that worked for GitHub previously, but were not categorized as staff members. We manually identified these users as internal developers for our analysis. This misidentification is a potential threat and is further described in the “Limitations” section.
Research question
RQ1. Are OSS contributions mostly made by internal developers?
RQ1.1. Are internals the top contributors of company-owned OSS projects?
RQ2. Who faces a harder time to get the contributions accepted?
RQ3. Are externals more participative in the pull request review cycle?
RQ4. What are the characteristics of the contributions made by external developers?
atom for manual analysis, which represents a confidence level of 95% with a ± 5% confidence interval. We also validated this analysis with another manual analysis of a random sample of 150 pull requests accepted at hubot. The qualitative analysis was conducted in parallel by two researchers, who investigated the pull requests individually. We also quantitatively compared the characteristics of the pull requests placed by internal and external members in terms of the number of files changed, added lines, deleted lines, and the number of commits per pull request. We considered each pull request as an observation and, once again, we used MWW tests [6] and Cliff’s delta effect size measures [9] to compare the groups. The results for this question are presented throughout the “RQ4. What are the contributions’ characteristics made by externals?” section.
Results
RQ1. Are OSS contributions mostly made by internal developers?
atom and git-lfs; on the other hand, external developers made a higher number of pull requests in electron, hubot, and linguist. For hubot and linguist, external developers are responsible for more than 75% of the pull requests in the project. If we consider all projects, we found 5895 pull requests provided by internal developers (43.3%) and 6266 by external ones (56.7%). However, the number of contributors greatly differs between internals and externals, as can be observed in Table 1. As an extreme case, project electron has 681 external contributors and only 21 internal ones (while the number of contributions made by external developers is almost two times greater than those made by internal developers). That is, although the number of external developers is up to 32 × greater than the number of internal ones, most external developers make few contributions.
Projects | External #Contributors | External #Pull requests | Internal #Contributors | Internal #Pull requests |
---|---|---|---|---|
Atom | 365 | 1546 | 35 | 2206 |
Electron | 681 | 2442 | 21 | 1385 |
Git-lfs | 82 | 435 | 6 | 938 |
Hubot | 241 | 557 | 20 | 131 |
Linguist | 645 | 1860 | 29 | 579 |
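As an illustration of how the per-project figures quoted in the text follow from the table above, the sketch below recomputes the external share of pull requests and the external-to-internal contributor ratio for each project. The dictionary simply transcribes the table; the function names are hypothetical.

```python
# Rows transcribed from the table above:
# (external contributors, external PRs, internal contributors, internal PRs)
TABLE_1 = {
    "atom":     (365, 1546, 35, 2206),
    "electron": (681, 2442, 21, 1385),
    "git-lfs":  (82, 435, 6, 938),
    "hubot":    (241, 557, 20, 131),
    "linguist": (645, 1860, 29, 579),
}

def external_pr_share(row):
    """Fraction of a project's pull requests placed by external developers."""
    _, ext_prs, _, int_prs = row
    return ext_prs / (ext_prs + int_prs)

def contributor_ratio(row):
    """How many external contributors there are per internal contributor."""
    ext_contributors, _, int_contributors, _ = row
    return ext_contributors / int_contributors

for project, row in TABLE_1.items():
    print(f"{project:9s} external PR share = {external_pr_share(row):.1%}, "
          f"external/internal contributors = {contributor_ratio(row):.1f}x")
```

Under this reading of the table, electron yields the roughly 32 × contributor ratio mentioned in the text, and hubot and linguist are the two projects whose external share exceeds 75%.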
electron, git-lfs, and hubot present low rates of open pull requests, 0.88, 0.13, and 0.08%(!), respectively. For the latter, at the time of data collection, only 3 pull requests were left open.
RQ1.1. Are internals the top contributors of company-owned OSS projects?
git-lfs), the number of external developers is greater than the number of internal developers in the top 10 (6 externals, 4 internals). This finding suggests that externals participate actively. However, even in this case, by analyzing the code churn, the top 2 developers (both internal) are by far the main contributors of the git-lfs project (top 1: 124,197 additions and 75,831 deletions; top 2: 89,065 additions and 74,576 deletions; sum of top 3 to top 5: ≈61,300 additions and ≈33,600 deletions).
Projects | # External | # Internal |
---|---|---|
Atom | 1 | 9 |
Electron | 4 | 6 |
Git-lfs | 6 | 4 |
Hubot | 4 | 6 |
Linguist | 3 | 7 |
hubot, 65% of the internals are casual).
Projects | External # Casuals | External % | Internal # Casuals | Internal % |
---|---|---|---|---|
Atom | 269 | 74 | 5 | 14 |
Electron | 480 | 70 | 6 | 29 |
Git-lfs | 55 | 67 | 6 | 33 |
Hubot | 200 | 82 | 13 | 65 |
Linguist | 534 | 83 | 9 | 31 |
Total | 1538 | 76 | 39 | 35 |
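A minimal sketch of how casual contributors could be counted from a list of pull request authors is shown below. It assumes that a casual contributor is someone with exactly one contribution in the project; the function name and the sample logins are hypothetical.

```python
from collections import Counter

def casual_share(pr_authors):
    """Return (number of casual contributors, fraction of all contributors).

    pr_authors holds one login per pull request, so a login that appears
    exactly once belongs to a casual contributor under our assumption.
    """
    counts = Counter(pr_authors)
    if not counts:
        return 0, 0.0
    casuals = sum(1 for n in counts.values() if n == 1)
    return casuals, casuals / len(counts)

# Hypothetical author logins, one entry per pull request.
authors = ["ana", "bob", "ana", "carol", "dave", "ana"]
print(casual_share(authors))
```

Running the same function separately on the internal and external author lists of a project gives the two pairs of columns reported in the table.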
RQ2. Who faces a harder time to get the contributions accepted?
linguist, we can see that the number of pull requests from externals exceeds those from employees by far, in every month. However, analyzing the closed but unmerged pull requests (the ones that were not accepted), we noticed that many external developers have a hard time getting their contributions accepted. This is noticeable in the second column of graphs in Fig. 2. Table 2 confirms that most of the unmerged (closed) pull requests were made by external developers in 4 out of 5 projects (p value ≤ 0.001), with a medium or large (negative) effect size. A possible explanation is that employees work on critical tasks and follow project directions (defined inside the company), while external submissions are sometimes motivated by specific needs that are not necessarily aligned with the project’s direction.
hubot and linguist are the ones that take more time to process pull requests, either from internals (333 and 426 days for hubot and linguist, respectively) or externals (1144 and 832 days for hubot and linguist, respectively). To better understand why these pull requests made by externals take so long to be processed, we investigated the ones that lasted the longest.
hubot project aimed to improve the documentation (it adds 32 lines to a Markdown file); five commits had been made to this pull request. Although project maintainers needed some time to review the contribution (the final modification suggested was about 300 days after the pull request was created), it seems that the pull request was forgotten, and only 2 years after the last change was made did another project maintainer pass through the pull request and merge the patch. On the other hand, the pull request #2070 submitted to the project linguist is a bit more complex. It aimed to introduce support for PEP8, the code convention for writing Python code. Similar to the previous pull request, the maintainers also seem to have forgotten to follow up with the code review. The external member brought attention back to this pull request, mentioning: “I’m recalling this pull request has been open for over a year now (wow, nearly two, time flies), is there anything I can do to help it being merged into master aside from fixing the conflicts that have arisen since its opening?”. Four months after this message, another maintainer provided additional comments, and 1 month later the pull request was merged.
RQ3. Are externals more participative in the pull request review cycle?
atom and electron are processed by internal developers (83 and 94%, respectively). However, for the remaining projects, the number of pull requests processed by external developers is indeed greater than the ones processed by internal developers. In particular, project linguist is an extreme example, with 78% of the pull requests being processed by external developers. However, after a closer look at the data, we found that few integrators are responsible for processing the majority of the pull requests. For instance, two internal integrators processed 85% of the pull requests submitted to project electron. Figure 7 shows a different perspective: the percentage of unique integrators that are internal or external developers.
linguist project has 15 internal integrators and 17 external). The only exception to this trend is the project hubot, in which 11 (78%) of the integrators are external developers (which corroborates the findings of the “RQ1.1. Are internals the top contributors of company-owned OSS projects?” section, which indicate that a large proportion of internals are casual contributors for this particular project). Regarding the amount of work devoted to each kind of contributor (either internal or external), we observed that internal integrators processed more pull requests on projects atom, hubot, and electron. In particular, internal integrators of project electron processed 28 × more pull requests than their counterparts. Moreover, although the project hubot has more unique external integrators (11 externals and 3 internals), internal integrators are responsible for managing the majority of the pull requests (internal integrators processed 3 × more than external ones). On the other hand, on projects linguist and git-lfs, external integrators processed more pull requests than internals (3.26 × and 1.82 ×, respectively).
RQ4. What are the contributions’ characteristics made by externals?
atom project, before creating a pull request, internal developers create an issue that describes what the project needs. Therefore, most of the proposed pull requests are accepted because internal developers were expecting them. For externals, pull requests that fix documentation problems are the most common ones (we found 27 instances of them). Some examples include broken URLs, insufficient information, and code comments. Notwithstanding, non-trivial code changes often come with a detailed description (images are common). We found a similar pattern for hubot. Most of the pull requests from external developers are related to documentation issues, although complex code changes exist. Finally, these two projects seem to welcome external users: they not only answer most of the requests from external developers, but they also guide their contributions to an acceptable state (as mentioned before, by providing comments to improve the pull request).
electron, for example, internal developers added 173,319 lines in total (mean = 130.51 lines per pull request, median = 19.5, q3 = 71.25, stdev = 630.50) and changed 10,092 files (mean = 7.60 files per pull request, median = 3, q3 = 6, stdev = 20.57), while external developers added 150,667 lines (mean = 75.30 lines per pull request, median = 12, q3 = 52, stdev = 267.56) and changed a total of 8067 files (mean = 4.03 files per pull request, median = 1, q3 = 4, stdev = 10.52).
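The summary statistics reported above (mean, median, third quartile, and standard deviation per pull request) can be computed as in the following sketch. The sample values are hypothetical, and the quartile method is an assumption, since the text does not state which interpolation rule was used.

```python
import statistics

def summarize(values):
    """Summary statistics for a per-pull-request metric such as added lines."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "total": sum(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "q3": q3,
        "stdev": statistics.stdev(values),  # sample standard deviation
    }

# Hypothetical added-line counts for five pull requests.
added_lines = [2, 4, 4, 10, 30]
print(summarize(added_lines))
```

A mean well above the median, as in the figures above, indicates a right-skewed distribution: most pull requests are small and a few very large ones pull the mean up.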
hubot; for additions, there is no statistical significance for either hubot or linguist. Overall, we can see that both internal and external contributions are small (few files, and small additions and deletions). As noted elsewhere, smaller changes are more likely to be accepted [10] and can also reduce the chance of breaking the continuous integration build [16].
hubot in which we found a medium effect size (delta = 0.350). of commits are not common. This finding suggests that both groups follow well-known guidelines for contributing to OSS (small commits and few commits per pull request [10, 17]).
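To make the effect size reported above concrete, the following is a minimal sketch of Cliff’s delta for two groups of observations (for example, commits per pull request for internal and external developers). The magnitude thresholds (0.147, 0.33, 0.474) are a commonly used convention and are an assumption here, since the text does not state which thresholds were applied; the sample values are hypothetical.

```python
def cliffs_delta(xs, ys):
    """Cliff's delta: P(x > y) - P(x < y) over all pairs (x, y)."""
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))

def magnitude(delta):
    """Label an effect size using commonly cited thresholds (assumed here)."""
    d = abs(delta)
    if d < 0.147:
        return "negligible"
    if d < 0.33:
        return "small"
    if d < 0.474:
        return "medium"
    return "large"

# Hypothetical commits-per-pull-request samples.
internal = [1, 2, 2, 3, 5]
external = [1, 1, 2, 2, 3]

delta = cliffs_delta(internal, external)
print(delta, magnitude(delta))
```

Under these thresholds, a delta of 0.350 falls in the medium band, which is consistent with the label used in the text. The pairwise loop is quadratic in the sample sizes, which is acceptable for a few thousand pull requests per group.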
Discussion
The main takeaways
- External developers are welcome. Our results showed that the external community supports the companies maintaining the projects by contributing to them. In particular, we found cases in which external members play crucial roles in the projects, such as reviewing and integrating pull requests. This could only be possible because the studied projects welcome external members (which is not always the case in open-source software [18]). We further support this claim by inspecting the welcoming-community features available in the studied projects. All of the studied projects present a description, a README.md file, a Code of Conduct file, a CONTRIBUTING.md file, and a license file.
- External developers still need guidance. Some projects tag the issues to make it easier for externals to find a task to solve (including atom, electron, and linguist, which provide specific tags for newcomer-friendly tasks). However, given the high number of unmerged pull requests from external developers (Fig. 2), external developers have to understand the project’s direction and follow its guidelines when submitting a pull request; otherwise, their contributions are more likely not to be accepted [19].
- Few external developers become long-term contributors. Even though we found external developers supporting the studied projects, few of them have a long-term contribution history (the only exceptions are the outliers). As one can observe in Fig. 3, the majority of external developers place a single contribution to the projects and never show up again. For some projects (hubot and linguist, in particular), even internal developers do not place many pull requests. However, when looking from a different perspective, the total number of pull requests placed by external developers is greater than those submitted by employees, as can be seen in Table 1. Similarly, there are projects with small participation from employees (although the company keeps contributing to them). This result might indicate that the company-owned project is now a community effort.
- External developers can wear the integrator hat. Although integrators are usually employees, we also found externals that play this role, which indicates a high involvement from the external community in company-owned OSS projects. However, when analyzing atom, we could find external developers who are in charge of triaging and commenting on issues (who are also among the top contributors). These externals describe themselves as “@atom community volunteer” or “@atom maintainer.” Therefore, further research is needed to understand the actual roles played by external and internal developers in this kind of project. Figuring out the boundaries of responsibilities is an interesting future direction for this research that can benefit companies and communities.
Wrap up
atom, who, on his personal home page, mentions that “In my free time I contribute to Atom, GitHub’s text editor, as one of the community maintainers of the project.” We found similar cases when analyzing the top contributors of git-lfs and electron. This might suggest that altruism is still present in open-source communities.
Related work
Commercial involvement/paid developers in OSS projects
atom and hubot. Analyzing the software history, we observed that external developers often join company-owned OSS projects in the very first weeks after open-sourcing, but abandon them a few commits later (the so-called newcomers’ wave). In this work, we also observed that the majority of external contributors are casual ones (e.g., have contributed at most one commit). We also observed a burst in the number of issues and pull requests right after open-sourcing the software project. In a follow-up study, we studied the reasons that motivated 50 company-owned OSS projects to delete their software history before going open-source [4]. Among the reasons, we observed that code containing sensitive information (e.g., user credentials) is one of the most common reasons for deleting the history, although other, so far uncommon, reasons, such as lawyers having to inspect each commit, were also observed.
Casual contributors’ phenomenon
linguist project. This is larger than the results we reported in a previous study [14], in which we identified that casual contributors account for up to 61% of the contributors of open-source projects written in JavaScript (4 out of the 5 projects analyzed here are written in JavaScript). Investigating the reasons behind this large number of casual contributors in this kind of project can be an interesting future direction.
Limitations
Conclusions
atom, electron, git-lfs, linguist, and hubot projects. We found that these projects are very receptive to external developers: many externals play important roles in the studied projects, such as reviewing and integrating pull requests. Considering all the projects, internal developers are responsible for 43.3% of the pull requests performed (external developers placed 56.7%). Analyzing just the hubot project, we observed that only 18% of the pull requests had been placed by internal developers. However, the absolute number of external members is many times greater than the number of internal ones. As a consequence, many externals are casual contributors (i.e., developers that only contributed once), although we also identified internals that are also casual contributors.