Looking into 2020’s Pull Requests: Part III
Hi there! In our previous article we did a deep dive into the merged pull requests of 283 034 starred public GitHub projects.
For this article, we will present and analyze the merged pull requests on a selected set of public projects.
Data Selection
As usual in this series, we continue with data from the GH Archive project, covering January to March of 2020, and the same initial selection of ≈ 2.6M merged pull requests. For each pull request, we collected the following (see the sketch after this list):
- The name of the project
- The login of the GitHub user that merged the Pull Request
- The number of stars of the project
- The duration (time from opened to closed)
- The number of comments
- The number of files changed
- The number of lines changed (additions + deletions)
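As an illustration, here is a minimal sketch of how these fields can be pulled from a single GH Archive JSON line. The field names follow the GitHub REST API pull request payload, which GH Archive stores verbatim; this is a sketch, not our exact pipeline:

```python
import json
from datetime import datetime

ISO = "%Y-%m-%dT%H:%M:%SZ"  # timestamp format used in the event payloads

def extract_record(line: str):
    """Extract our per-PR fields from one GH Archive JSON line."""
    event = json.loads(line)
    if event.get("type") != "PullRequestEvent":
        return None
    payload = event["payload"]
    pr = payload["pull_request"]
    # Keep only pull requests that were closed by being merged.
    if payload.get("action") != "closed" or not pr.get("merged"):
        return None
    opened = datetime.strptime(pr["created_at"], ISO)
    closed = datetime.strptime(pr["closed_at"], ISO)
    return {
        "project": event["repo"]["name"],
        "merged_by": (pr.get("merged_by") or {}).get("login"),
        "stars": pr["base"]["repo"]["stargazers_count"],
        "duration_days": (closed - opened).total_seconds() / 86400,
        "comments": pr["comments"],
        "review_comments": pr["review_comments"],
        "files_changed": pr["changed_files"],
        "lines_changed": pr["additions"] + pr["deletions"],
    }
```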
After aggregating this data per project, we wanted to select 50 of the 280K+ projects. We decided to rank the projects by combining the number of merged pull requests with the number of stars. Since the star count can differ between pull requests of the same project (it is recorded at the time of each event), we took the star count from a randomly chosen merged pull request. We wanted the ranking to avoid projects that are uninteresting because they have many stars but no pull request activity, or vice versa. We settled on the formula: log2(number of pull requests) * log10(number of stars). With this ranking, we prioritize the number of pull requests over the number of stars.
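A minimal sketch of this selection, assuming the per-project aggregates live in a pandas DataFrame with hypothetical columns merged_prs and stars:

```python
import numpy as np
import pandas as pd

def top_projects(projects: pd.DataFrame, n: int = 50) -> pd.DataFrame:
    """Rank projects by log2(merged PRs) * log10(stars) and keep the top n.

    Assumes one row per project with (hypothetical) columns
    "merged_prs" and "stars", both >= 1 so the logs are well defined.
    """
    score = np.log2(projects["merged_prs"]) * np.log10(projects["stars"])
    return projects.assign(score=score).nlargest(n, "score")
```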
The following table presents the list of the top 50 projects from our ranking together with the number of stars and pull requests merged.
Analysis
Similar to our previous posts, we considered averages and percentiles to get an easy-to-understand overview across all projects.
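All of the per-project statistics below boil down to the same aggregation. A sketch, assuming the per-PR records from the extraction sketch above are loaded into a pandas DataFrame called prs:

```python
import pandas as pd

def per_project_stats(prs: pd.DataFrame, column: str) -> pd.DataFrame:
    """Average and key percentiles of one metric, aggregated per project.

    Assumes prs has one row per merged pull request, with the columns
    produced by the extraction sketch above.
    """
    g = prs.groupby("project")[column]
    return pd.DataFrame({
        "mean": g.mean(),
        "p50": g.quantile(0.50),
        "p90": g.quantile(0.90),
        "p99": g.quantile(0.99),
    })

# e.g. per_project_stats(prs, "duration_days") for the duration analysis
```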
GitHub users that merge pull requests
We started by looking at the number of users that merge pull requests. We knew that most (if not all) of the projects we selected were mature open source projects with hundreds of contributors. Looking at the number of users that merge pull requests could provide some insights into the way these projects work in practice.
The following plot presents the number of GitHub users that merge PRs for each of the 50 projects. We thought it would be interesting to see how the projects vary based on our ranking. For that reason, the horizontal axis of all our plots orders the projects by ranking.
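Counting these users is a one-line aggregation; a sketch, again assuming the prs DataFrame from above:

```python
# Distinct GitHub users that merged at least one PR, per project.
mergers_per_project = prs.groupby("project")["merged_by"].nunique()
```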
Looking at the three projects with a single merging user (kubernetes/kubernetes, rust-lang/rust and helm/charts), we see that they use automated tools for the pull request merge action. In particular, both kubernetes/kubernetes and helm/charts use the Kubernetes Prow Robot. Other projects with a low number of users seem to use a hybrid approach, where some PRs are merged by bots and others by maintainers of the project.
On the other hand, six projects (elastic/elasticsearch, elastic/kibana, flutter/flutter, apple/swift, NixOS/nixpkgs and Automattic/wp-calypso) have more than 50 users merging pull requests. Do you think this implies these projects have a strong community of contributors and maintainers?
Duration
In our previous post, we saw that the average of the per-project average durations is around 15 days. We also observed, looking at the percentiles across projects, that durations grow exponentially from one percentile to the next. We were curious to see if these patterns also hold in mature open-source projects.
We decided to plot the data for the average and percentiles as opposed to presenting the data as a table because it’s easier to digest and observe outliers and trends.
We noticed that, as with the durations across all 280K+ public projects, there are huge differences between the average and some of the percentiles. For this reason, we chose to split the data into two plots: one for the average and the median (50th percentile), and another for the 90th and 99th percentiles.
Looking at the averages and medians, we see that documentation projects such as firstcontributions and freeCodeCamp are clear outliers at opposite ends of the spectrum. A couple of observations we found interesting:
- The average of averages of the pull request durations in these projects is 8.6 days (almost half of the average of averages across all projects).
- 50% of the merged pull requests across the projects have a duration of under 5.5 days. That’s just a bit more than a workweek!
In the next plot, we show the 90th and the 99th percentile durations.
The plot with the 90th and 99th percentiles tells us more about the distribution of the durations. First, we can see that the scale goes from days to weeks between the 50th and the 90th percentile, and from weeks to months between the 90th and the 99th. For instance, the average of the 90th percentiles is ≈ 4 weeks, and the average of the 99th percentiles is ≈ 4 months. In this plot, microsoft/vscode is not only the project with the highest 99th percentile (well over one year) but also the one with the biggest gap between the two percentiles.
Files and lines of code changed
Typically, code reviews are conducted by going through the changes file by file. Everyone has experienced being overwhelmed by a review with many changes, or with changes spread across many files. However, it is not clear how often developers around the world experience this on a daily basis. Looking at this information for these repositories could provide some insight into what's happening in open source projects.
In the next plot, we show the average and median number of files changed in the merged pull requests per project.
The first project that (literally) jumps out in this data is dotnet/roslyn, where the average number of files changed is nearly 229! We haven't yet examined what's going on, but it's probably related to a large number of unit tests that need to be updated frequently. If that's the case, does that mean the average number of files changed per pull request is a good approximation of the testing effort in a project?
The average of the averages is 8.6 files (excluding dotnet/roslyn, which is a clear outlier). In comparison, the average of the medians is 2.34 files (including dotnet/roslyn). That means that many pull requests change only one or two files. This is usually considered good practice: you want pull requests to be as small as possible so that they are easier to review.
The next plot shows the percentage of pull requests that only change one file per project.
At the high end of the spectrum we have Homebrew/homebrew-cask and firstcontributions, with over 99% of pull requests changing only one file; at the low end we have helm/charts with 0.65%. Across all projects, the average of these percentages is 37%.
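A sketch of how this percentage can be computed, under the same prs assumption:

```python
# Share of merged PRs that touch exactly one file, per project (in %).
single_file_pct = (
    (prs["files_changed"] == 1)
    .groupby(prs["project"])
    .mean()
    .mul(100)
)
```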
After the number of files, we looked into the number of lines of code (LOC) changed, computed as additions plus deletions. To simplify the presentation, we only consider the average and the 90th percentile.
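A sketch of this aggregation, reusing the lines_changed column (additions plus deletions) from the extraction sketch:

```python
import pandas as pd

# lines_changed was computed as additions + deletions per PR.
loc = prs.groupby("project")["lines_changed"]
loc_stats = pd.DataFrame({"mean": loc.mean(), "p90": loc.quantile(0.90)})
```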
Given the plot of file changes, it's no surprise to see dotnet/roslyn as an outlier here too. The average of the averages is ≈ 643 LOC, which is actually quite high for a code review. Even though we don't present this data here, the average of the medians is ≈ 36 LOC. It would be interesting to look into the projects where the average is higher than the 90th percentile, and to see how these correlate with major refactorings in the projects.
Comments in pull requests
Finally, we analyzed the comment activity in the pull requests. As we mentioned in the previous post, GitHub has two main types of comments associated with pull requests shown in the conversation tab:
- Pull request comments: these are single comments on the file diff between the two branches or general comments in the pull request.
- Pull request review comments: these are comments on the file diff made during a pull request review and they show up grouped in the conversation tab when the review is finalized.
We considered pull request review comments as a representation of the code review process, and all comments as a representation of the discussion around the pull request. However, we are aware that most (if not all) of these projects use automated tools that post comments with Continuous Integration results.
The following plot shows the average of comments and the average of review comments per project.
At the low end of the spectrum, we find the Homebrew/homebrew-cask and firstcontributions projects. This is not very surprising, since most pull requests in these projects wouldn't require discussion. The project with the highest average number of comments is kubernetes/kubernetes, followed by rust-lang/rust. Although we are not sure, this could be related to their use of automation tools. Both projects also have a high average of review comments, although the top project there is dotnet/roslyn with 7.23 review comments.
Next, we looked at the 90th percentile for comments and review comments.
We can observe the same trends in the 90th percentiles as in the averages: kubernetes/kubernetes, rust-lang/rust and dotnet/roslyn are the projects with the highest percentiles. The average of the 90th percentiles for comments is ≈ 19, and the average of the 90th percentiles for review comments is ≈ 12.
Finally, we look into the percentage of pull requests with 0 comments and 0 review comments.
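These percentages can be computed in the same way as the single-file share above; a sketch:

```python
# Percentage of merged PRs with no comments / no review comments, per project.
project = prs["project"]
pct_no_comments = (prs["comments"] == 0).groupby(project).mean().mul(100)
pct_no_reviews = (prs["review_comments"] == 0).groupby(project).mean().mul(100)
```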
As expected, the projects with the lowest number of comments have the highest percentages. However, we were surprised to see percentages this high:
- 27 projects have more than 25% of pull requests without a single comment.
- The project pandas-dev/pandas has the lowest percentage of pull requests without a single review comment. Even so, that percentage is still above 50%.
- 19 projects have merged more than 75% of their pull requests without a single review comment.
If this data is representative of the review process happening in these projects, it is worrisome considering how many developers around the world depend on the quality of some of these projects.
Let us know what you think about these findings!