Looking into 2020’s Pull Requests: Part III
Hi there! In our previous article we did a deep dive into the merged pull requests of 283 034 starred public GitHub projects.
For this article, we will present and analyze the merged pull requests on a selected set of public projects.
Data Selection
As usual in this series, we continue with data from the GH Archive project, covering January to March of 2020, and the same initial selection of ≈ 2.6M merged pull requests. For each pull request, we collected the following (see the sketch after this list):
- The name of the project
- The login of the GitHub user that merged the Pull Request
- The number of stars of the project
- The duration (time from opened to closed)
- The number of comments
- The number of files changed
- The number of lines changed (additions + deletions)
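As an illustration, here is a minimal sketch of how these fields can be pulled from a single GH Archive JSON line. The field names follow the GitHub REST API pull request payload, which GH Archive stores verbatim; this is a sketch, not our exact pipeline:

```python
import json
from datetime import datetime

ISO = "%Y-%m-%dT%H:%M:%SZ"  # timestamp format used in the event payloads

def extract_record(line: str):
    """Extract our per-PR fields from one GH Archive JSON line."""
    event = json.loads(line)
    if event.get("type") != "PullRequestEvent":
        return None
    payload = event["payload"]
    pr = payload["pull_request"]
    # Keep only pull requests that were closed by being merged.
    if payload.get("action") != "closed" or not pr.get("merged"):
        return None
    opened = datetime.strptime(pr["created_at"], ISO)
    closed = datetime.strptime(pr["closed_at"], ISO)
    return {
        "project": event["repo"]["name"],
        "merged_by": (pr.get("merged_by") or {}).get("login"),
        "stars": pr["base"]["repo"]["stargazers_count"],
        "duration_days": (closed - opened).total_seconds() / 86400,
        "comments": pr["comments"],
        "review_comments": pr["review_comments"],
        "files_changed": pr["changed_files"],
        "lines_changed": pr["additions"] + pr["deletions"],
    }
```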
After aggregating this data per project, we wanted to select 50 of the 280K+ projects. We decided to rank the projects by combining the number of merged pull requests with the number of stars. Since the star count can differ between pull requests of the same project (it is recorded at the time of each event), we took the star count from a randomly chosen merged pull request. We wanted the ranking to avoid projects that are uninteresting because they have many stars but no pull request activity, or vice versa. We settled on the formula: log2(number of pull requests) * log10(number of stars). With this ranking, we prioritize the number of pull requests over the number of stars.
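A minimal sketch of this selection, assuming the per-project aggregates live in a pandas DataFrame with hypothetical columns merged_prs and stars:

```python
import numpy as np
import pandas as pd

def top_projects(projects: pd.DataFrame, n: int = 50) -> pd.DataFrame:
    """Rank projects by log2(merged PRs) * log10(stars) and keep the top n.

    Assumes one row per project with (hypothetical) columns
    "merged_prs" and "stars", both >= 1 so the logs are well defined.
    """
    score = np.log2(projects["merged_prs"]) * np.log10(projects["stars"])
    return projects.assign(score=score).nlargest(n, "score")
```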
The following table presents the list of the top 50 projects from our ranking together with the number of stars and pull requests merged.
Analysis
Similar to our previous posts, we considered averages and percentiles to get an easy-to-understand overview across all projects.
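All of the per-project statistics below boil down to the same aggregation. A sketch, assuming the per-PR records from the extraction sketch above are loaded into a pandas DataFrame called prs:

```python
import pandas as pd

def per_project_stats(prs: pd.DataFrame, column: str) -> pd.DataFrame:
    """Average and key percentiles of one metric, aggregated per project.

    Assumes prs has one row per merged pull request, with the columns
    produced by the extraction sketch above.
    """
    g = prs.groupby("project")[column]
    return pd.DataFrame({
        "mean": g.mean(),
        "p50": g.quantile(0.50),
        "p90": g.quantile(0.90),
        "p99": g.quantile(0.99),
    })

# e.g. per_project_stats(prs, "duration_days") for the duration analysis
```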
GitHub users that merge pull requests
We started by looking at the number of users that merge pull requests. We knew that most (if not all) of the projects we selected were mature open source projects with hundreds of contributors. Looking at the number of users that merge pull requests could provide some insights into the way these projects work in practice.
The following plot presents the number of GitHub users that merge PRs for each of the 50 projects. We thought it would be interesting to see how the projects vary based on our ranking. For that reason, the horizontal axis of all our plots orders the projects by ranking.
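Counting these users is a one-line aggregation; a sketch, again assuming the prs DataFrame from above:

```python
# Distinct GitHub users that merged at least one PR, per project.
mergers_per_project = prs.groupby("project")["merged_by"].nunique()
```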
Looking at the three projects with a single merging user (kubernetes/kubernetes, rust-lang/rust and helm/charts), we see that they use automated tools for the pull request merge action. In particular, both kubernetes/kubernetes and helm/charts use the Kubernetes Prow Robot. Other projects with a low number of users seem to use a hybrid approach, where some PRs are merged by bots and others by maintainers of the project.
On the other hand, six projects (elastic/elasticsearch, elastic/kibana, flutter/flutter, apple/swift, NixOS/nixpkgs and Automattic/wp-calypso) have more than 50 users merging pull requests. Do you think this implies these projects have a strong community of contributors and maintainers?
Duration
In our previous post, we saw that the average of the per-project average durations is around 15 days. We also observed, looking at the percentiles across projects, that durations grow exponentially from one percentile to the next. We were curious to see if these patterns also hold in mature open-source projects.
We decided to plot the data for the average and percentiles as opposed to presenting the data as a table because it’s easier to digest and observe outliers and trends.
We noticed that, as with the durations across all 280K+ public projects, there are huge differences between the average and some of the percentiles. For this reason, we chose to split the data into two plots: one for the average and the median (50th percentile), and another for the 90th and 99th percentiles.
Looking at the averages and medians, we see that documentation projects such as firstcontributions and freeCodeCamp are clear outliers at opposite ends of the spectrum. A couple of observations we found interesting:
- The average of averages of the pull request durations in these projects is 8.6 days (almost half of the average of averages across all projects).
- 50% of the merged pull requests across the projects have a duration of under 5.5 days. That’s just a bit more than a workweek!
In the next plot, we show the 90th and the 99th percentile durations.
The plot with the 90th and 99th percentiles tells us more about the distribution of the durations. First, we can see that the scale goes from days to weeks between the 50th and the 90th percentile, and from weeks to months between the 90th and the 99th. For instance, the average of the 90th percentiles is ≈ 4 weeks, and the average of the 99th percentiles is ≈ 4 months. In this plot, microsoft/vscode is not only the project with the highest 99th percentile (well over one year) but also the one with the biggest gap between the two percentiles.
Files and lines of code changed
Typically, code reviews are conducted by going through the changes file by file. Everyone has experienced being overwhelmed by a review with many changes, or with changes spread across many files. However, it is not clear how often developers around the world experience this on a daily basis. Looking at this information for these repositories could provide some insight into what's happening in open source projects.
In the next plot, we show the average and median number of files changed in the merged pull requests per project.
The first project that (literally) jumps out in this data is dotnet/roslyn, where the average number of files changed is nearly 229! We haven't yet examined what's going on, but it's probably related to a large number of unit tests that need to be updated frequently. If that's the case, does that mean the average number of files changed per pull request is a good approximation of the testing effort in a project?
The average of the averages is 8.6 files (excluding dotnet/roslyn, which is a clear outlier). In comparison, the average of the medians is 2.34 files (including dotnet/roslyn). That means that many pull requests change only one or two files. This is usually considered good practice: you want pull requests to be as small as possible so that they are easier to review.
The next plot shows the percentage of pull requests that only change one file per project.
At the high end of the spectrum we have Homebrew/homebrew-cask and firstcontributions, with over 99% of pull requests changing only one file; at the low end we have helm/charts with 0.65%. Across all projects, the average of these percentages is 37%.
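A sketch of how this percentage can be computed, under the same prs assumption:

```python
# Share of merged PRs that touch exactly one file, per project (in %).
single_file_pct = (
    (prs["files_changed"] == 1)
    .groupby(prs["project"])
    .mean()
    .mul(100)
)
```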
After the number of files, we looked into the number of lines of code (LOC) changed, computed as additions plus deletions. To simplify the presentation, we only consider the average and the 90th percentile.
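A sketch of this aggregation, reusing the lines_changed column (additions plus deletions) from the extraction sketch:

```python
import pandas as pd

# lines_changed was computed as additions + deletions per PR.
loc = prs.groupby("project")["lines_changed"]
loc_stats = pd.DataFrame({"mean": loc.mean(), "p90": loc.quantile(0.90)})
```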
Given the plot of file changes, it's no surprise to see dotnet/roslyn as an outlier here too. The average of the averages is ≈ 643 LOC, which is actually quite high for a code review. Even though we don't present this data here, the average of the medians is ≈ 36 LOC. It would be interesting to look into the projects where the average is higher than the 90th percentile, and to see how these correlate with major refactorings in the projects.
Comments in pull requests
Finally, we analyzed the comment activity in the pull requests. As we mentioned in the previous post, GitHub has two main types of comments associated with pull requests shown in the conversation tab:
- Pull request comments: these are single comments on the file diff between the two branches or general comments in the pull request.
- Pull request review comments: these are comments on the file diff made during a pull request review and they show up grouped in the conversation tab when the review is finalized.
We considered pull request review comments as a representation of the code review process, and all comments as a representation of the discussion around the pull request. However, we are aware that most (if not all) of these projects use automated tools that post comments with Continuous Integration results.
The following plot shows the average of comments and the average of review comments per project.
At the low end of the spectrum, we find the Homebrew/homebrew-cask and firstcontributions projects. This is not very surprising, since most pull requests in these projects wouldn't require discussion. The project with the highest average number of comments is kubernetes/kubernetes, followed by rust-lang/rust. Although we are not sure, this could be related to their use of automation tools. Both projects also have a high average of review comments, although the top project there is dotnet/roslyn with 7.23 review comments.
Next, we looked at the 90th percentile for comments and review comments.
We can observe the same trends in the 90th percentiles as in the averages: kubernetes/kubernetes, rust-lang/rust and dotnet/roslyn are the projects with the highest percentiles. The average of the 90th percentiles for comments is ≈ 19, and the average of the 90th percentiles for review comments is ≈ 12.
Finally, we look into the percentage of pull requests with 0 comments and 0 review comments.
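These percentages can be computed in the same way as the single-file share above; a sketch:

```python
# Percentage of merged PRs with no comments / no review comments, per project.
project = prs["project"]
pct_no_comments = (prs["comments"] == 0).groupby(project).mean().mul(100)
pct_no_reviews = (prs["review_comments"] == 0).groupby(project).mean().mul(100)
```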
As expected, the projects with the lowest number of comments have the highest percentages. However, we were surprised to see percentages this high:
- 27 projects have more than 25% of pull requests without a single comment.
- The project pandas-dev/pandas has the lowest percentage of pull requests without a single review comment. Even so, that percentage is still above 50%.
- 19 projects have merged more than 75% of their pull requests without a single review comment.
If this data is representative of the review process happening in these projects, it is worrisome considering how many developers around the world depend on the quality of some of these projects.
Let us know what you think about these findings!