Code Review Metrics That Make Sense
It’s part of our nature to optimize every process, and code reviews are no exception. If you are part of a team doing code reviews, at some point someone will want to measure aspects of the process. It’s important to acknowledge that part of the ROI of code reviews is not directly measurable, and identifying the right metrics to monitor and to start discussions around code review optimization is hard for developers and engineering leaders alike. Here are some practices that we find work well.
In our experience with Reviewpad, people look at metrics for two reasons:
- They have the feeling that bottlenecks exist in the process and want to identify potential reasons;
- They want to monitor the quality of their processes.
When it comes to code reviews, the metrics selected to deal with these scenarios are deeply connected to the underlying software development process and team organization.
Code reviews should always have specifications
Code reviews should always be associated with your project management tool through issues or tickets. You want to keep them correlated: if you are to understand where bottlenecks lie, you need to look at metrics in both areas. This is how you identify whether the execution or the specification of a task is the issue.
This is why the first metric you should have is the percentage of code reviews without linked issues.
Issues and code reviews typically contain project-specific labels that describe if they are new features, bug fixes, or other improvements. If you want to drill down, split reviews down by label and consider the percentage of code reviews without linked issues for each one.
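Both the overall percentage and the per-label breakdown are straightforward to compute. A minimal sketch, assuming hypothetical pull request records (in practice you would fetch these from your Git hosting provider’s API, and labels and linked-issue fields would come from there):

```python
from collections import defaultdict

# Hypothetical PR records; "linked_issue" is None when no issue is linked.
pull_requests = [
    {"id": 1, "labels": ["feature"], "linked_issue": 101},
    {"id": 2, "labels": ["bug"], "linked_issue": None},
    {"id": 3, "labels": ["bug"], "linked_issue": 102},
    {"id": 4, "labels": ["feature"], "linked_issue": None},
]

def pct_without_linked_issue(prs):
    """Percentage of PRs in `prs` with no linked issue."""
    if not prs:
        return 0.0
    missing = sum(1 for pr in prs if pr["linked_issue"] is None)
    return 100.0 * missing / len(prs)

overall = pct_without_linked_issue(pull_requests)

# Drill down: group PRs by label, then compute the same percentage per group.
by_label = defaultdict(list)
for pr in pull_requests:
    for label in pr["labels"]:
        by_label[label].append(pr)

per_label = {label: pct_without_linked_issue(prs)
             for label, prs in by_label.items()}

print(overall)    # 50.0
print(per_label)  # {'feature': 50.0, 'bug': 50.0}
```

The per-label view is what lets you tolerate unlabeled experiments while still demanding that every bug fix has a linked issue.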
It is true that some teams find it acceptable to leave some code reviews unlabeled. As a best practice, though, you will want all bug fixes to be properly documented so that you avoid future regressions.
If you are following a model of continuous code reviews, the recommended method is that you create an issue and a corresponding draft pull request. This allows you to promote continuous discussion about the specification and code development.
Optimize for your release cycle
Whether your team needs to release multiple times a week or once a month, your team organization will reflect it. As a result, the metrics that matter to you will also change in a fundamental way.
For teams that use pull requests and want to shorten their release cycle, a standard metric is the average time to merge pull requests: the average time from when a pull request is opened until it is merged, over a given release cycle. This metric is most valuable when you have a fixed release cycle (say, every two weeks) and want to see whether merging pull requests is blocking your release.
There is one issue with this metric: a known tendency to merge a lot of pull requests as the release deadline approaches.
We find it more informative to measure the distribution of pull requests merged over time to understand if the workload of the team is evenly distributed.
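Both views can be computed from merge timestamps. A minimal sketch with hypothetical dates, where the distribution is bucketed by ISO week:

```python
from datetime import datetime

# Hypothetical merged PRs within one release cycle.
merged_prs = [
    {"opened": "2024-03-01", "merged": "2024-03-03"},
    {"opened": "2024-03-02", "merged": "2024-03-08"},
    {"opened": "2024-03-10", "merged": "2024-03-13"},
]

def days_open(pr):
    """Days between opening and merging a PR."""
    opened = datetime.fromisoformat(pr["opened"])
    merged = datetime.fromisoformat(pr["merged"])
    return (merged - opened).days

avg_time_to_merge = sum(days_open(pr) for pr in merged_prs) / len(merged_prs)

# Distribution of merges per ISO week: a spike in the last week of the
# cycle suggests deadline-driven merging rather than an even workload.
merges_per_week = {}
for pr in merged_prs:
    week = datetime.fromisoformat(pr["merged"]).isocalendar()[1]
    merges_per_week[week] = merges_per_week.get(week, 0) + 1
```

If `merges_per_week` is flat, the team is merging continuously; if most of the mass sits in the final week, the average time to merge is hiding a deadline rush.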
Minimize conflicts
There is a lot of information out there about how important it is to minimize the size of a pull request and the time to merge. Everyone is trying to avoid the same fundamental problem. Whenever you work in collaboration, you have to deal with conflicts.
Let’s face it: it is extremely annoying for anyone to spend a whole morning fixing git conflicts because someone else changed the codebase in a way that impacted their work.
If there is one metric you should be paying attention to, it is the number of conflicts in pull requests. Make sure you monitor the distribution over time during the release cycle and ensure that it is always close to zero.
The size and duration of a pull request don’t matter as long as two things are true:
- You are doing incremental and continuous reviews;
- You’re not introducing potential conflicts with other team members.
Enforcing size constraints on pull requests to avoid conflicts is a form of developer micromanagement. As a side effect, you will also end up with a pipeline of atomic pull requests that wastes continuous integration resources and pressures people to context switch to review them.
Share the review load and measure communication
Even if your absolute priority is to shorten the release cycle, you don’t want to ship low-quality products. The best way to increase quality is through code reviews. Three key metrics can help you understand the quality of your code reviews:
1 - Average time before reviewer assignment
Two facts:
- There can’t be code reviews without reviewers;
- A pending review is helping no one.
Nowadays, broadcasting on Slack that a pull request is ready for review is considered spam (and, to be fair, it’s hard to argue that it isn’t). Having a system pick you as a reviewer for whatever reason is a really good way to minimize this number without actually adding practical value.
When you practice continuous code reviews, you mitigate this problem by opening a draft pull request and immediately assigning a team member to work with you as a reviewer.
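The metric itself is simple to compute from two timestamps per pull request. A minimal sketch with hypothetical data:

```python
from datetime import datetime

# Hypothetical PRs with opening and first-reviewer-assignment timestamps.
prs = [
    {"opened": "2024-03-01T09:00", "reviewer_assigned": "2024-03-01T10:00"},
    {"opened": "2024-03-01T12:00", "reviewer_assigned": "2024-03-02T12:00"},
]

def hours_to_assignment(pr):
    """Hours between a PR being opened and a reviewer being assigned."""
    opened = datetime.fromisoformat(pr["opened"])
    assigned = datetime.fromisoformat(pr["reviewer_assigned"])
    return (assigned - opened).total_seconds() / 3600

avg_hours = sum(hours_to_assignment(pr) for pr in prs) / len(prs)
print(avg_hours)  # 12.5
```

Note that the average alone can mislead: one PR assigned in an hour and another left a full day both look like 12.5 hours, so it is worth inspecting the outliers as well.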
2 - Distribution of reviews per team member
This distribution indicates whether the review load is being evenly distributed. The team must feel that code reviews are part of the software development process. That means making them continuous and the distribution fair. If only one team member, typically the team lead or the principal developer, is in charge of the vast majority of the reviews, you are creating a bottleneck to merge. As a consequence, either this person will spend their days doing reviews or the quality of the reviews will decrease over time. A good way to overcome this problem is to assign two reviewers who split the review between them.
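A minimal sketch of the distribution, with a hypothetical bottleneck flag (the 50% threshold below is an arbitrary illustration, not a recommendation):

```python
from collections import Counter

# Hypothetical list of reviewers, one entry per completed review.
reviews = ["alice", "alice", "alice", "bob", "carol", "alice"]

per_member = Counter(reviews)

# Flag a potential bottleneck when a single member handles more than
# half of all reviews in the period.
top_member, top_count = per_member.most_common(1)[0]
bottleneck = top_count > len(reviews) / 2

print(dict(per_member))  # {'alice': 4, 'bob': 1, 'carol': 1}
print(bottleneck)        # True
```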
3 - Distribution of comments per review
Reviews are dialogues, not checklists. Deep diving into the quality of the comments in the reviews is an excellent way to understand if the reviews themselves are any good. If most reviews have very few comments, this is an indication that either the actual review is taking place elsewhere (which might not be properly documented) or not happening at all. Even if you have a strong culture of pair programming, you will still benefit from code reviews.
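As a rough proxy before reading the comments themselves, you can look at how review comment counts are distributed. A minimal sketch with hypothetical counts:

```python
from collections import Counter

# Hypothetical number of comments left on each review.
comments_per_review = [0, 1, 0, 5, 2, 0]

distribution = Counter(comments_per_review)

# Share of reviews with at most one comment: if this is high, discussion
# is likely happening elsewhere, or not happening at all.
share_with_few_comments = (
    sum(1 for c in comments_per_review if c <= 1) / len(comments_per_review)
)
```

This is only a signal, not a verdict: a review with few comments on a trivial change can still be a good review.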
Measure groups, not individuals
If you are an engineering manager, the only metrics you should pay attention to are at the team level. The value of a code review, like that of a code change, is not linear. Any metric focused on an individual will be biased: maybe they only performed one code review this week, but that review suggested a change that saved the company a lot of money. We recommend nurturing a culture of continuous improvement where individuals can measure themselves, and keeping the metrics used by engineering leaders at the team level.