Johannes Harkins is OptaPro’s recently appointed advanced analyst. In his first blog post for OptaPro, Johannes discusses applying an expected goals metric to evaluate the impact of defenders. Johannes is based in Opta’s American office and can be found on Twitter: @jmyrharkins
In the past year alone, advanced analysis of football statistics has made big strides, both within professional clubs and across the online and academic communities. Conversion rates, chance creation, and even expected goals are now familiar concepts, regularly used as a foundation for performance evaluation. While these show how far the use of football statistics has come, the metrics at the forefront of this work share a common thread: they evaluate attacking skill. Despite the boom in football data, statistical methods for evaluating defending have been much harder to come by.
To some extent this is to be expected. Event-based data lends itself easily to evaluating offensive production, since a significant amount (but certainly not all) of attacking value is contributed on the ball. Defensive analysis, on the other hand, quickly becomes a matter of evaluating alternate realities. A well-positioned defender may prevent a striker from even attempting a shot on goal. How do you put a number on preventing a shot that might have been?
When considering how to analyse football effectively, I often look to work that has proven useful in other sports. For defensive metrics, the NBA is worth examining, as some interesting progress has been made there. For a long time, on-ball statistics such as blocks and steals were the standard measure of individual defensive prowess in basketball. Recently, however, the basketball community, and even the NBA itself, have begun to compare opponent shooting volume and efficiency from different areas of the floor while a player is on the court with how their team performs in these categories without the player. In this way, you can capture some of a player’s defensive impact on the team without explicitly counting new defensive statistics.
While the kinds of defensive actions recorded in event data can reveal interesting insights, it’s clear that interceptions and tackles alone do not necessarily make a competent defender. Using Opta’s database, I took a look at how this pioneering work from the NBA can be applied to football.
Using an existing expected goals model, I gathered game-by-game information for each defensive player in the Premier League, recording how many expected goals the opponent accumulated while the player was on the pitch and while the player was off it. I then adjusted these figures for opponent quality by subtracting the number of expected goals each opponent averaged. The final numbers for each player are minute-weighted averages of these adjusted figures, on and off the pitch, expressed as per-90-minute rates; the rating is the on-pitch rate minus the off-pitch rate. Negative numbers are therefore desirable, since the rating measures the attacking chances allowed to the opposition.
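The calculation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the model used for the figures in this post: the `Spell` record, the `xg_impact` function and the `opp_avg_xg90` input (each opponent’s average expected goals per 90 minutes) are hypothetical names, and the xG values are assumed to come from some existing shot model.

```python
from collections import defaultdict
from typing import NamedTuple


class Spell(NamedTuple):
    """One player's on- or off-pitch spell in one match (hypothetical schema)."""
    player: str
    opponent: str
    on_pitch: bool
    minutes: float
    opp_xg: float  # expected goals the opponent accumulated during the spell


def xg_impact(spells, opp_avg_xg90):
    """Return {player: on-pitch minus off-pitch adjusted xG allowed per 90}.

    opp_avg_xg90 maps each opponent to its average xG per 90 (assumed input).
    """
    # totals[player][on_pitch] = [adjusted xG total, minutes total]
    totals = defaultdict(lambda: {True: [0.0, 0.0], False: [0.0, 0.0]})
    for s in spells:
        # Opponent adjustment: subtract what this opponent typically
        # generates in the same number of minutes.
        baseline = opp_avg_xg90[s.opponent] * s.minutes / 90
        bucket = totals[s.player][s.on_pitch]
        bucket[0] += s.opp_xg - baseline
        bucket[1] += s.minutes

    ratings = {}
    for player, t in totals.items():
        (on_xg, on_min), (off_xg, off_min) = t[True], t[False]
        if on_min == 0 or off_min == 0:
            continue  # no on/off comparison is possible for this player
        # Dividing summed adjusted xG by summed minutes is the same as a
        # minute-weighted average of per-spell rates; scale to per 90.
        ratings[player] = 90 * on_xg / on_min - 90 * off_xg / off_min
    return ratings
```

With toy data where player "A" faces opponents averaging 1.2 and 1.0 xG per 90 and concedes 1.0 and 0.8 in two full matches on the pitch, then sits out a match in which an opponent averaging 1.5 produces 2.0, the sketch gives an on-pitch rate of -0.2 and an off-pitch rate of 0.5, so a rating of about -0.7. Players with no off-pitch minutes are skipped, which is the extreme case of the small-sample problem discussed later in the post.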
Expected goal impact: Arsenal and Manchester United
The two teams showcased in this table have had contrasting seasons with regard to defensive selection. Arsenal, particularly at the two full back positions, have used different players for sustained spells of the season. Manchester United, however, have fielded ever-changing defensive line-ups in terms of both personnel and formation.
With Mathieu Debuchy injured early in the season, Arsenal featured Calum Chambers for a run of matches before Hector Bellerin became first choice later in the season. Because all three players featured for a meaningful length of time this season, there are adequate samples of Arsenal’s performance with each of them both on and off the pitch. The expected goals impact statistic backs Bellerin as the strongest option of the three. His rating of -0.2095 indicates that with him on the pitch, Arsenal allowed roughly a fifth of an expected goal fewer per 90 minutes than without him. Of course, this is only a defensive evaluation; a full back would also be expected to contribute to the attack.
On the left side of the defence, both Kieran Gibbs and Nacho Monreal had spells as Arsenal’s starting full back, with Monreal holding the spot for much of the second half of the season. Monreal is rated much higher here, with Gibbs’ rating of 0.3591 implying that Arsenal performed worse defensively with him in the line-up than without.
For Manchester United, this statistic highlights the importance of Phil Jones. Having him at centre back was estimated to reduce the opponent’s expected goals by more than a third of a goal per 90 minutes. Antonio Valencia and Ashley Young also rate highly, a finding mirrored by the relatively poor ratings of the players who most often filled in for them, Rafael and Luke Shaw respectively. Given the varied formations and interchanging of positions amongst Manchester United’s defensive players this season, there may be more insight to be gained by breaking down the expected goal impact of specific formations, or by comparing players within a single position to see the influence of different players in that spot.
This metric is useful both for evaluating optimal combinations within a team and for highlighting individual standout performers. It is not, however, a complete way to rate a defender. Basketball doesn’t map directly to football: the fluid, frequent line-up changes in basketball make this methodology more universally applicable there. Players like Per Mertesacker, who played nearly all of Arsenal’s minutes, have an expected goal impact that relies on an off-pitch rating drawn from a small sample of minutes. Furthermore, rating players on the performance of opponents means that a number of uncontrollable variables affect the final number. Nonetheless, this is a first step towards quantifying the subtle ways in which defenders can make an impact, an area in which outside-the-box thinking is certainly required.
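To give a rough feel for why a small off-pitch sample matters, the sketch below approximates the standard error of an on-minus-off rating. It rests on loud simplifying assumptions that are not part of the method above: opponent xG in each 90-minute block is treated as independent with a fixed standard deviation `sigma_per90` (a hypothetical parameter, not an estimate from Opta data), and the on and off samples are treated as independent of each other.

```python
import math


def rating_std_error(on_minutes, off_minutes, sigma_per90):
    """Approximate standard error of (on rate - off rate), per 90 minutes.

    Assumes (hypothetically) independent 90-minute blocks of opponent xG,
    each with standard deviation sigma_per90.
    """
    # Standard error of a per-90 rate shrinks with the square root of the
    # number of 90-minute blocks observed.
    se_on = sigma_per90 / math.sqrt(on_minutes / 90)
    se_off = sigma_per90 / math.sqrt(off_minutes / 90)
    # Independent samples: variances of the two rates add.
    return math.sqrt(se_on ** 2 + se_off ** 2)
```

With a hypothetical `sigma_per90` of 0.8, a near-ever-present player with a 3,330/90 minute split gets a standard error of roughly 0.81, against roughly 0.26 for an even 1,710/1,710 split, so almost all of the uncertainty in the first case comes from the off-pitch minutes.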
This post isn’t designed to be a discussion of expected goals or the methodology behind them. There are caveats to using expected goals as a metric, including the fact that the expected goal value for a shot is a point estimate, and summing many of these ignores a lot of variance in both the estimates and the outcomes.
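The outcomes half of that caveat can be made concrete with a small sketch. It assumes, as a simplification, that each shot is an independent Bernoulli trial whose scoring probability is exactly its xG value (a Poisson-binomial model); independence is questionable for rebounds and other linked chances, and the sketch says nothing about uncertainty in the xG estimates themselves. The function name is hypothetical.

```python
def goals_mean_and_variance(shot_xg):
    """Mean and variance of the number of goals from a list of shots.

    Assumes each shot is an independent Bernoulli trial whose scoring
    probability is exactly its xG value (a Poisson-binomial model).
    """
    mean = sum(shot_xg)
    # Variance of a sum of independent Bernoulli(p) trials.
    variance = sum(p * (1 - p) for p in shot_xg)
    return mean, variance
```

Two shots of 0.5 xG and ten shots of 0.1 xG both sum to 1.0 expected goal, but under this model the variance of the goals actually scored is 0.5 in the first case and about 0.9 in the second, which is the kind of spread a single cumulative total hides.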