BLOG: Assessing the use of game state in predictive models

Article by Devin Pleuler

Teams that are winning tend to behave differently from teams that are losing. For example, winning teams are less inclined to trade possession in favour of comparatively speculative scoring chances, whereas losing teams often are. The reason is “game state”: since 2010, Premier League teams that were losing have outshot their opponents 11,868 to 10,955. Given the roughly 8% difference in shooting rates across the two game states, and the large sample size, the difference is as jarring as it is significant.
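That 8% figure can be reproduced directly from the quoted shot counts; a quick sanity check:

```python
# Shot counts quoted above: Premier League shots since 2010,
# split by the shooting team's game state.
losing_shots = 11868
winning_shots = 10955

# Relative difference in shot volume between the two game states
gap = (losing_shots - winning_shots) / winning_shots
print(f"Losing teams out-shot winning teams by {gap:.1%}")  # roughly 8%
```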

The ruthless management of game state isn’t just a trademark of pragmatic coaches and savvy players. Game state implications belong to the fabric of the game, ever present in matches ranging from the youngest recreational soccer to the World Cup final.

The effects of game state don’t just skew top-level metrics such as our shot volume example; they persist through every level of statistical granularity. For example, since 2010 in the Premier League, losing teams have an overall shooting conversion rate of 9.0%. That’s one goal every 11.1 shots. Conversely, winning teams have had an inflated conversion rate of 11.8%, or one goal every 8.5 shots. Given the generous sample size, we are certain that the difference between these two goal-scoring rates cannot be attributed solely to luck. Game state is part of the underlying mechanics of our game.
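The significance claim can be checked with a standard two-proportion z-test. A minimal sketch, assuming the shot counts from the earlier shot-volume comparison serve as the denominators here (the article doesn’t state them for the conversion rates, so treat the exact z-value as illustrative):

```python
from math import sqrt, erf

# Conversion rates quoted above; shot counts re-used from the
# shot-volume figures earlier in the article (an assumption --
# the exact denominators aren't given).
n_losing, p_losing = 11868, 0.090    # losing teams: 9.0% conversion
n_winning, p_winning = 10955, 0.118  # winning teams: 11.8% conversion

# Pooled proportion under the null hypothesis of equal scoring rates
pooled = (p_losing * n_losing + p_winning * n_winning) / (n_losing + n_winning)
se = sqrt(pooled * (1 - pooled) * (1 / n_losing + 1 / n_winning))
z = (p_winning - p_losing) / se

# Two-sided p-value via the normal CDF
p_value = 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))
print(f"z = {z:.1f}, p-value < 0.001: {p_value < 0.001}")
```

At sample sizes of this order the z-statistic lands near 7, far beyond any conventional significance threshold, which is consistent with the “not just luck” conclusion.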

There are plenty of tangential statistical artifacts that go hand-in-hand with our broad observations about game state. For example, losing teams don’t just take more shots; they also trend toward taking shots from greater distances. Game state alone doesn’t cause the conversion rate of losing teams to drop; rather, it changes the qualifying attributes of the shots being taken. The indirectness of this effect is deeply important for the arguments that follow.

Since the effects of game state cascade across every statistical category, it has become a mission of the soccer analytics community to create so-called “game state controlled” measurements. But it’s not easy to untangle something so utterly pervasive.

One of my recurring themes is the argument that “not all shots are created equal”, but that claim is not rooted in an attempt to control for game state. Instead it’s tied to the fact that every shot has a different probability of resulting in a goal depending on qualifying observations surrounding it.

In an idealized sense, game state has no direct effect on a goal-bound shot and therefore has no place in a predictive shooting model. If the same player is given precisely the same scoring opportunity at two different game states, the player’s scoring rate shouldn’t change. The effects of game state should, in theory, be absorbed into the other observations collected around a shooting attempt. But that’s much easier said than done.

Because we have an incomplete understanding of each scoring attempt, we turn to proxies to help fill in the holes. The reason game state does add immediate predictive value to shooting models is that it’s a decent proxy for things like defensive pressure. For example, teams that are winning are more difficult to score against because their defensive tactics usually become more robust.
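The proxy effect can be made concrete with a toy simulation. In the sketch below (all parameters are invented for illustration), scoring probability depends only on whether a shot is taken under robust defensive pressure; game state merely shifts how often that pressure occurs, yet the observed conversion rates still diverge by game state:

```python
import random

random.seed(42)

# Toy model: game state never enters the scoring probability directly.
# It only changes the odds of facing robust defensive pressure, and
# pressure is what actually lowers conversion. All numbers invented.
P_PRESSURE = {"losing": 0.70, "winning": 0.40}  # chance a shot faces pressure
P_GOAL = {True: 0.07, False: 0.14}              # conversion with / without pressure

def simulate_conversion(state, n_shots=20_000):
    goals = 0
    for _ in range(n_shots):
        pressured = random.random() < P_PRESSURE[state]
        goals += random.random() < P_GOAL[pressured]
    return goals / n_shots

for state in ("losing", "winning"):
    print(f"{state}: {simulate_conversion(state):.1%} conversion")
```

Even though the goal probability never references the scoreboard, the simulated losing teams convert at roughly 9% and winning teams at roughly 11%, echoing the real gap. A model that observed pressure directly would gain nothing from adding game state; a model that can’t observe pressure will happily use game state as its stand-in.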

But incorporating off-field context such as game state into shooting predictions, in my opinion, is a step in the direction of the cardinal sin of statistical modelling: over-fitting.

One-goal leads, like shots, are not all created equal. One-goal leads have different values at different stages of the game, and can even stimulate different teams to react in different ways. Conversely, a shot from a longer distance will always be more difficult than a shot from a shorter distance, given that all other variables are held constant.

While controlling for game state does remove noise from our overall prediction, it’s unclear where the signal it attenuates should be attributed. When a player moves the ball from goal-scoring position A to goal-scoring position B, it’s intuitive to (at least partially) credit that player with the value difference between the two states. But with the value difference between similar scoring opportunities at differing game states, it’s unclear how the credit should be apportioned. It seems disingenuous to credit the magical game state demigod with this difference in scoring efficiency. In other words, I’m not completely comfortable crediting “game state” with a partial assist.

In truth, the efficiency gained by including game state in predictive models is mostly due to the current standards of coaching convention, but some of it undoubtedly belongs to other things. Until we can better tell the difference, it might be best to only model on-field context and err on the side of under-fitting for the sake of better understanding the residuals of our model predictions.

As our understanding of the events surrounding each shot grows and we update our models to reflect that knowledge, the reliance on game state in predictive models will decrease. No other attribute worth modelling behaves like this. And because of that, it feels like cheating.
