Over the past year of football analytics work, there has been a broad shift from applying metrics revolving around single on-ball events to taking a more holistic approach by analysing sequences of events which constitute a period of possession.
This approach can often provide the additional context which is sometimes lacking from single-event metrics. Consider expected goals (xG). An xG model draws on past shots and features such as pitch location and angle to goal to estimate the likelihood that a given shot results in a goal. Incorporating assist type within the xG model is also commonplace.
Including assists is a logical extension of thinking about what makes a good chance, and you could extend this even further by considering second assists or other types of events (such as dribbles) which precede a shot. Single-event metrics like this often beg for a contextualising framework, which is where a possessions model fits in.
Stringing events together
We’ve written in the past about stringing events together to re-define possessions but have since formalised that framework into a model from which a number of useful and novel statistics can be derived at the player, team, and event levels. Many football analytics practitioners have also demonstrated similar ideas and algorithms.
The details of the Opta possessions model are similar to what the above article from 2012 describes, but events are organised into sequences and possessions.
Sequences are defined as passages of play which belong to one team and are ended by defensive actions, stoppages in play or a shot.
Possessions are defined as one or more sequences in a row belonging to the same team. A series of passes leading to a shot which is saved and results in a corner kick would comprise one possession since the same team retains control, but more than one sequence, since the ball has gone out of play. A possession is ended by the opposition gaining control of the ball.
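As a rough illustration of these two definitions, the sketch below groups a list of events into sequences and then into possessions. It is a minimal sketch under stated assumptions, not the Opta implementation: the `Event` class, the event labels (`pass`, `dribble`, `shot`, `tackle`, `stoppage`) and both function names are hypothetical, and defensive events are always left outside sequences for simplicity.

```python
from dataclasses import dataclass

# Assumed labels for controlled on-ball actions (hypothetical, not Opta's).
CONTROLLED = {"pass", "dribble", "shot"}


@dataclass
class Event:
    team: str  # empty string for events with no owner, e.g. a stoppage
    kind: str


def split_sequences(events):
    """Group controlled events into sequences, each owned by one team."""
    sequences, current = [], []
    for ev in events:
        if ev.kind in CONTROLLED:
            # A controlled action by the other team closes the open sequence.
            if current and current[0].team != ev.team:
                sequences.append(current)
                current = []
            current.append(ev)
            if ev.kind == "shot":  # a shot ends the sequence
                sequences.append(current)
                current = []
        elif current:
            # Defensive action or stoppage: ends the open sequence
            # and (in this simplification) belongs to no sequence.
            sequences.append(current)
            current = []
    if current:
        sequences.append(current)
    return sequences


def group_possessions(sequences):
    """Merge consecutive sequences by the same team into possessions."""
    possessions = []
    for seq in sequences:
        if possessions and possessions[-1][0][0].team == seq[0].team:
            possessions[-1].append(seq)
        else:
            possessions.append([seq])
    return possessions


# Passes leading to a saved shot, a corner, then the opposition win the ball.
events = [
    Event("A", "pass"), Event("A", "pass"), Event("A", "shot"),
    Event("", "stoppage"),          # ball out of play for the corner
    Event("A", "pass"),             # corner kick
    Event("B", "tackle"), Event("B", "pass"),
]
sequences = split_sequences(events)
possessions = group_possessions(sequences)
```

In this toy example team A's shot and corner form two sequences inside a single possession, while team B's pass opens a new possession.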
It's worth noting some important features of this model:
- Not every event belongs to a sequence or a possession
- Relatedly, not every second of a match where the ball is in play is marked as belonging to a particular team
- A sequence starts with a player making a controlled action on the ball. This includes passes but not defensive events such as tackles and interceptions, unless these events are followed by a controlled action such as a pass or dribble
- The number of possessions belonging to each team can differ by at most one in a given match. This may seem counterintuitive relative to the traditional notion of possession percentage, but it’s logically consistent when counting individual possessions: if one team's possession ends, the other team's possession begins
- However, both the time of possession by each team and the number of sequences within a given team’s possessions need not be equal
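The last two points above can be checked with a small sketch. It assumes each sequence is reduced to the label of the team that owns it, in match order; the function name and the sample data are hypothetical.

```python
from collections import Counter


def possession_counts(sequence_owners):
    """Count possessions per team from an ordered list of sequence owners.

    Consecutive sequences by the same team collapse into one possession.
    """
    collapsed = [
        team
        for i, team in enumerate(sequence_owners)
        if i == 0 or team != sequence_owners[i - 1]
    ]
    return Counter(collapsed)


# Hypothetical run of sequences: team A strings several together.
owners = ["A", "A", "B", "A", "A", "A", "B"]
per_team_possessions = possession_counts(owners)
per_team_sequences = Counter(owners)
```

Because possessions alternate between the two teams by construction, the possession counts here are equal (and could never differ by more than one), while the sequence counts are free to diverge.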
Consider this sequence example (in this case also a possession), where Liverpool start with a headed pass (in green) by Joel Matip and end in a shot off the woodwork (green arrow).
The above sequence tells us where the ball was won back for Liverpool before their shot, and by what type of event. We also know that about 14.5 seconds elapsed between winning the ball and the shot, and that the path traced between all events covered 126.44 metres, or 55.96 metres if we consider only the distance progressed directly up-field.
Within this sequence, we can also identify the number of passes in the build-up to this shot.
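A minimal sketch of how such sequence-level figures might be derived from event coordinates follows. It assumes coordinates in metres with x increasing towards the opponent's goal, and it treats up-field distance as the net change in x between the first and last event; that interpretation, the function name and the sample coordinates are assumptions rather than the model's documented method.

```python
import math


def sequence_summary(events):
    """Return (total path length, net up-field distance, number of passes).

    `events` is an ordered list of (kind, x, y) tuples in metres.
    """
    points = [(x, y) for _, x, y in events]
    # Sum of straight-line distances between consecutive events.
    total = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    # Assumed definition: net progress along the length of the pitch.
    upfield = points[-1][0] - points[0][0]
    passes = sum(1 for kind, _, _ in events if kind == "pass")
    return total, upfield, passes


# Hypothetical four-event sequence ending in a shot.
events = [("pass", 0, 0), ("pass", 3, 4), ("dribble", 3, 10), ("shot", 11, 16)]
total, upfield, passes = sequence_summary(events)
```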
Tendencies of possessions
The histogram below shows the distribution of the number of possessions per game, that is, how often each possession count occurs. Matches typically have slightly fewer than 200 possessions, or 90-100 per team.
The graph below shows the distribution of sequence lengths across all sequences in the 2016-17 Premier League season. As you can see, many sequences are short and get interrupted before covering much ground, and the general trend is that frequency tapers off for longer and longer total lengths.
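A distribution like this can be tabulated by bucketing sequence lengths into fixed-width bins. The sketch below is illustrative only: the 10-metre bin width, the function name and the sample lengths are assumptions, not values taken from the chart.

```python
from collections import Counter


def length_histogram(lengths, bin_width=10.0):
    """Count sequences per length bin; keys are (lower, upper) bounds in metres."""
    counts = Counter(int(length // bin_width) for length in lengths)
    return {
        (b * bin_width, (b + 1) * bin_width): n
        for b, n in sorted(counts.items())
    }


# Hypothetical sequence lengths in metres.
lengths = [3.0, 8.5, 12.0, 47.2, 9.9, 126.44]
hist = length_histogram(lengths)
```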
One example of a statistic which can be calculated from a sequence is direct speed. We define this to mean the number of metres that the ball travels (when measuring directly up-field), divided by the total time of the sequence.
Referring to the earlier Liverpool sequence, the direct speed would be roughly 3.85 metres per second (55.96 metres divided by about 14.5 seconds). Compared with other sequences, this is relatively quick. Again, this metric is more relevant at the sequence level than at the possession level, since interruptions within possessions make speed metrics less meaningful.
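The calculation itself is a single division. The sketch below uses the rounded figures quoted for the Liverpool sequence, so the result lands within rounding of the value given in the text; the function name is hypothetical.

```python
def direct_speed(upfield_metres, duration_seconds):
    """Up-field distance covered per second of sequence time (m/s)."""
    if duration_seconds <= 0:
        raise ValueError("sequence duration must be positive")
    return upfield_metres / duration_seconds


# Rounded inputs from the Liverpool example.
speed = direct_speed(55.96, 14.5)
```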
Below you can see the median direct speed from open play sequences for Premier League teams in 2016-17. It's interesting to note that this captures some elements of style not necessarily correlated with successful outcomes, as evidenced by Arsenal appearing between Stoke City and Leicester, while Manchester United are closer to Hull and Bournemouth.
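A team-level median of this kind can be computed by grouping sequence speeds by team. This is a minimal sketch assuming each open play sequence has already been reduced to a (team, direct speed) pair; the team labels and speeds below are made up for illustration.

```python
from collections import defaultdict
from statistics import median


def median_speed_by_team(rows):
    """Median direct speed (m/s) per team from (team, speed) pairs."""
    by_team = defaultdict(list)
    for team, speed in rows:
        by_team[team].append(speed)
    return {team: median(speeds) for team, speeds in by_team.items()}


# Hypothetical sequence-level speeds.
rows = [
    ("Team A", 1.0), ("Team A", 3.0), ("Team A", 2.0),
    ("Team B", 4.0), ("Team B", 2.0),
]
medians = median_speed_by_team(rows)
```

The median is less sensitive than the mean to a handful of very fast counter-attacks, which suits a heavily skewed distribution of sequences.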
Introducing tactical applications
The possessions framework can answer many questions which an analyst or coach might pose that single-event metrics often struggle to encapsulate. Specifically, it can provide answers to questions which deal with how patterns and actions occur in succession.
For a more detailed example, consider how a team’s directness changes depending on two factors: where they win the ball back, and whether they're playing wide or centrally.
To answer this question, it's necessary to have start location and a notion of width as features of sequences. Grouping sequences based on their starting location (own half vs opponent half) serves the first purpose. To categorise width, I defined sequences as either wide or central by considering whether most up-field progress took place within the central channel or one of the wide channels shown below.
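One way to sketch this categorisation is to credit each step's forward progress to the channel in which the step took place, then compare the totals. Everything specific here is an assumption: the 105 by 68 metre pitch, a central channel covering the middle third of the width, assigning each step by the midpoint of its y coordinates, and resolving ties as central. The actual channel boundaries are those shown in the figure.

```python
PITCH_LENGTH = 105.0  # assumed pitch dimensions in metres
PITCH_WIDTH = 68.0
# Assumed central channel: the middle third of the pitch width.
CENTRAL_MIN, CENTRAL_MAX = PITCH_WIDTH / 3, 2 * PITCH_WIDTH / 3


def classify_width(points):
    """Label a sequence 'central' or 'wide' by where forward progress occurred.

    `points` is an ordered list of (x, y) event locations in metres,
    with x increasing towards the opponent's goal.
    """
    central = wide = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:]):
        progress = max(0.0, x2 - x1)  # ignore backwards movement
        mid_y = (y1 + y2) / 2
        if CENTRAL_MIN <= mid_y <= CENTRAL_MAX:
            central += progress
        else:
            wide += progress
    return "central" if central >= wide else "wide"


def start_zone(points):
    """Label a sequence by the half of the pitch in which it starts."""
    return "own half" if points[0][0] < PITCH_LENGTH / 2 else "opponent half"


# Hypothetical sequences, both starting in the team's own half.
through_middle = [(20, 34), (40, 30), (60, 36)]
down_flank = [(20, 5), (50, 8), (60, 34)]
labels = [(start_zone(s), classify_width(s)) for s in (through_middle, down_flank)]
```

With both labels attached to every sequence, splits such as "own half, central" can be formed by simple filtering before computing a direct speed for each group.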
Chelsea don't initially stand out with regard to direct speed. However, once we split sequences by start location and width, they rank in the top three in the 2016-17 Premier League by this metric when winning the ball back in their own half and progressing centrally (2.47 m/s, compared with 1.93 m/s when progressing centrally from their own half).
This is of course a somewhat particular example but it is a good showcase of how the context provided by multiple events can allow statistical approaches to answer questions which are sometimes difficult to answer with single-event metrics. A similar example was also adopted by Will Gürpinar-Morgan in his presentation at the 2017 OptaPro Analytics Forum.
When applying event data to analyse the game, this possessions-style framework - now becoming commonplace - can significantly aid this style of work, offering a platform to gain a more informed understanding of a player, team or league's overall stylistic approach.
On behalf of OptaPro, I'd like to take this opportunity to thank Michael Caley, Garry Gelade, Sam Green and Ian Graham for their thoughts and feedback regarding this model.