In 2011, a software developer named Sarah Rudd got fed up with conventional football statistics. It was easy to count the number of passes a player attempted and completed, but not all passes are equal. “We know completion percentage is a terrible measure of how good a passer you are,” she told me. What were these actions really worth?
At a Harvard talk that fall, Rudd, who started an analytics blog after talking to Seattle Sounders owner Adrian Hanauer about “Moneyball, But For Football,” presented a better way to value actions. She divided the game into 37 different states, determined by factors such as the location of the ball and defensive organization, and calculated the likelihood that each state would result in a goal or a turnover. Anytime a player moved the ball from one state to another, such as by dribbling down the wing or playing an assist, he was credited with the change in his team’s scoring chances. Suddenly, a pass wasn’t just complete or incomplete; it had an expected goal value.
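To make the idea concrete, here is a minimal sketch of a state-based model in Python. It is not Rudd’s model: the three states and every transition probability are invented for illustration, and the function names are hypothetical. It simply shows how the chance of a state ending in a goal can be computed from the chances of moving between states, and how an action is then credited with the difference.

```python
# Hypothetical three-state toy, not Rudd's actual 37 states or her numbers.
# Each state maps to the probabilities of where play goes next.
TRANSITIONS = {
    "own_half":    {"own_half": 0.50, "final_third": 0.30, "TURNOVER": 0.20},
    "final_third": {"own_half": 0.20, "final_third": 0.40, "box": 0.15, "TURNOVER": 0.25},
    "box":         {"final_third": 0.20, "box": 0.20, "GOAL": 0.15, "TURNOVER": 0.45},
}

def goal_probabilities(transitions, sweeps=500):
    """Chance that each state eventually ends in GOAL rather than TURNOVER."""
    value = {state: 0.0 for state in transitions}
    value["GOAL"], value["TURNOVER"] = 1.0, 0.0  # the two end states
    for _ in range(sweeps):  # repeat until the values settle
        for state, nxt in transitions.items():
            value[state] = sum(p * value[s] for s, p in nxt.items())
    return value

def action_value(value, start, end):
    """Credit for moving the ball from one state to another."""
    return value[end] - value[start]

values = goal_probabilities(TRANSITIONS)
print({state: round(v, 3) for state, v in values.items()})
print(round(action_value(values, "final_third", "box"), 3))
```

In this toy, getting the ball from the final third into the box is worth the gap between the two states’ goal probabilities, whoever makes the move and however it is made.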
The concept that Rudd hit on is sometimes referred to as “possession value,” and it has become an important part of football analysis. In recent years, as data has become more widely available, more and more people have started trying to measure the confusing, vast majority of the game that happens between shots. We’ve seen a boom in approaches to the same problem Rudd was curious about: how much does an action on the pitch change the likelihood of scoring (and, for some models, conceding)?
“The work Rudd has done really covers all the important parts of the philosophy of possession value models,” said Javier Fernández, the former head of sports analysis at FC Barcelona. “The basic question everyone wants to solve is, ‘How do we model the state of the game at all times, and what can we learn about the future reward?’”
Nowadays there is a whole periodic table of different possession value models. But why are there so many, and what exactly is each showing us?
The simplest type of possession value model is location-based: the average probability of going on to score from wherever the ball is. Passing or dribbling towards the opponent’s goal usually improves the team’s scoring chances, so players who advance the ball add value with their actions, while players who pass backward or turn the ball over lose value. This is the idea behind Karun Singh’s popular expected threat (xT) model.
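A location-based model boils down to a lookup table. The sketch below is only an illustration of that idea, not the published xT grid: the zone values, the 105-by-68-meter pitch and the function names are all assumptions, and a real grid would be estimated from match data.

```python
# Hypothetical grid: chance of going on to score from each zone, attacking
# left to right. Rows run from one touchline to the other. Numbers are invented.
ZONE_VALUE = [
    [0.005, 0.010, 0.025, 0.060],
    [0.006, 0.015, 0.040, 0.250],
    [0.005, 0.010, 0.025, 0.060],
]
PITCH_LENGTH, PITCH_WIDTH = 105.0, 68.0  # meters, an assumed pitch size

def zone_value(x, y):
    """Look up the value of the zone that contains the point (x, y)."""
    rows, cols = len(ZONE_VALUE), len(ZONE_VALUE[0])
    col = min(int(x / PITCH_LENGTH * cols), cols - 1)
    row = min(int(y / PITCH_WIDTH * rows), rows - 1)
    return ZONE_VALUE[row][col]

def move_value(start, end):
    """Value added by moving the ball from start to end, both (x, y) pairs."""
    return zone_value(*end) - zone_value(*start)

forward_pass = move_value((40.0, 34.0), (90.0, 34.0))
back_pass = move_value((90.0, 34.0), (40.0, 34.0))
print(round(forward_pass, 3), round(back_pass, 3))
```

The same pass played in the opposite direction gets the opposite sign, which is exactly why a model like this rewards ball progression and nothing else.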
But location isn’t everything. A pass from the center circle to the top of the box can be a valuable through ball that puts an attacker one-on-one with the keeper, or it can be a worthless lob into a crowd of defenders. “With just x-y locations, it’s really hard to say, ‘Is this really a productive pass to make?’” Rudd said. For more accurate possession values, you need context on what teammates and opponents are doing away from the ball.
The problem is, most football data doesn’t tell you what’s going on off the ball. To guess what they can’t see, some models use possession history features, such as how fast the ball has moved up the pitch, as indicators of defensive disorganization. “There might be a slight bias” when using proxies, said Club Brugge data scientist Jan Van Haaren, who helped develop the VAEP model with a machine learning group at KU Leuven. “But I still think it’s better than using no context at all.”
Such machine learning models can take into account more information about the game than location alone. They can tell the difference between a pass and a carry, and they can measure the value of an action like a take-on that doesn’t advance the ball. On the other hand, it is more difficult to interpret why a tree-based model values a situation the way it does. “We knew there would be a lot of possession that we wanted to measure. But we needed a lot of data,” said Matthias Kullowatz, who designed American Soccer Analysis’s goals added (g+) model. “If you only have one example of a right-back having the ball at the corner flag, you’re going to get a potentially crappy estimate of the value.”
Oddly enough, one of the hardest parts of possession value is figuring out what a “possession” is. Fernández’s tracking data EPV model defines it as the window between a kickoff and a goal for either team (or the end of a half). Most other models, like g+, understand “possession” as fans do: a spell of ball control by one team that ends in a goal or a turnover. But it’s not always easy to tell which team is in control, and the idea that a team loses any chance to score after a turnover can produce drastic values that don’t make football sense. For example, a midfielder whose final pass from the top of the box is blocked may get hammered by the model for giving up valuable possession, even though his team is still in a good position to retrieve the ball and score.
Nils Mackay, who released his own model while still a student, addressed these issues by changing the window of Stats Perform’s PV model to estimate the probability of scoring on the current possession in the next 10 seconds. “That way you never have one event to be the end of something,” he said. “When we did that, we saw that the numbers aligned much more with intuition. Players can still be severely penalized if they lose the ball, but oftentimes it won’t be as severe if there is still value in what they have done.”
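A time window changes what a model is trained to predict. The sketch below takes one reading of that idea, looking only at the clock and not at which team has the ball in between; whether Stats Perform’s model works this way is an assumption, and the `Event` class, its fields and the sample sequence are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Event:
    time: float            # seconds since kickoff
    team: str
    is_goal: bool = False

def label_by_window(events, window=10.0):
    """1 if the acting team scores within `window` seconds of the event, else 0.

    Assumes `events` is sorted by time. A goal counts toward its own label.
    """
    labels = []
    for i, event in enumerate(events):
        scored = any(
            later.is_goal
            and later.team == event.team
            and later.time - event.time <= window
            for later in events[i:]
        )
        labels.append(int(scored))
    return labels

# Invented sequence: team A loses the ball at 104s, wins it back, then scores.
sequence = [
    Event(100.0, "A"),                # pass
    Event(104.0, "A"),                # blocked pass
    Event(105.0, "B"),                # block by the defending team
    Event(107.0, "A"),                # recovery
    Event(109.0, "A", is_goal=True),  # goal
]
print(label_by_window(sequence))
```

Under a strict possession definition, the blocked pass at 104 seconds would end the sequence and earn no credit for the goal that followed. Under the clock-based label it still counts, because the team scored five seconds later.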
The equally delicate flip side of the turnover problem is how to value shots. You might think that just as a winger lifts her team’s scoring chances from 5% to 30% when she passes the ball to an attacker in front of goal, the attacker sends them to 100% when she finds the net or to 0% when she misses. But this approach leads to large and volatile shot values, similar to the noisy difference between goals and expected goals. When the KU Leuven team introduced a version of VAEP that uses xG-style shot values, player values became much more stable.
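The difference between the two ways of valuing a shot is easy to see with a few invented numbers. This is a minimal sketch, not the KU Leuven implementation: the function names, the pre-shot value of 0.08 and the xG of 0.12 are all assumptions made for illustration.

```python
def outcome_shot_value(pre_shot_value, scored):
    """Credit the shooter with the jump to 1.0 (goal) or the drop to 0.0 (miss)."""
    return (1.0 if scored else 0.0) - pre_shot_value

def xg_shot_value(pre_shot_value, xg):
    """Credit the shooter with the quality of the chance, whatever the outcome."""
    return xg - pre_shot_value

# Ten hypothetical shots from the same situation: one goal, nine misses.
outcomes = [True] + [False] * 9
PRE_SHOT_VALUE, XG = 0.08, 0.12  # invented numbers

outcome_values = [outcome_shot_value(PRE_SHOT_VALUE, s) for s in outcomes]
xg_values = [xg_shot_value(PRE_SHOT_VALUE, XG) for _ in outcomes]

outcome_spread = max(outcome_values) - min(outcome_values)
xg_spread = max(xg_values) - min(xg_values)
print(round(outcome_spread, 2), round(xg_spread, 2))
```

With outcome-based values, one lucky finish swings a player’s total by a full goal. With xG-style values, every one of these shots earns the same small credit, which is the kind of stability the paragraph above describes.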
Future possession value models will see more of the game. StatsBomb is reworking its OBV model using broadcast stills to capture off-ball positions. Knowing the ball carrier’s surroundings will not only make possession values more accurate; it will allow types of measurement that event data cannot. Fernández’s tracking data EPV model divides possession value into component models for finer values and better insight into player decision making. It can even assign a goal value to actions like decoy runs that manipulate the defense but never touch the ball.
What are all these possession value models for? At clubs, one of the most valuable applications is recruiting. Liverpool have used possession value to help scout the squad that conquered Europe on a modest budget. Tracking data models can also offer tactical insights. “If you can tell a player, when the opponent is playing a 4-4-2, both WCs leave space in that particular situation, sometimes that’s enough,” Fernández said. “A coach once told me that if you can only tell a player one thing that can improve their game, they are going to love you forever.”
As for Rudd, her possession value work landed her a job at Arsenal, where she was until recently head of analysis. But it also offered the intrinsic reward of getting a little closer to this impossible sport. “I remember reading a comment that said: ‘We cannot quantify Santi Cazorla’s ability to split two defenders,’” she said. “And I was like, ‘Oh yeah?’”