TEKVA

Machine Learning in Football

12/14/2025

If you’re an avid football fan who uses X/Twitter, you’ve probably come across accounts like Opta and XGPhilosophy. Both accounts, particularly Opta, post detailed statistics about Premier League clubs and their games. I’ll be honest: sometimes I’m completely taken aback by some of the stats, because I’m left asking, “How the hell did you even find that out?”

Exhibit A:

[Image]

The truth is, Opta has been collecting data for years, and the very smart people who work there, with the help of machine learning, are able to surface these seemingly random stats and facts. Everything, and I mean EVERYTHING, that happens on the football field is recorded. Every touch, every possession won or lost, every duel, every corner, every in-swinging corner, every pass... you get my point by now. It’s all recorded and stored.

This data has completely revolutionised the game, because it lets clubs make better-informed decisions. For example, it can help them identify players to recruit, provide tactical insights into how teams play, find the most effective methods of attack, and pinpoint where most of the threat on the pitch comes from.

Decisions may be better informed, but that doesn’t guarantee they are the right ones. Making the right decision depends heavily on the data being analysed AND used correctly. Liverpool, with the help of Ian Graham, are a prime example of a club that used data correctly, and it helped them become one of the greatest teams English football has ever seen. In this blog, we’ll go into detail about how Graham’s model helped Liverpool identify recruitment targets like Salah and Mané.


The Goal Probability Added Model

Ian Graham was Liverpool's director of research between 2012 and 2023. His model for analysing data has been credited as a huge reason why Liverpool have been so successful in recent years. He named the model “Goal Probability Added”...

As stated in the introduction, every single detail of a football game is recorded. This gave data scientists like Graham tons of data in which to find patterns. Graham chose to use it to estimate the probability of scoring a goal from a given game situation.

Was that a bit of a mouthful?

Let me try to give an example. Let’s say we have a midfielder who has the ball in his own half, in open play. Graham wanted to create a model that would answer the following question: “What is the probability that this midfielder, from this position, will do something that leads to Liverpool scoring a goal BEFORE they lose possession?”

How does Graham figure this out?

Well, as I mentioned, literally everything that happens on the football field is recorded. So he would analyse the data, and he may find that the data show: “There have been 1,000 times a midfielder has had the ball in open play in his own half. Out of these 1,000 instances, four goals were scored before the team lost possession. Therefore, the probability a team scores from this position (before losing possession) is 4/1,000, or 0.4%.”
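Written as a formula, that estimate is simply a conditional frequency. Here $s$ stands for the game state "midfielder in his own half, in open play"; the notation is my own shorthand, not Graham’s:

```latex
P(\text{goal before losing possession} \mid s)
  = \frac{\text{goals scored before losing possession from } s}{\text{number of times } s \text{ occurred}}
  = \frac{4}{1000} = 0.004 = 0.4\%
```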

When you think about it, there are only so many situations you can be in on a football field: ball in the final third, winger on a flank, corners, free kicks, etc. So Graham could run his model for each of these situations, which he called “game states”, to estimate the probability of scoring a goal from almost any position on the field.
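To make the counting concrete, here is a minimal Python sketch. It assumes each possession has already been labelled with a game state and with whether it produced a goal before the ball was lost. The state names, the `estimate_state_probabilities` function and the toy data are hypothetical illustrations, not Graham’s actual code or Opta’s data format. The edge-of-the-box counts in particular are made up just to reproduce the 1.7% figure used later in this post.

```python
from collections import defaultdict

# Hypothetical possession record: (game_state, scored_before_losing_possession).
# Real event data describes game states in far more detail than a single label.
Possession = tuple[str, bool]


def estimate_state_probabilities(possessions: list[Possession]) -> dict[str, float]:
    """Estimate P(goal before possession is lost | game state) by simple counting."""
    attempts: dict[str, int] = defaultdict(int)
    goals: dict[str, int] = defaultdict(int)
    for state, scored in possessions:
        attempts[state] += 1
        if scored:
            goals[state] += 1
    # Goals divided by occurrences, computed separately for every game state seen.
    return {state: goals[state] / attempts[state] for state in attempts}


# Toy data matching the illustrative numbers in this post.
possessions = (
    [("own_half_midfield", True)] * 4
    + [("own_half_midfield", False)] * 996
    + [("edge_of_opposition_box", True)] * 17
    + [("edge_of_opposition_box", False)] * 983
)
probabilities = estimate_state_probabilities(possessions)
print(probabilities)  # {'own_half_midfield': 0.004, 'edge_of_opposition_box': 0.017}
```

The output is one lookup table covering every game state in the data, which is the kind of table the player ratings below depend on.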

Okay, so he has all these probabilities. What good are they?

This is where it gets interesting, at least for me. Graham would use these probabilities to help him rate players. For example, let’s say the numbers from my earlier example were true:

If you have the ball in your own half, there is a 0.4% chance you will score.

Let’s also say that if the ball is on the edge of the opposition's box, the probability you score from there is 1.7%.

If a player has the ball in midfield in his own half and loses it, his team can no longer score before losing possession, so the probability drops from 0.4% to zero. He just reduced his team’s chance of scoring by 0.4 percentage points, right?

In that case, we will reduce this player's rating.

But what if, from his own half, he made a successful pass to a teammate on the edge of the opposition box? From his own half, it was a 0.4% chance to score, but now from the edge of the box, it’s a 1.7% chance. He just increased his team’s chance of scoring by 1.3 percentage points (1.7% − 0.4%)!

In that case, we increase the player’s rating.
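The two cases above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Graham’s implementation: the probability table holds the post’s illustrative numbers, a lost ball is treated as dropping the probability to zero, and summing each player’s action values into one rating is my own assumed aggregation. `action_value`, `rate_players` and the player names are hypothetical.

```python
from collections import defaultdict
from typing import Optional

# Hypothetical state probabilities, using the illustrative numbers from this post.
STATE_PROBABILITY = {
    "own_half_midfield": 0.004,
    "edge_of_opposition_box": 0.017,
}


def action_value(start_state: str, end_state: Optional[str]) -> float:
    """Change in scoring probability caused by a single action.

    end_state is None when the action loses possession. Because the question
    is about scoring *before* losing possession, the probability after a
    turnover is treated as zero (a simplifying assumption).
    """
    before = STATE_PROBABILITY[start_state]
    after = 0.0 if end_state is None else STATE_PROBABILITY[end_state]
    return after - before


def rate_players(actions: list[tuple[str, str, Optional[str]]]) -> dict[str, float]:
    """Sum each player's action values into one rating (an assumed aggregation)."""
    ratings: dict[str, float] = defaultdict(float)
    for player, start_state, end_state in actions:
        ratings[player] += action_value(start_state, end_state)
    return dict(ratings)


# Two hypothetical actions matching the examples above.
actions = [
    ("Player A", "own_half_midfield", "edge_of_opposition_box"),  # successful pass
    ("Player B", "own_half_midfield", None),  # loses the ball
]
ratings = rate_players(actions)
print({player: round(value, 3) for player, value in ratings.items()})
# {'Player A': 0.013, 'Player B': -0.004}
```

Over a full season, a player whose actions consistently move the ball into higher-probability states would build up a high total, which is the intuition behind rating players this way.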

When Graham first presented the model, it rated very highly all the players we would have agreed were world class at the time (Ronaldo, Messi, Ribéry, Bale, etc.). That made it a fantastic model to base recruitment decisions on, as it could highlight players who added goal threat but may have gone under the radar at many clubs.

Conclusion

This is just one example of how data and machine learning were used to bring huge success to a club. There are plenty of other examples out there, and I would also highly recommend reading Ian Graham’s book, “How to Win the Premier League.” He goes into a lot more depth on the different models used and how they work.