From Basketball to Football: Spatial Structure of Man-Marking using Tracking Data
It might seem that I’m incredibly productive, with many articles coming out of my pen (or fingers, if you will) these last few months, but they all only scratched the surface of what I wanted to achieve. In the background I’ve been looking into one question for the better part of two years: how can we effectively take data concepts or analyses from other sports and apply them to the beautiful game we call football?
In this article, I will take two concepts from basketball data analytics, explore their theoretical framework and apply them to football data analytics. This is quite a wild undertaking with plenty of room for errors and challenges, which is research in itself. My aim is not to deliver a perfect, watertight analysis, but to explore further how we can learn from data analytics in other sports to enhance our own way of looking at data in football.
Contents
- Why this article?
- Data explanation and sources
- Introducing the topic: Defensive structure of man-marking
- Existing research in basketball
- Converting it into football data
- Metric I: Average attention drawn
- Metric II: Defensive entropy
- Challenges
- Final thoughts
- Sources
Why this article?
It’s safe to say that I’m a bit obsessed with off-ball data or out-of-possession data. Like I have said before, football is based on goals and the entertaining part of the game in the eyes of broadcasters and fans is often the scoring of goals. While I understand the sentiment behind it, I would love it if there was some more balance in the analytics space. Defending is a big part of the game and is reflected in tactics, but the next step is to have more defensive-minded and out-of-possession data.
In basketball, players need to be able to both attack and defend, which has always interested me. I would like to know if we can transfer data analysis from basketball to football to see where we can learn and gain an edge in terms of defence. We often speak about man-marking and zonal marking in football, but we have next to no data on this. In basketball, they call it guarding, and they have more data available on the guarding of players and on whether one or two players guard a player. That’s why I wanted to see if the latter can be applied to football.
Data explanation and sources
For this specific research, I’m not going to use my regular data providers such as Opta, StatsPerform, Statsbomb or Wyscout. I’m using a free data set from Metrica Sports that allows me to use tracking data. You can find it here: https://github.com/metrica-sports/sample-data/tree/master/data/Sample_Game_1
This dataset is completely anonymised, so we don’t know which game it is or any details about the players. However, it gives us good insight into how tracking data works and how it can be utilised, and it gives us a platform to continue building our research on.
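For readers who want to follow along, here is a minimal sketch of how such a tracking file could be read with Python's standard library. It assumes the CSV layout commonly described for the Metrica sample files: three header rows (team label, jersey number, column names), then one row per frame with an x and a y column per player. The inline sample is made up, and the function name `read_tracking` is hypothetical; check the actual files before relying on this layout.

```python
import csv
import io

# A tiny, made-up sample mimicking the ASSUMED layout of the Metrica CSVs:
# row 1 = team label, row 2 = jersey numbers, row 3 = column names, where
# each player name sits above its x column and the y column is left blank.
SAMPLE = """,,,Home,,Home,,,
,,,11,,1,,,
Period,Frame,Time [s],Player11,,Player1,,Ball,
1,1,0.04,0.10,0.50,0.30,0.40,0.45,0.50
1,2,0.08,0.10,0.50,0.31,0.41,NaN,NaN
"""


def is_missing(value):
    """True when a coordinate cell is empty or marked as NaN."""
    return value.strip().lower() in ("", "nan")


def read_tracking(handle):
    """Return a list of frames: {"period", "frame", "time", "positions"}.

    positions maps a column name (e.g. "Player11", "Ball") to an (x, y)
    tuple, or to None when the coordinates are missing in that frame.
    """
    rows = csv.reader(handle)
    next(rows)  # skip the team label row
    next(rows)  # skip the jersey number row
    header = next(rows)
    # Names sit in every second column after the three leading columns.
    names = {i: header[i] for i in range(3, len(header), 2) if header[i]}

    frames = []
    for row in rows:
        if not row:
            continue
        positions = {}
        for i, name in names.items():
            x, y = row[i], row[i + 1]
            if is_missing(x) or is_missing(y):
                positions[name] = None
            else:
                positions[name] = (float(x), float(y))
        frames.append({
            "period": int(row[0]),
            "frame": int(row[1]),
            "time": float(row[2]),
            "positions": positions,
        })
    return frames


frames = read_tracking(io.StringIO(SAMPLE))
```

With a real file, the same function could be called on `open(path, newline="")` instead of the inline string.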
Introducing the topic: Defensive structure of man-marking
Before we look at what has been written about the spatial structure in basketball, I want to nail down the definition of man-marking. In football, we often have zonal marking, but in basketball we almost never see pure zonal marking, so to level the playing field we look at man-marking.
Man-marking in football is a defensive strategy where each defender is assigned to closely follow and mark a specific opposing player throughout the game. The goal is to restrict the marked player’s movement, limit their influence on the game, and reduce their opportunities to receive the ball or make effective plays.
Existing research in basketball
We seek to fill a void in basketball analytics by providing the first quantitative characterization of man-to-man defensive effectiveness in different regions of the court. To this end, we propose a model which explains shot selection (who shoots and where) as well as the expected outcome of the shots. We term these quantities shot frequency and efficiency, respectively; see National Basketball Association (2014) for a glossary for other basketball terms used throughout the paper. Despite the abundance of data, critical information for determining these defensive habits is unavailable. And, most importantly, the defensive matchups are unknown. While it is often clear to the human observer who is guarding whom, such information is absent from the data.
While in theory, we could use crowd-sourcing to learn who is guarding whom, annotating the data set is a subjective and labor-intensive task. Second, in order to provide meaningful spatial summaries of player ability, we must define the […] court regions in a data-driven way. Thus, before we can begin modeling defensive ability, we devise methods to learn these features from the available data.
Our results reveal other details of play that are not readily apparent. (Characterizing the spatial structure of defensive skill in professional basketball, Alexander Franks, Andrew Miller, Luke Bornn, Kirk Goldsberry, The Annals of Applied Statistics, Vol. 9, No. 1 (March 2015), pp. 94–121)
This research is the theoretical framework for what I seek to do. The authors found a data-driven way to measure man-marking in terms of time, shot efficiency and shot frequency in the NBA. The research is from 2015, but its value is still high, as it was conducted by scientists from Harvard’s statistics department.
Converting it into football data
It might seem quite abstract right now and rightly so. Let’s make it into something tangible. What do we need to make it work for football? We need the following:
- Tracking data: tracking the locations and movements of both offensive and defensive players
- Shot data: shot frequency and expected goals numbers per player and team
- Time: minutes played, games played, possessions
- Event data: XY data. The shot data also contains these coordinates, but we need more in that data frame, which I will touch upon a bit later.
Metric I: Average attention drawn
The first metric I want to talk about is the average attention drawn. What does this mean? It is the average attention a player receives from all defensive players at any given point in time. We only focus on moments when the player is in the attacking half of the pitch, because otherwise the scope would be too broad.
We can calculate it as follows: the total amount of time guarded by each defender divided by the total amount of playing time.
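As a formula, with symbols introduced here purely for illustration (they are assumptions, not notation taken from the paper), the attention drawn by attacking player k could be written as:

```latex
% A_k     : average attention drawn by attacking player k
% T_{j,k} : time defender j spends marking attacker k
% T_k     : total playing time of attacker k (here: time in the attacking half)
\[
  A_k = \frac{\sum_{j} T_{j,k}}{T_k}
\]
```

Because the numerator sums over all defenders, a value above 1 can only occur when the player is marked by more than one defender for part of the time.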
Here is the first difficulty. Tracking data from different sports behaves differently and gives different results. So if we want to transfer a basketball tracking metric to football, we need to understand and visualise what it means.
The first big challenge is that in basketball every player takes part in both offence and defence, which means that when one team is attacking, all 10 players are in the same half of the court. This is not the case in football, where we hardly ever have 11 players against 11 players in one specific half. This makes marking more difficult to track. We use tracking data or video footage to establish whether a player is man-marking. That’s the first challenge we need to solve; after that, we need to find a way to measure double-marking in football with this data.
In practice, this means that we need to make some alterations to how we define man-marking in our analysis. In football, we look at the distance between the defending player and the attacking player to establish man-marking in this metric.
For example, when player A is within a set distance of an attacking player (say, within five metres, or a stricter two metres), it is registered as a man-marking event. If not, it’s not marking. I’m well aware that in football we also usually have zonal marking or hybrid marking, which is a combination of the two. I will leave this out of consideration for this part of my research, because I’m purely looking at how we can transfer basketball data analytics to football data analytics, and that’s why I have chosen this approach.
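The distance rule and the attention calculation can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the exact code behind the charts: positions are taken to be in metres, the attacking team is assumed to attack towards increasing x, the pitch is assumed to be 105 metres long, and the frame layout, the function names and the player positions are all made up. Since tracking data has a constant frame rate, counting frames stands in for measuring time.

```python
import math

MARKING_DISTANCE_M = 5.0  # assumed threshold; 2.0 would be the stricter variant
PITCH_LENGTH_M = 105.0    # assumed pitch length, used to find the halfway line


def markers_of(attacker_xy, defenders, max_dist=MARKING_DISTANCE_M):
    """Names of all defenders within max_dist metres of the attacker."""
    return [
        name for name, xy in defenders.items()
        if math.dist(attacker_xy, xy) <= max_dist
    ]


def attention_drawn(frames, attacker, max_dist=MARKING_DISTANCE_M):
    """Marked defender-frames divided by frames spent in the attacking half.

    Each frame is a dict with "attackers" and "defenders", both mapping a
    player name to an (x, y) position in metres.
    """
    frames_in_half = 0
    marked_defender_frames = 0
    for frame in frames:
        xy = frame["attackers"].get(attacker)
        if xy is None or xy[0] <= PITCH_LENGTH_M / 2:
            continue  # not on the pitch, or not in the attacking half
        frames_in_half += 1
        marked_defender_frames += len(
            markers_of(xy, frame["defenders"], max_dist)
        )
    if frames_in_half == 0:
        return 0.0
    return marked_defender_frames / frames_in_half


# Toy frames with made-up positions (not taken from the dataset).
toy_frames = [
    {"attackers": {"Player9": (80.0, 30.0)},
     "defenders": {"Player16": (82.0, 31.0), "Player17": (60.0, 10.0)}},
    {"attackers": {"Player9": (85.0, 34.0)},
     "defenders": {"Player16": (95.0, 34.0), "Player17": (70.0, 20.0)}},
    {"attackers": {"Player9": (40.0, 34.0)},
     "defenders": {"Player16": (41.0, 34.0), "Player17": (70.0, 20.0)}},
]
print(attention_drawn(toy_frames, "Player9"))  # 0.5
```

In the toy data the attacker spends two frames in the attacking half and is marked in one of them, so the result is 0.5. The third frame is ignored because the attacker is in his own half.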
The first step is to turn the tracking data into visuals so we can see where the players are positioned on the pitch at specific times in the game. Here you can see a set-piece goal by the home team, who play in red. Blue is the away team, and they are defending.
Next, we pick out a certain player who defends or marks an attacking player to see how much time they spend marking that player. By looking at that, we can find the average attention drawn; this signifies the threat or danger a player radiates, based on how closely they are marked.
If we look at the home team, we see the total attention drawn per player. Player9 draws the most attention from the away team and is marked 35,79% of the time that he was on the pitch.
When we look at the away team, we see the total attention drawn per player. Player24 draws the most attention from the home team and is marked 22,5% of the time that he was on the pitch.
What we can conclude from this data is that the home team has a very dangerous or threat-imposing player in Player9, while attention is spread fairly evenly across the rest of the players on both the home and away sides. In the perception of the away team, Player9 is the one who needs more attention.
Metric II: Defensive entropy
So let’s take Player9, because the data leads us to believe that he is a very important, dangerous and threat-imposing player. Maybe this player beats his direct opponent every time in a 1v1 and he needs to be double-marked. How can we see if that’s the case? We can illustrate that with defensive entropy.
Defensive entropy measures the uncertainty about whom a defensive player is associated with throughout the opposition’s possession. In other words: who is guarding whom? This might be useful as it illustrates how active a defensive player is on the pitch. If the player only focuses on one specific attacking player, their defensive entropy is 0. If they divide their focus equally between multiple attacking players, their defensive entropy is 1. By averaging all defensive players’ defensive entropy we get an idea of tendencies: do players double-mark a high-threat attacker or switch places with other defensive players?
Before we get there, we need to figure out how to calculate it. We can do it via the following formula:
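Assuming the standard Shannon entropy form, normalised by the logarithm of the number of attacking players K so that the value runs from 0 to 1 as described above, it can be written as follows. The symbol K and the normalisation are assumptions made for this sketch, and n is taken to index the possession.

```latex
% H_n(j) : defensive entropy of defender j in possession n
% K      : number of attacking players (assumed normalisation constant)
% Terms with Z_n(j,k) = 0 are taken to contribute 0.
\[
  H_n(j) = -\frac{1}{\log K} \sum_{k=1}^{K} Z_n(j,k)\,\log Z_n(j,k)
\]
```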
In this formula, Zn(j, k) is the fraction of time during which defensive player j marks attacking player k. This gives us a few results.
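To make the calculation concrete, here is a minimal Python sketch. It is built on assumptions rather than on the exact method used for the charts: it uses a normalised Shannon entropy, it assumes a defender is assigned at most one marked attacker per frame, and the function name and input layout are hypothetical.

```python
import math
from collections import Counter


def defensive_entropy(marked_per_frame, n_attackers):
    """Normalised Shannon entropy of whom one defender marks.

    marked_per_frame lists, for each frame in which the defender marks
    someone, the name of the attacker being marked. n_attackers is the
    number of attackers the defender could mark, used to scale the
    result into the 0 to 1 range.
    """
    total = len(marked_per_frame)
    if total == 0 or n_attackers < 2:
        return 0.0
    # Fraction of marking time spent on each attacker (the Z values).
    fractions = [n / total for n in Counter(marked_per_frame).values()]
    # z * log(1 / z) equals -z * log(z) and keeps every term non-negative.
    raw = sum(z * math.log(1 / z) for z in fractions)
    return raw / math.log(n_attackers)


# A defender who sticks to one attacker versus one who splits his time.
only_one = ["Player9"] * 4
split = ["Player9", "Player10", "Player9", "Player10"]
print(defensive_entropy(only_one, n_attackers=2))
print(defensive_entropy(split, n_attackers=2))
```

The first call returns 0 because all marking time goes to a single attacker. The second returns 1 because the time is split equally between the two attackers considered.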
In the visual above you can see how the players score on defensive entropy. Player11 scores the highest, but that’s the goalkeeper, so we have to take him out of the results. What we can see is that most players tend to mark one player rather than mark several players or switch.
The same goes for the away team. Player24 scores the highest, but that’s the goalkeeper, so we have to take him out of the results. Here too, most players tend to mark one player rather than mark several players or switch.
When we look at the averages for the whole team, we can see that the home side has a defensive entropy of 0,31 and the away side has a defensive entropy of 0,32. These numbers are very close to each other, but they suggest that the away side is slightly more inclined towards double-marking or defensive switches than the home side.
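The team average with the goalkeeper removed could be computed along these lines. This is a small sketch with made-up values and a hypothetical function name; it does not reproduce the figures above.

```python
from statistics import mean


def team_entropy(entropy_by_player, goalkeepers):
    """Mean defensive entropy over outfield players only."""
    outfield = [
        value for player, value in entropy_by_player.items()
        if player not in goalkeepers
    ]
    return mean(outfield) if outfield else 0.0


# Made-up values for illustration, not the numbers from the charts.
home = {"Player11": 0.90, "Player2": 0.25, "Player3": 0.35}
print(round(team_entropy(home, {"Player11"}), 2))  # 0.3
```

Passing the goalkeepers in explicitly keeps the choice of whom to exclude visible, rather than hard-coding a player name.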
Challenges
There are two challenges that I faced and need to have a closer look at:
- I have looked at out-of-possession moments in the game. However, that doesn’t mean that it’s completely representative. There is a difference between marking the player on the ball, so literally the one who has the ball, and marking a player whose team has possession. Another case to look at is marking when the defensive player’s own team has the ball but he is still marking an opponent.
- Defensive entropy comes from basketball, where the focus is on a defender guarding one or two players. In football, it often happens that players mark more players throughout the game. This also means I have to re-evaluate how I define marking in the data.
Final thoughts
Defensive entropy measures a player’s defensive versatility, indicating how much they spread their marking across multiple players or react to various threats. A higher score suggests greater engagement and adaptability. Average attention drawn reflects how much focus an attacking player receives from opposing defenders, with higher values pointing to a player who is treated as a bigger threat. Together, these metrics reveal how the defensive workload is distributed: high entropy and high attention drawn suggest active engagement but can lead to overcommitment, while balanced values indicate effective positioning. Understanding these metrics helps teams optimise defensive strategies, ensuring players are engaged without being overwhelmed.
In the follow-up article, we are going to look at what these man-marking tendencies mean for the quality and quantity of shots: how does marking impact them? Stay tuned for 2025!
Sources
- Characterizing the spatial structure of defensive skill in professional basketball: https://www.jstor.org/stable/24522412
- Metrica Sports tracking data: https://github.com/metrica-sports/sample-data/tree/master/data/Sample_Game_1