EURO 2020: Will athletes’ early or late specialisation tell us who will win?
Late Specialisation as a concept
David Epstein, in his book Range: How Generalists Triumph in a Specialised World, opens with two contrasting stories from the sports world. One is that of Tiger Woods, the famous golfer, who at the tender age of 4 was already practising around eight hours per day. The other is that of Roger Federer, the famous tennis player, who committed to tennis professionally no earlier than his late teens, after having sampled a very wide array of sports.
As David Epstein states,
“Elite athletes devote less time early on to deliberate practice in the activity in which they will eventually become experts”
The concept of late specialisation is becoming more popular. According to the study Late specialization: the key to success in centimeters, grams, or seconds (cgs) sports:
“The results clearly reveal that elite athletes specialized at a later age and trained less in childhood. However, elite athletes were shown to intensify their training regime during late adolescence more than their near-elite peers.”
EURO 2020 and my curiosity
As I am reading this book, UEFA Euro 2020 is being held and the fans are preparing to watch the quarter-final matches. Inevitably, I find it rather tempting to investigate the careers of the athletes of the remaining eight teams:
Is the career starting age of the players enough to predict the winner of the competition? Is there any correlation between the presence of “late specialisation” of a team and its performance?
I’m a Data Scientist, so collecting and analysing data is the first thing that comes to mind when it comes to answering a question. Thus, I collected data from Wikipedia for each one of the players of the 8 remaining teams:
- Belgium
- Czech Republic
- Denmark
- England
- Italy
- Spain
- Switzerland
- Ukraine
I used Python and the Wikipedia API for the data retrieval. You can find the GitHub repository here. The data used for this project is also here.
Each player’s Wikipedia page has a very nice table that contains information about the athlete’s youth and senior career.
Note: The data collection and the project were developed in a very short time, so there might be some data inconsistencies. This is by no means thorough research work; it is more of an informal attempt to investigate the topic.
For each player, I have collected the following data points:
- Whether they had any youth career at all
- Age their youth career started
- Age their senior career started
Note that sometimes, although we know that a player had some youth career, there is no date provided as to when it started, so the age might be missing in that case.
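To make these data points concrete, here is a minimal sketch of how they could be derived from a player's infobox wikitext. This is not the code from the repository: the sample text describes an invented player, the helper names `first_year` and `player_record` are hypothetical, and I am assuming the field names of Wikipedia's football biography infobox (`birth_date`, `youthyears1`, `youthclubs1`, `years1`). Ages are approximated as a difference of calendar years.

```python
import re
from typing import Optional

# Invented player, for illustration only. Field names are assumed to follow
# Wikipedia's football biography infobox.
SAMPLE_INFOBOX = """
{{Infobox football biography
| name = Example Player
| birth_date = {{birth date and age|1995|3|14|df=y}}
| youthyears1 = 2003–2012
| youthclubs1 = Example Youth FC
| years1 = 2012–2016
| clubs1 = Example FC
}}
"""


def first_year(text: str, field: str) -> Optional[int]:
    """Return the first four-digit year in an infobox field, or None."""
    line = re.search(rf"\|\s*{field}\s*=[ \t]*([^\n]*)", text)
    if line is None:
        return None
    year = re.search(r"\b(\d{4})\b", line.group(1))
    return int(year.group(1)) if year else None


def player_record(text: str) -> dict:
    """Derive the three data points; ages are year differences (approximate)."""
    birth = first_year(text, "birth_date")
    youth = first_year(text, "youthyears1")
    senior = first_year(text, "years1")
    # A youth career counts as present if a first youth club is listed,
    # even when its start date is missing.
    has_youth = re.search(r"\|\s*youthclubs1\s*=[ \t]*[^\s|]", text) is not None

    def age_at(year: Optional[int]) -> Optional[int]:
        return year - birth if year is not None and birth is not None else None

    return {
        "had_youth_career": has_youth,
        "youth_start_age": age_at(youth),
        "senior_start_age": age_at(senior),
    }


print(player_record(SAMPLE_INFOBOX))
# {'had_youth_career': True, 'youth_start_age': 8, 'senior_start_age': 17}
```

When a youth club is listed without a date, `had_youth_career` stays true while `youth_start_age` comes back as `None`, which matches the missing-age case described above.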
A dive into the data
We have approximately the same number of players per team. The slight variation comes from unequal squad sizes and from players whose data could not be retrieved for one reason or another.
What is the percentage of players per team that didn’t have a youth career?
Czech Republic has the highest share of players with no youth career, followed by Switzerland, Belgium and Denmark. All players of the remaining teams (England, Italy, Spain, Ukraine) pursued some youth career.
Let’s now investigate how the age at which players started their youth career varies per team. Below, I provide two different visualisations of the same data: a distribution of the youth career starting age per team, and a box plot of the ages per team.
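The percentage behind this comparison is a simple per-team ratio. Here is a minimal sketch of how it could be computed; the records below are invented for illustration and are not the article's data, and the function name `share_without_youth` is hypothetical.

```python
from collections import defaultdict

# Invented records, for illustration only.
players = [
    {"team": "Team A", "had_youth_career": True},
    {"team": "Team A", "had_youth_career": False},
    {"team": "Team A", "had_youth_career": True},
    {"team": "Team A", "had_youth_career": True},
    {"team": "Team B", "had_youth_career": True},
    {"team": "Team B", "had_youth_career": True},
]


def share_without_youth(records):
    """Percentage of players per team with no youth career."""
    totals = defaultdict(int)
    without = defaultdict(int)
    for record in records:
        totals[record["team"]] += 1
        if not record["had_youth_career"]:
            without[record["team"]] += 1
    return {team: 100 * without[team] / totals[team] for team in totals}


print(share_without_youth(players))
# {'Team A': 25.0, 'Team B': 0.0}
```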
In addition to this, let’s report the median starting age of the youth career per team:
| Team | Median age at youth career start (years) |
| Belgium | 5.5 |
| Denmark | 7 |
| England | 7 |
| Spain | 8 |
| Switzerland | 8 |
| Italy | 9 |
| Czech Republic | 11 |
| Ukraine | 13 |
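For readers who want to reproduce a table like this, here is a rough sketch of a per-team median that skips players whose starting age is missing. The records are invented and the function name `median_by_team` is hypothetical. Note that a half-year value such as 5.5 can appear whenever a team has an even number of known ages.

```python
from statistics import median

# Invented records, for illustration only; None marks a missing age.
players = [
    {"team": "Team A", "youth_start_age": 5},
    {"team": "Team A", "youth_start_age": 6},
    {"team": "Team A", "youth_start_age": None},
    {"team": "Team B", "youth_start_age": 12},
    {"team": "Team B", "youth_start_age": 13},
    {"team": "Team B", "youth_start_age": 14},
]


def median_by_team(records, key):
    """Median of `key` per team, ignoring records where it is missing."""
    ages = {}
    for record in records:
        if record.get(key) is not None:
            ages.setdefault(record["team"], []).append(record[key])
    return {team: median(values) for team, values in ages.items()}


print(median_by_team(players, "youth_start_age"))
# {'Team A': 5.5, 'Team B': 13}
```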
What do these plots and numbers tell us?
- Ukraine is the only team with a noticeably higher starting age for the youth career. Most of its players started their youth career around 12 or 13 years old, with only a couple of exceptions.
- Czech Republic has a very wide range of starting ages, from the young age of 6 all the way to 14.
- Belgium is the team whose players started their youth career the earliest. The reported median age is 5.5 years old.
- England and Switzerland are very similar, with players’ starting ages concentrating around 7 or 8 in most cases.
- Denmark, Spain and Italy show a wider span of starting ages (around 5 years), shifted slightly from team to team: Danish, Spanish and Italian youth careers start around 7, 8 and 9 years old respectively.
Next, let’s explore how the age at which players started their senior career varies per team. Again, I provide two different visualisations of the same data: a distribution of the senior career starting age per team, and a box plot of the ages per team.
- Most of the teams have a very similar age profile for the senior career, with 17 or 18 being the most common age to start it.
- Ukraine is slightly skewed towards the younger ages, while Czech Republic towards the older ones.
Let’s also look at the distribution of the players’ current ages per team. I’m not entirely convinced that current age plays any significant role here, but I’m reporting it out of curiosity.
In addition to this, let’s also report the median age.
| Team | Median age (years) |
| England | 25.0 |
| Ukraine | 25.0 |
| Spain | 26.0 |
| Italy | 26.5 |
| Switzerland | 27.0 |
| Czech Republic | 28.0 |
| Denmark | 28.0 |
| Belgium | 29.0 |
Given the new data:
- Belgium is the team with the oldest, and possibly most experienced, players, while England is the one with the youngest.
- Spain, Switzerland and Italy have very similar age profiles, with a median age of 26–27 and most of the players between 24 and 28, with Spain possibly having slightly younger athletes.
- Ukraine is the team with the most varied ages; players range from 18 to 37 years old, although most of them are again between 24 and 30.
- Czech Republic and Denmark show very consistent age profiles, with a median age of 28 and most of the players between 25 and 29.
Which team will be the winner?
So, based on these very basic data points, which team has the best chance of winning?
Overall, if we summarise our observations, we have the following:
- 11.5% of Czech Republic’s players didn’t pursue a youth career. Switzerland, Belgium and Denmark follow with 8.3%, 8% and 4% respectively.
- Ukraine’s players started their youth career late in comparison to the rest of the teams, around 12 or 13 years old with only a couple of exceptions. Czech Republic also shows a late start (~11) but has a wider range of starting ages (6–14). Belgium is the team whose players started their youth career the earliest, with a median age of 5.5.
- For the senior career start age, the picture flips: Ukraine skews towards the younger ages (16–17), while Czech Republic skews towards the older ones (18–19). The rest of the teams look much the same, with ages concentrating around 17–18.
If our assumption holds that late specialisation and intensified training in late adolescence (18–21) are the key to success (rather than practising from a very early age), Czech Republic appears to be the best candidate for the win.
Czech Republic is the team with the highest percentage of players who didn’t have any youth career at all. It also holds the second-oldest median starting age for the youth career and the oldest starting age for the senior career (18–19), which appears to coincide with the age of late specialisation.
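One way to make this reasoning explicit is a simple rank-sum over the two indicators for which this article reports a number per team. This is only an illustrative sketch: the scoring rule (rank each indicator, add the ranks, lowest total means "latest specialisation") is my own assumption rather than an established method, and the senior career age is left out because only ranges are reported for it.

```python
# Figures quoted in this article.
median_youth_start = {
    "Belgium": 5.5, "Denmark": 7, "England": 7, "Spain": 8,
    "Switzerland": 8, "Italy": 9, "Czech Republic": 11, "Ukraine": 13,
}
pct_no_youth_career = {
    "Czech Republic": 11.5, "Switzerland": 8.3, "Belgium": 8.0, "Denmark": 4.0,
    "England": 0.0, "Italy": 0.0, "Spain": 0.0, "Ukraine": 0.0,
}


def rank_descending(values):
    """Rank 1 is the highest value; tied teams share the same rank."""
    return {
        team: 1 + sum(other > value for other in values.values())
        for team, value in values.items()
    }


def late_specialisation_score(*indicators):
    """Sum of per-indicator ranks (an assumed, purely illustrative rule)."""
    ranks = [rank_descending(indicator) for indicator in indicators]
    return {team: sum(rank[team] for rank in ranks) for team in ranks[0]}


scores = late_specialisation_score(median_youth_start, pct_no_youth_career)
for team, score in sorted(scores.items(), key=lambda item: item[1]):
    print(team, score)
```

Under this assumed rule Czech Republic gets the lowest total (2nd on youth starting age, 1st on share without a youth career), which is consistent with the conclusion above. Different weights or indicators could of course change the order.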
Is it going to be the winner of EURO 2020?
Good luck, Czech Republic!
Please feel free to leave a comment and/or hit clap if you found this a good idea.
Follow me on Twitter: Christina Boididou.