The Olympics medals table is notoriously all about countries and national teams. But what if the Olympic medal table was based on the number of medals won by regions, not by countries?
We check this based on the place of birth for all athletes born in one of the European NUTS regions; NUTS - Nomenclature of Territorial Units for Statistics - is a standardised classification of administrative entities defined by the European Union (here’s the geographic dataset for download).
We conduct this analysis for the 2024 Summer Olympics, and we checked these data for the 2020 Summer Olympics. But in principle the procedure can be expanded in order to include all modern Olympics, all over the world.
A map including all known places of birth including also non-European-born athletes, the corresponding dataset, and all code used to generate them are shared in this repository.
This is an updated version of a similar excercise I conducted for the 2020 Summer Olympics.
This whole data retrieval process is based on Wikipedia and Wikidata. Wikidata provides data in a more machine-readable format, but, at least in the past, the Wikipedia page listing all medalists was more complete and was updated much more quickly (there’s a WikiProject Olympics, if you’re looking for inputs on how to contribute). In brief, we get medalists from the Wikipedia page, then retrieve data about their place of birth from Wikidata, then with some geo-matching we attribute each of them to a NUTS region.
We can then combine these data with other statistics about these regions, such as population and economic indicators.
Due to statistical and administrative changes, geographic data about NUTS regions and other indicators such as population or economic activity are not always published at the same time. In particular, the latest NUTS dataset realeased for 2024 covers more non EU-regions in the Balkans, as well as Ukraine, but does not include the UK. In some countries, such as Netherlands and Portugal, administrative boundaries have changed, so population data are not immediately available for all NUTS 2024 regions. In order to maximise coverage and data availability, we include NUTS boundaries from 2021 for the UK, Netherlands and Portugal, as well as NUTS 2024 for all other countries. Population data for some NUTS regions (e.g. in Ukraine and Kosovo) are not available.
It is worth highlighting that each person receiving a medal is counted as a medal. This is very much unlike the official medal table: in the following table a victory, e.g., in a quadruple sculls rowing competition results in four golden medals, not one; the same is true for all other team sports. This is coherent with the principle of counting medals based on place of birth of athletes: ultimately, each of them will bring home a shiny medal each, not, e.g., a quarter of a medal. In terms of ranking, compared to the most established medal table, this favours regions and countries that are strongest in team sports.
Even if the main objective of this initiative is to get data about ongoing and recent summer Olympics, this repository includes data for all Summer Olympics since 1960 which have a dedicated page with all medal winners on Wikipedia (see a list of all Olympics with full medalist pages available on Wikipedia). This second criteria is important. Only starting with 1988 all Olympics are present. The data for missing Olympics seem to be available scattered around Wikipedia (and of course, elsewhere on-line), but until some patient Wikipedia contributor collates this data in a somewhat standardised page, this automatic approach cannot work. Data on medal winners in Wikidata itself are much more incomplete even for recent Olympics events; when they are available, they are used as a quality check to ensure consistency with data parsed from the medal table on Wikipedia.
Warning: data have not been thoroughly checked for accuracy, with the partial exception of 2020 and 2024 Olympics!
This dataset has been generated by Giorgio Comai (OBCT/CCI) within the scope of EDJNet, the European Data Journalism Network. It is released under a CC-BY license (Giorgio Comai/OBCT/EDJNet). Please do include a reference to EDJNet - the European Data Journalism Network if you use these data.