LAU and NUTS

Can they be friends?

Giorgio Comai https://giorgiocomai.eu (OBCT/EDJNet) https://www.europeandatajournalism.eu/
2021-11-16

Combining LAU and NUTS using publicly available datasets is not as straightforward as it may seem. The data contain small inconsistencies, GISCO distributes the concordance tables in a cumbersome and internally inconsistent format and, above all, the datasets of local administrative units puzzlingly include no reference to NUTS to begin with.

There does not seem to be any combination of concordance tables that fully matches the LAU dataset for the relevant year. Mixing and matching between different years for individual countries may eventually bring us closer to the desired outcome, but may introduce inconsistencies of its own (see some details about the process in the page “botched attempts at combining LAU and NUTS”).

Find below more details about the proposed tables, which offer full matching with the LAUs of the relevant year, i.e. 100% of the LAUs are paired to a NUTS region.

To be clear, full matching inevitably leads to some artifacts: if a LAU changed its borders between 2016 and 2021 (when NUTS regions were updated) and it straddles the boundary between two NUTS regions, there is no right match. These issues, however, involve a relatively small number of LAUs: if your goal is to use this for data visualisations, then it’s probably fine. If you are using statistics from other sources and you expect totals to add up, then even the odd LAU out of place may be a concern (this is likely part of the reason why the concordance tables are not complete).

This repository includes concordance datasets based on matching by geometry, as well as combined datasets that rely on the concordance tables as distributed by GISCO and fall back on geometry-based matching for missing LAUs. The combined datasets are probably the ones you want to use.
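The combination logic can be illustrated with a minimal Python sketch. It is not the code used to build the datasets in this repository: the function name, the dictionary-based inputs and the identifiers are all made up for illustration.

```python
def combine_concordance(lau_ids, official, by_geometry):
    """Return {gisco_id: (nuts_3, source)}, preferring the official table.

    `official` and `by_geometry` are hypothetical dicts mapping a LAU
    identifier to a NUTS 3 code.
    """
    combined = {}
    for gisco_id in lau_ids:
        if gisco_id in official:
            # The official concordance table wins whenever it has an entry
            combined[gisco_id] = (official[gisco_id], "official")
        elif gisco_id in by_geometry:
            # Otherwise fall back on the geometry-based match
            combined[gisco_id] = (by_geometry[gisco_id], "geometry")
        else:
            combined[gisco_id] = (None, "unmatched")
    return combined


# Made-up identifiers, for illustration only
lau_ids = ["XX_001", "XX_002", "XX_003"]
official = {"XX_001": "XX011"}
by_geometry = {"XX_001": "XX012", "XX_002": "XX012"}

combined = combine_concordance(lau_ids, official, by_geometry)
```

Keeping the source of each match next to the NUTS code makes it easy to check afterwards how many LAUs relied on the fallback.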

LAU 2020 / NUTS 2021, based on geometry

Until GISCO finally distributes a consistent dataset (hopefully LAU 2021/NUTS 2021), we decided to determine which NUTS region each LAU belongs to based on the most recently distributed geographic datasets: LAU 2020/NUTS 2021. This may still lead to some inconsistencies for LAUs with recently changed borders along the administrative boundary of a NUTS region but, all things considered, it is expected to offer an accurate dataset, possibly bar a handful of cases.

The area covered by both datasets (LAU 2020 / NUTS 2021) includes 35 countries: AL, AT, BE, BG, CH, CY, CZ, DE, DK, EE, EL, ES, FI, FR, HR, HU, IE, IS, IT, LI, LT, LU, LV, MK, MT, NL, NO, PL, PT, RO, RS, SE, SI, SK, UK.

How reliable is the matching?

Given some inconsistencies and the different resolution of the two datasets, we expect LAUs located along a NUTS boundary line not to match NUTS exactly. We expect a difference of up to 10, perhaps 20, percent to be attributable to mismatches in the geographic datasets, not to actual changes on the ground.

As a consequence, there are probably just a handful of cases where the matching is not perfectly accurate.

Here is a full list of LAUs which, according to available datasets, do not have more than 90% of their area within a given NUTS:

gisco_id nuts_3 area area_share
DE_07333056 DEB3D 1706254.5 0.8986020
RO_154308 RO317 61297860.9 0.8974089
NL_GM0394 NL329 184536245.3 0.8943929
CH_CH9327 CH052 167441.0 0.8940607
EL_45080201 EL303 687968.5 0.8655517
EL_02050206 EL514 3765168.8 0.8515297
FR_39140 FRC22 759160.4 0.8509431
DE_01053075 DEF06 2673362.4 0.8364477
RO_153062 RO317 75527939.8 0.8363154
DE_16072013 DEG0H 80315831.6 0.7433631
NL_GM1952 NL113 213183034.7 0.7230271
NL_GM1961 NL33A 110046868.2 0.7204488
DE_16054000 DEG04 101274434.3 0.7112290
FR_44180 FRG01 125043370.5 0.6576246
FR_50592 FRD12 21616442.2 0.6192706
AT_61061 AT225 29079891.7 0.6074245
DE_16066095 DEG0B 53852351.4 0.5690889
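A check of this kind can be sketched in Python as follows. This is an illustration under stated assumptions, not the code that produced the list above: the input structure, the function name and the identifiers are hypothetical, and the share is computed relative to the sum of a LAU's overlaps with all NUTS regions.

```python
def flag_ambiguous(overlaps, threshold=0.9):
    """Return LAUs whose best NUTS 3 match covers at most `threshold` of their area.

    `overlaps` maps gisco_id -> {nuts_3: overlapping area}; every LAU is
    assumed to overlap at least one NUTS region with a non-zero area.
    """
    flagged = []
    for gisco_id, areas in overlaps.items():
        total = sum(areas.values())
        # NUTS 3 region with the largest overlap for this LAU
        nuts_3, area = max(areas.items(), key=lambda item: item[1])
        share = area / total
        if share <= threshold:
            flagged.append((gisco_id, nuts_3, area, share))
    # Highest share first, the same ordering as the list above
    return sorted(flagged, key=lambda row: row[3], reverse=True)


# Made-up identifiers and areas, for illustration only
overlaps = {
    "XX_001": {"XX011": 950.0, "XX012": 50.0},   # 95% in XX011: not flagged
    "XX_002": {"XX011": 400.0, "XX012": 600.0},  # 60% in XX012: flagged
    "XX_003": {"XX013": 850.0, "XX011": 150.0},  # 85% in XX013: flagged
}
flagged = flag_ambiguous(overlaps)
```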

How consistent is the matching?

Do all LAUs have their own NUTS? Here is a complete table of LAUs that are not paired to any NUTS:

gisco_id nuts_3

Yes. Full match. 😍

How different is this from the official concordance tables?

If we only consider those LAUs that are actually present in the official concordance table, how many LAUs would be miscategorised relying on the geometries as described above?

In the case of LAU 2020, NUTS 2021 (with official concordance tables still provisional), only 110 LAUs are misplaced: one in Greece, one in the Netherlands, all others in France (France has by far the highest number of LAUs of any country in Europe: more than a third of LAUs in Europe are located in France, and changes are frequent).

country total_nuts_2_different total_nuts_3_different total
AT 0 0 2095
BG 0 0 265
CH 0 0 2212
CY 0 0 615
CZ 0 0 6258
DE 0 0 11007
DK 0 0 99
EL 1 1 6135
ES 0 0 8131
FI 0 0 310
FR 108 108 34967
HU 0 0 3155
IE 0 0 105
LI 0 0 11
LT 0 0 60
LU 0 0 102
LV 0 0 119
MT 0 0 68
NL 1 1 355
PL 0 0 2477
PT 0 0 3092
RO 0 0 3181
SE 0 0 290
SI 0 0 212
SK 0 0 2927

The situation is quite similar for LAU 2019, NUTS 2016 (the most recent for which there is a validated concordance table): the records for Albania are different only in format (the official concordance tables have e.g. “AL11” rather than “AL011”). Most other miscategorised LAUs are in France, with a few in the UK, Italy, Greece and the Netherlands.

country total_nuts_2_different total_nuts_3_different total
AL 61 61 61
AT 0 0 2096
BE 0 0 589
BG 0 0 265
CH 0 0 2209
CY 0 0 615
CZ 0 0 6258
DE 0 0 11087
DK 0 0 99
EE 0 0 79
EL 1 1 6134
ES 0 0 8131
FI 0 0 311
FR 108 108 34970
HR 0 0 556
HU 0 0 3155
IE 0 0 105
IS 0 0 72
IT 1 2 7910
LI 0 0 11
LT 0 0 60
LU 0 0 102
LV 0 0 119
MK 0 0 80
MT 0 0 68
NL 1 1 355
PL 0 0 2478
PT 0 0 3092
RO 0 0 3181
SE 0 0 290
SI 0 0 212
SK 0 0 2927
UK 7 7 400
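Per-country counts like those in the two tables above could be computed along these lines. This is a hedged Python sketch rather than the original code: the function name, the dictionary inputs and the identifiers are hypothetical. It assumes that the country code is the part of `gisco_id` before the underscore, and it relies on the fact that the NUTS 2 code is the first four characters of a NUTS 3 code.

```python
def count_differences(official, by_geometry):
    """Count, per country, LAUs whose NUTS 2 or NUTS 3 differ between two tables.

    Both arguments are hypothetical dicts mapping gisco_id -> NUTS 3 code.
    Only LAUs present in the official table are considered.
    """
    stats = {}
    for gisco_id, nuts_official in official.items():
        nuts_geo = by_geometry.get(gisco_id)
        if nuts_geo is None:
            continue  # no geometry-based match to compare against
        country = gisco_id.split("_", 1)[0]
        row = stats.setdefault(
            country, {"nuts_2_different": 0, "nuts_3_different": 0, "total": 0}
        )
        row["total"] += 1
        # NUTS 2 code: first four characters of the NUTS 3 code
        if nuts_official[:4] != nuts_geo[:4]:
            row["nuts_2_different"] += 1
        if nuts_official != nuts_geo:
            row["nuts_3_different"] += 1
    return stats


# Made-up identifiers, for illustration only
official = {
    "XX_001": "XX011", "XX_002": "XX011", "XX_003": "XX021", "YY_001": "YY011",
}
by_geometry = {
    "XX_001": "XX011", "XX_002": "XX012", "XX_003": "XX011",
    "YY_001": "YY011", "YY_002": "YY012",
}
stats = count_differences(official, by_geometry)
```

With a plain string comparison like this one, a code recorded in a different format (such as “AL11” against “AL011”) counts as different at both levels, even though the region is the same.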

Accessing the dataset

The following datasets generated with this approach (i.e. attributing each LAU to the NUTS region where the largest part of its area is located according to available geometries) are currently available.

N.B. Unless you are looking for a specific combination of LAU and NUTS, or you really want them matched by geometry, you probably want to download the main datasets, which use the official concordance tables and fall back on these datasets only when the relevant data is missing.

Datasets with the surface of each LAU recorded in each NUTS are available in the lau_nuts_area folder. They are likely most useful for pre-caching when processing large amounts of data.
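As a rough illustration of how such an area table could be loaded once and then reused for lookups, here is a minimal Python sketch. The column names (`gisco_id`, `nuts_3`, `area`), the CSV layout and the function names are assumptions made for this example. The actual files in the lau_nuts_area folder may be structured differently.

```python
import csv
import io
from collections import defaultdict


def load_area_table(csv_file):
    """Read rows of gisco_id, nuts_3, area into {gisco_id: {nuts_3: area}}."""
    table = defaultdict(dict)
    for row in csv.DictReader(csv_file):
        table[row["gisco_id"]][row["nuts_3"]] = float(row["area"])
    return dict(table)


def dominant_nuts(table, gisco_id):
    """Return the NUTS 3 code covering the largest part of the given LAU."""
    areas = table[gisco_id]
    return max(areas, key=areas.get)


# Made-up rows standing in for a file from the lau_nuts_area folder
sample = io.StringIO(
    "gisco_id,nuts_3,area\n"
    "XX_001,XX011,950.0\n"
    "XX_001,XX012,50.0\n"
    "XX_002,XX011,400.0\n"
    "XX_002,XX012,600.0\n"
)
table = load_area_table(sample)
```

Once the table is in memory, `dominant_nuts(table, "XX_002")` can be called repeatedly without recomputing any geometric intersection.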