Why is this such a pain?
This page outlines early and unfinished attempts to verify the full coverage of the concordance tables. Its only purpose is to illustrate that matching LAUs to NUTS regions is not straightforward, even though concordance tables exist.
Let’s start from the 2018 LAU dataset distributed by GISCO.
lau_2018_df <- ll_get_lau_eu(year = 2018) %>%
sf::st_drop_geometry()
The dataset has some inconsistencies and incomplete data.
Looking only at mainland Europe (French overseas territories are excluded from the map for clarity), we already notice that Bosnia and Herzegovina and, as we will see, to some extent Kosovo are not included in the dataset.
ll_get_nuts_eu(year = 2016, level = 0) %>%
dplyr::filter(CNTR_CODE %in% unique(lau_2018_df$CNTR_CODE)) %>%
ggplot() +
geom_sf() +
scale_x_continuous(limits = c(-30, 35)) +
scale_y_continuous(limits = c(25, NA)) +
theme_minimal()
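The gap visible on the map can also be listed directly rather than read off the plot. This is a minimal sketch in base R, assuming we have some reference vector of country codes we expect to find. Both `expected_codes` and `toy_lau_df` are made-up placeholders, not GISCO data.

```r
# Hypothetical reference list of country codes (illustrative only)
expected_codes <- c("AA", "BB", "CC", "DD")

# Toy stand-in for the LAU table: one row per LAU, with its country code
toy_lau_df <- data.frame(CNTR_CODE = c("AA", "AA", "CC"))

# Countries we expected but that have no LAU at all in the dataset
missing_countries <- setdiff(expected_codes, unique(toy_lau_df$CNTR_CODE))
missing_countries
```

With the real data, the reference vector could for example be taken from the `CNTR_CODE` column of the NUTS level 0 dataset used for the map.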
There are, however, other issues.
For some reason, a considerable number of municipalities in the dataset have no name:
ll_get_lau_eu(year = 2018) %>%
  sf::st_drop_geometry() %>%
  dplyr::group_by(CNTR_CODE) %>%
  # total number of LAUs in each country
  dplyr::add_count(name = "total_lau_per_country") %>%
  # keep only the LAUs without a name
  dplyr::filter(is.na(LAU_NAME)) %>%
  dplyr::group_by(CNTR_CODE, total_lau_per_country) %>%
  dplyr::count(name = "missing_lau_name_per_country") %>%
  dplyr::ungroup() %>%
  dplyr::mutate(missing_share = missing_lau_name_per_country / total_lau_per_country)
# A tibble: 5 × 4
CNTR_CODE total_lau_per_country missing_lau_name_per_… missing_share
<chr> <int> <int> <dbl>
1 CH 2266 48 0.0212
2 ME 3 3 1
3 NO 422 422 1
4 SI 212 99 0.467
5 XK 39 39 1
So all LAU names are missing for Montenegro, Norway, and Kosovo. Almost half of them are missing for Slovenia. About two per cent are missing in Switzerland.
The total number of LAUs in Montenegro is suspiciously low, so we will have to check that as well.
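The per-country share of missing names can also be computed in a single `summarise()` step. The sketch below runs on a small made-up table (`toy_lau_df`, with invented country codes and names) so that it works without downloading anything. It is an alternative formulation for illustration, not the code used to produce the table above.

```r
library(dplyr)

# Toy stand-in for the LAU table: illustrative rows, not GISCO data
toy_lau_df <- tibble(
  CNTR_CODE = c("AA", "AA", "AA", "AA", "BB", "BB", "CC"),
  LAU_NAME  = c("Alpha", NA, "Gamma", "Delta", NA, NA, "Omega")
)

missing_name_summary <- toy_lau_df %>%
  group_by(CNTR_CODE) %>%
  summarise(
    total_lau_per_country = n(),
    missing_lau_name_per_country = sum(is.na(LAU_NAME)),
    .groups = "drop"
  ) %>%
  mutate(missing_share = missing_lau_name_per_country / total_lau_per_country) %>%
  # keep only countries where at least one name is missing
  filter(missing_lau_name_per_country > 0)

missing_name_summary
```

Counting with `sum(is.na(...))` inside `summarise()` avoids the second `group_by()` and keeps the totals and the missing counts in the same step.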
missing_ch_df <- ll_get_lau_eu(year = 2018) %>%
sf::st_drop_geometry() %>%
dplyr::filter(CNTR_CODE == "CH") %>%
dplyr::filter(is.na(LAU_NAME))
In Switzerland, 48 municipalities have a missing name. Most of them appear to be in mountain and/or border locations.
ggplot() +
  geom_sf(data = ll_get_nuts_eu(year = 2016,
                                level = 0,
                                resolution = 1) %>%
            dplyr::filter(CNTR_CODE == "CH")) +
  # municipalities with a name, in green
  geom_sf(data = ll_get_lau_eu(year = 2018) %>%
            dplyr::filter(CNTR_CODE == "CH") %>%
            dplyr::filter(!is.na(LAU_NAME)), fill = "lightgreen") +
  # municipalities without a name, in pink
  geom_sf(data = ll_get_lau_eu(year = 2018) %>%
            dplyr::filter(CNTR_CODE == "CH") %>%
            dplyr::filter(is.na(LAU_NAME)), fill = "pink") +
  theme_minimal()
The concordance tables for 2018 are of no help.
missing_ch_df %>%
dplyr::left_join(y = ll_get_lau_nuts_concordance(lau_year = 2018) %>%
dplyr::filter(country == "CH") %>%
dplyr::rename(GISCO_ID = gisco_id),
by = "GISCO_ID")
# A tibble: 48 × 15
GISCO_ID CNTR_CODE LAU_ID LAU_NAME POP_2018 POP_DENS_2 AREA_KM2
<chr> <chr> <chr> <chr> <int> <dbl> <dbl>
1 CH_CH6417 CH CH6417 <NA> 8964 209. 42.9
2 CH_CH9040 CH CH9040 <NA> NA NA 8.03
3 CH_CH9051 CH CH9051 <NA> NA NA 55.7
4 CH_CH9052 CH CH9052 <NA> NA NA 17.0
5 CH_CH9053 CH CH9053 <NA> NA NA 10.4
6 CH_CH9073 CH CH9073 <NA> NA NA 45.9
7 CH_CH9089 CH CH9089 <NA> NA NA 28.4
8 CH_CH9149 CH CH9149 <NA> NA NA 37.7
9 CH_CH9150 CH CH9150 <NA> NA NA 0.401
10 CH_CH9152 CH CH9152 <NA> NA NA 2.12
# … with 38 more rows, and 8 more variables: YEAR <int>, FID <chr>,
# country <chr>, nuts_2 <chr>, nuts_3 <chr>, lau_id <chr>,
# lau_name_national <chr>, lau_name_latin <chr>
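Rather than inspecting the joined table by eye, the outcome of such a join can be summarised in one row. This is a minimal sketch on made-up data: `toy_missing_df` and `toy_concordance_df` are hypothetical stand-ins that only mimic the `GISCO_ID`, `gisco_id`, and `lau_name_national` columns seen above.

```r
library(dplyr)

# Toy stand-ins (illustrative identifiers, not GISCO data)
toy_missing_df <- tibble(GISCO_ID = c("XX_001", "XX_002", "XX_003"))
toy_concordance_df <- tibble(
  gisco_id = c("XX_002", "XX_900"),
  lau_name_national = c("Beta", "Omega")
)

toy_check_df <- toy_missing_df %>%
  left_join(toy_concordance_df %>% rename(GISCO_ID = gisco_id),
            by = "GISCO_ID") %>%
  summarise(
    rows = n(),
    # rows for which the concordance table supplied a name
    with_concordance_name = sum(!is.na(lau_name_national))
  )

toy_check_df
```

If `with_concordance_name` is zero, the concordance table adds nothing for the rows in question.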
ggplot() +
  geom_sf(data = ll_get_nuts_eu(year = 2016,
                                level = 0,
                                resolution = 1) %>%
            dplyr::filter(CNTR_CODE == "CH")) +
  # municipalities with a name in the 2019 dataset, in green
  geom_sf(data = ll_get_lau_eu(year = 2019) %>%
            dplyr::filter(CNTR_CODE == "CH") %>%
            dplyr::filter(!is.na(LAU_NAME)), fill = "lightgreen") +
  # municipalities without a name in the 2018 dataset, in pink
  geom_sf(data = ll_get_lau_eu(year = 2018) %>%
            dplyr::filter(CNTR_CODE == "CH") %>%
            dplyr::filter(is.na(LAU_NAME)), fill = "pink") +
  theme_minimal()
The 2018 dataset does not include the names of municipalities in Norway: we have their boundaries, but not their names. Unfortunately, they are also not included in the relevant LAU/NUTS concordance tables for 2018.
no_2018_df <- ll_get_lau_eu(year = 2018) %>%
sf::st_drop_geometry() %>%
dplyr::filter(CNTR_CODE == "NO") %>%
dplyr::select(GISCO_ID, CNTR_CODE, LAU_ID, LAU_NAME) %>%
dplyr::arrange(GISCO_ID)
no_2018_df
# A tibble: 422 × 4
GISCO_ID CNTR_CODE LAU_ID LAU_NAME
<chr> <chr> <chr> <chr>
1 NO_0101 NO 0101 <NA>
2 NO_0104 NO 0104 <NA>
3 NO_0105 NO 0105 <NA>
4 NO_0106 NO 0106 <NA>
5 NO_0111 NO 0111 <NA>
6 NO_0118 NO 0118 <NA>
7 NO_0119 NO 0119 <NA>
8 NO_0121 NO 0121 <NA>
9 NO_0122 NO 0122 <NA>
10 NO_0123 NO 0123 <NA>
# … with 412 more rows
ll_get_lau_nuts_concordance(lau_year = 2018) %>%
dplyr::filter(country=="NO")
# A tibble: 0 × 7
# … with 7 variables: country <chr>, nuts_2 <chr>, nuts_3 <chr>,
# lau_id <chr>, gisco_id <chr>, lau_name_national <chr>,
# lau_name_latin <chr>
lau_no_with_names <- ll_get_lau_eu(year = 2018) %>%
sf::st_drop_geometry() %>%
dplyr::filter(CNTR_CODE == "NO") %>%
dplyr::select(-LAU_NAME) %>%
dplyr::left_join(y = ll_get_lau_eu(year = 2019) %>%
sf::st_drop_geometry() %>%
dplyr::filter(CNTR_CODE=="NO") %>%
dplyr::select(GISCO_ID, LAU_NAME),
by = "GISCO_ID")
lau_no_with_names %>%
dplyr::filter(is.na(LAU_NAME))
# A tibble: 1 × 9
GISCO_ID CNTR_CODE LAU_ID POP_2018 POP_DENS_2 AREA_KM2 YEAR FID
<chr> <chr> <chr> <int> <dbl> <dbl> <int> <chr>
1 NO_1567 NO 1567 2026 3.21 632. 2017 NO_1567
# … with 1 more variable: LAU_NAME <chr>
Only one municipality, NO_1567, is still missing a name after joining the 2019 names, but we are lucky enough to find it in the 2017 dataset:
lau_no_with_names$LAU_NAME[lau_no_with_names$GISCO_ID=="NO_1567"] <-
ll_get_lau_eu(gisco_id = "NO_1567", year = 2017) %>%
sf::st_drop_geometry() %>%
dplyr::pull(LAU_NAME)
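The same "take the name from 2019, fall back to 2017" logic can be written as a single pipeline with `dplyr::coalesce()`, which returns the first non-missing value across its arguments. The sketch below uses made-up lookup tables (`toy_2019_names`, `toy_2017_names`) with invented identifiers. It illustrates the idea and does not replace the step above.

```r
library(dplyr)

# Toy stand-ins for the three yearly datasets (illustrative only)
toy_2018_df    <- tibble(GISCO_ID = c("XX_001", "XX_002", "XX_003"))
toy_2019_names <- tibble(GISCO_ID = c("XX_001", "XX_003"),
                         name_2019 = c("Alpha", "Gamma"))
toy_2017_names <- tibble(GISCO_ID = c("XX_001", "XX_002"),
                         name_2017 = c("Alpha (old)", "Beta"))

toy_named_df <- toy_2018_df %>%
  left_join(toy_2019_names, by = "GISCO_ID") %>%
  left_join(toy_2017_names, by = "GISCO_ID") %>%
  # prefer the 2019 name; use the 2017 name only where 2019 has none
  mutate(LAU_NAME = coalesce(name_2019, name_2017)) %>%
  select(GISCO_ID, LAU_NAME)

toy_named_df
```

The order of the arguments to `coalesce()` sets the priority, so further fallback years could be appended in the same way.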
Is there more?
…
Keeping the 2019 LAU dataset as the point of reference has the advantage that we can rely on validated concordance tables between LAU and NUTS, which are not yet available for 2020 as of this writing in October 2021.
Let’s start from the 2019 LAU dataset distributed by GISCO.
lau_2019_df <- ll_get_lau_eu(year = 2019) %>%
sf::st_drop_geometry()
Looking only at mainland Europe (French overseas territories are excluded from the map for clarity), we notice that part of the Western Balkans is missing (namely Bosnia and Herzegovina, Montenegro, and Kosovo).
ll_get_nuts_eu(year = 2016, level = 0) %>%
dplyr::filter(CNTR_CODE %in% unique(lau_2019_df$CNTR_CODE)) %>%
ggplot() +
geom_sf() +
scale_x_continuous(limits = c(-30, 35)) +
scale_y_continuous(limits = c(25, NA)) +
theme_minimal()
missing_df <- ll_get_lau_eu(year = 2019, silent = TRUE) %>%
sf::st_drop_geometry() %>%
dplyr::transmute(gisco_id = GISCO_ID) %>%
dplyr::left_join(y = ll_get_lau_nuts_concordance(lau_year = 2019,
nuts_year = 2016) %>%
dplyr::rename(lau_name = lau_name_national),
by = "gisco_id") %>%
dplyr::filter(is.na(lau_name))
It appears that 1 130 LAUs are missing from the official concordance tables for 2019.
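A natural follow-up is to see how the missing LAUs are distributed across countries. A possible sketch, on made-up data, uses `anti_join()` to keep the identifiers with no match in the concordance table and then counts them by the country prefix of `gisco_id`. The tables `toy_lau_ids` and `toy_concordance` are hypothetical, and the prefix extraction assumes identifiers of the form `CC_xxxx`, as in the examples shown earlier on this page.

```r
library(dplyr)

# Toy stand-ins (illustrative identifiers, not GISCO data)
toy_lau_ids <- tibble(gisco_id = c("AA_01", "AA_02", "BB_01", "BB_02", "BB_03"))
toy_concordance <- tibble(gisco_id = c("AA_01", "BB_02"),
                          nuts_3 = c("AA011", "BB021"))

missing_by_country <- toy_lau_ids %>%
  # keep only LAUs with no row in the concordance table
  anti_join(toy_concordance, by = "gisco_id") %>%
  # country code = everything before the first underscore
  mutate(country = sub("_.*$", "", gisco_id)) %>%
  count(country, name = "missing_lau", sort = TRUE)

missing_by_country
```

Unlike the `left_join()` plus `is.na()` approach, `anti_join()` does not depend on any particular column of the concordance table being non-missing.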
…