Botched attempts at combining LAU and NUTS

Why is this such a pain?

Giorgio Comai (https://giorgiocomai.eu), OBCT/EDJNet (https://www.europeandatajournalism.eu/)
2021-11-16

Requirements

This page outlines early and unfinished attempts to verify the full coverage of the concordance tables. Its only purpose is to illustrate that matching LAUs to NUTS regions is not straightforward, even though concordance tables exist.
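
The code on this page calls dplyr and ggplot2 functions without a namespace prefix (`%>%`, `filter()`, `ggplot()`), so it presumably relies on a setup step that is not shown. A minimal sketch of such a setup, assuming that the `ll_*()` functions come from the latlon2map package and that sf is installed for `geom_sf()`:

```r
# Assumed setup, not shown in the original page
library("dplyr")      # %>%, filter(), and the other data manipulation verbs
library("ggplot2")    # ggplot(), geom_sf(), theme_minimal()
library("latlon2map") # assumed source of ll_get_lau_eu(), ll_get_nuts_eu(), ll_get_lau_nuts_concordance()
```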

LAU 2018

Let’s start from the 2018 LAU dataset distributed by GISCO.

lau_2018_df <- ll_get_lau_eu(year = 2018) %>%
  sf::st_drop_geometry()

The dataset has some inconsistencies and incomplete data.

Looking only at mainland Europe (French overseas territories are excluded from the map for clarity), we already notice that Bosnia and Herzegovina is not included in the dataset and that, as we’ll see, Kosovo is only partly covered.

ll_get_nuts_eu(year = 2016, level = 0) %>% 
  dplyr::filter(CNTR_CODE %in% unique(lau_2018_df$CNTR_CODE)) %>% 
  ggplot() +
  geom_sf() +
  scale_x_continuous(limits = c(-30, 35)) +
  scale_y_continuous(limits = c(25, NA)) +
  theme_minimal()

There are however other issues.

Missing place names

For some reason, a considerable number of municipalities included in the dataset have no name:

ll_get_lau_eu(year = 2018) %>% 
  sf::st_drop_geometry() %>% 
  # total number of LAUs in each country
  dplyr::group_by(CNTR_CODE) %>% 
  dplyr::add_count(name = "total_lau_per_country") %>% 
  # keep only LAUs without a name, then count them by country
  dplyr::filter(is.na(LAU_NAME)) %>% 
  dplyr::group_by(CNTR_CODE, total_lau_per_country) %>% 
  dplyr::count(name = "missing_lau_name_per_country") %>% 
  dplyr::ungroup() %>% 
  dplyr::mutate(missing_share = missing_lau_name_per_country / total_lau_per_country)

# A tibble: 5 × 4
  CNTR_CODE total_lau_per_country missing_lau_name_per_… missing_share
  <chr>                     <int>                  <int>         <dbl>
1 CH                         2266                     48        0.0212
2 ME                            3                      3        1     
3 NO                          422                    422        1     
4 SI                          212                     99        0.467 
5 XK                           39                     39        1     

So all LAU names are missing for Montenegro, Norway, and Kosovo. Almost half of them are missing for Slovenia. About two per cent are missing in Switzerland.

The total number of LAUs in Montenegro is suspiciously low, so we will have to check that as well.

Switzerland: missing place names

missing_ch_df <- ll_get_lau_eu(year = 2018) %>% 
  sf::st_drop_geometry() %>% 
  dplyr::filter(CNTR_CODE == "CH") %>% 
  dplyr::filter(is.na(LAU_NAME))

In Switzerland, 48 municipalities have no name. Most of them appear to be in mountain and/or border areas.

ggplot() +
  # country outline
  geom_sf(data = ll_get_nuts_eu(year = 2016, level = 0, resolution = 1) %>% 
            dplyr::filter(CNTR_CODE == "CH")) +
  # LAUs with a name, in green
  geom_sf(data = ll_get_lau_eu(year = 2018) %>% 
            dplyr::filter(CNTR_CODE == "CH") %>% 
            dplyr::filter(!is.na(LAU_NAME)),
          fill = "lightgreen") +
  # LAUs without a name, in pink
  geom_sf(data = ll_get_lau_eu(year = 2018) %>% 
            dplyr::filter(CNTR_CODE == "CH") %>% 
            dplyr::filter(is.na(LAU_NAME)),
          fill = "pink") +
  theme_minimal()

The concordance tables for 2018 are of no help.

missing_ch_df %>% 
  dplyr::left_join(y = ll_get_lau_nuts_concordance(lau_year = 2018) %>% 
                     dplyr::filter(country == "CH") %>% 
                     dplyr::rename(GISCO_ID = gisco_id),
                   by = "GISCO_ID")
# A tibble: 48 × 15
   GISCO_ID  CNTR_CODE LAU_ID LAU_NAME POP_2018 POP_DENS_2 AREA_KM2
   <chr>     <chr>     <chr>  <chr>       <int>      <dbl>    <dbl>
 1 CH_CH6417 CH        CH6417 <NA>         8964       209.   42.9  
 2 CH_CH9040 CH        CH9040 <NA>           NA        NA     8.03 
 3 CH_CH9051 CH        CH9051 <NA>           NA        NA    55.7  
 4 CH_CH9052 CH        CH9052 <NA>           NA        NA    17.0  
 5 CH_CH9053 CH        CH9053 <NA>           NA        NA    10.4  
 6 CH_CH9073 CH        CH9073 <NA>           NA        NA    45.9  
 7 CH_CH9089 CH        CH9089 <NA>           NA        NA    28.4  
 8 CH_CH9149 CH        CH9149 <NA>           NA        NA    37.7  
 9 CH_CH9150 CH        CH9150 <NA>           NA        NA     0.401
10 CH_CH9152 CH        CH9152 <NA>           NA        NA     2.12 
# … with 38 more rows, and 8 more variables: YEAR <int>, FID <chr>,
#   country <chr>, nuts_2 <chr>, nuts_3 <chr>, lau_id <chr>,
#   lau_name_national <chr>, lau_name_latin <chr>

ggplot() +
  # country outline
  geom_sf(data = ll_get_nuts_eu(year = 2016, level = 0, resolution = 1) %>% 
            dplyr::filter(CNTR_CODE == "CH")) +
  # named LAUs from the 2019 dataset, in green
  geom_sf(data = ll_get_lau_eu(year = 2019) %>% 
            dplyr::filter(CNTR_CODE == "CH") %>% 
            dplyr::filter(!is.na(LAU_NAME)),
          fill = "lightgreen") +
  # unnamed LAUs from the 2018 dataset, in pink
  geom_sf(data = ll_get_lau_eu(year = 2018) %>% 
            dplyr::filter(CNTR_CODE == "CH") %>% 
            dplyr::filter(is.na(LAU_NAME)),
          fill = "pink") +
  theme_minimal()
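
When a join against the concordance table returns no names, it helps to know whether the IDs are absent from the table altogether or present but without a name. The sketch below separates the two cases with `anti_join()` and `inner_join()`. It runs on small made-up tibbles (`lau_toy` and `concordance_toy` are hypothetical stand-ins, not GISCO data), and only the column names mirror the ones used on this page.

```r
library(dplyr)

# Hypothetical stand-in for the LAU dataset (made-up IDs and names)
lau_toy <- tibble::tribble(
  ~GISCO_ID, ~LAU_NAME,
  "XX_001",  "Named place",
  "XX_002",  NA,
  "XX_003",  NA
)

# Hypothetical stand-in for the concordance table
concordance_toy <- tibble::tribble(
  ~gisco_id, ~lau_name_national,
  "XX_001",  "Named place",
  "XX_002",  NA
)

unnamed_toy <- lau_toy %>%
  filter(is.na(LAU_NAME))

# Case 1: the ID has no row at all in the concordance table
absent_from_concordance <- unnamed_toy %>%
  anti_join(concordance_toy, by = c("GISCO_ID" = "gisco_id"))

# Case 2: the ID is in the concordance table, but has no name there either
present_but_unnamed <- unnamed_toy %>%
  inner_join(concordance_toy, by = c("GISCO_ID" = "gisco_id")) %>%
  filter(is.na(lau_name_national))
```

With the real data, `unnamed_toy` would be replaced by `missing_ch_df` and `concordance_toy` by the output of `ll_get_lau_nuts_concordance(lau_year = 2018)`.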

Norway: missing place names

The 2018 dataset does not include the names of municipalities in Norway: we have their boundaries, but not their names. Unfortunately, Norwegian municipalities are also missing from the LAU/NUTS concordance tables for 2018.

no_2018_df <- ll_get_lau_eu(year = 2018) %>% 
  sf::st_drop_geometry() %>% 
  dplyr::filter(CNTR_CODE == "NO") %>% 
  dplyr::select(GISCO_ID, CNTR_CODE, LAU_ID, LAU_NAME) %>% 
  dplyr::arrange(GISCO_ID)

no_2018_df
# A tibble: 422 × 4
   GISCO_ID CNTR_CODE LAU_ID LAU_NAME
   <chr>    <chr>     <chr>  <chr>   
 1 NO_0101  NO        0101   <NA>    
 2 NO_0104  NO        0104   <NA>    
 3 NO_0105  NO        0105   <NA>    
 4 NO_0106  NO        0106   <NA>    
 5 NO_0111  NO        0111   <NA>    
 6 NO_0118  NO        0118   <NA>    
 7 NO_0119  NO        0119   <NA>    
 8 NO_0121  NO        0121   <NA>    
 9 NO_0122  NO        0122   <NA>    
10 NO_0123  NO        0123   <NA>    
# … with 412 more rows

ll_get_lau_nuts_concordance(lau_year = 2018) %>% 
  dplyr::filter(country == "NO")
# A tibble: 0 × 7
# … with 7 variables: country <chr>, nuts_2 <chr>, nuts_3 <chr>,
#   lau_id <chr>, gisco_id <chr>, lau_name_national <chr>,
#   lau_name_latin <chr>

The names are, however, included in the 2019 dataset, bar one municipality.

# take the 2018 LAUs and attach the names found in the 2019 dataset
lau_no_with_names <- ll_get_lau_eu(year = 2018) %>% 
  sf::st_drop_geometry() %>% 
  dplyr::filter(CNTR_CODE == "NO") %>% 
  dplyr::select(-LAU_NAME) %>% 
  dplyr::left_join(y = ll_get_lau_eu(year = 2019) %>% 
                     sf::st_drop_geometry() %>% 
                     dplyr::filter(CNTR_CODE == "NO") %>% 
                     dplyr::select(GISCO_ID, LAU_NAME),
                   by = "GISCO_ID")

lau_no_with_names %>% 
  dplyr::filter(is.na(LAU_NAME))
# A tibble: 1 × 9
  GISCO_ID CNTR_CODE LAU_ID POP_2018 POP_DENS_2 AREA_KM2  YEAR FID    
  <chr>    <chr>     <chr>     <int>      <dbl>    <dbl> <int> <chr>  
1 NO_1567  NO        1567       2026       3.21     632.  2017 NO_1567
# … with 1 more variable: LAU_NAME <chr>

But we’re lucky enough to find the name of that municipality in the 2017 dataset:

lau_no_with_names$LAU_NAME[lau_no_with_names$GISCO_ID=="NO_1567"] <- 
  ll_get_lau_eu(gisco_id = "NO_1567", year = 2017) %>% 
  sf::st_drop_geometry() %>% 
  dplyr::pull(LAU_NAME)
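
The year-by-year fallback used above (take the 2019 name, and the 2017 name where 2019 has none) can also be written as a single step with `dplyr::coalesce()`. This is a minimal sketch on made-up tibbles (`names_2018`, `names_2019`, `names_2017` and their contents are hypothetical). It assumes that a given `GISCO_ID` refers to the same municipality in every year, which may not hold where municipalities have been merged or renumbered.

```r
library(dplyr)

# Hypothetical data: no names in 2018, partial names in other years
names_2018 <- tibble::tribble(
  ~GISCO_ID, ~LAU_NAME,
  "NO_A",    NA_character_,
  "NO_B",    NA_character_,
  "NO_C",    NA_character_
)
names_2019 <- tibble::tribble(
  ~GISCO_ID, ~LAU_NAME,
  "NO_A",    "Alpha",
  "NO_B",    "Beta"
)
names_2017 <- tibble::tribble(
  ~GISCO_ID, ~LAU_NAME,
  "NO_B",    "Beta (old)",
  "NO_C",    "Gamma"
)

filled <- names_2018 %>%
  left_join(names_2019 %>% rename(name_2019 = LAU_NAME), by = "GISCO_ID") %>%
  left_join(names_2017 %>% rename(name_2017 = LAU_NAME), by = "GISCO_ID") %>%
  # first non-missing value wins: 2018, then 2019, then 2017
  mutate(LAU_NAME = coalesce(LAU_NAME, name_2019, name_2017)) %>%
  select(GISCO_ID, LAU_NAME)
```

The order of the arguments to `coalesce()` sets the priority among years, so the 2017 name is only used when both 2018 and 2019 are missing.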

Is there more?

LAU 2019

Using the 2019 LAU dataset as the point of reference has one advantage: validated concordance tables between LAU and NUTS exist for it, whereas those for 2020 were not yet available as of this writing in October 2021.

Let’s start from the 2019 LAU dataset distributed by GISCO.

lau_2019_df <- ll_get_lau_eu(year = 2019) %>%
  sf::st_drop_geometry()

Looking only at mainland Europe (French overseas territories are excluded from the map for clarity), we notice that part of the Western Balkans is missing (namely Bosnia and Herzegovina, Montenegro, and Kosovo).

ll_get_nuts_eu(year = 2016, level = 0) %>% 
  dplyr::filter(CNTR_CODE %in% unique(lau_2019_df$CNTR_CODE)) %>% 
  ggplot() +
  geom_sf() +
  scale_x_continuous(limits = c(-30, 35)) +
  scale_y_continuous(limits = c(25, NA)) +
  theme_minimal()

# keep the 2019 LAUs for which the concordance table provides no national name
# (either no matching row, or a row with an empty name)
missing_df <- ll_get_lau_eu(year = 2019, silent = TRUE) %>% 
  sf::st_drop_geometry() %>% 
  dplyr::transmute(gisco_id = GISCO_ID) %>% 
  dplyr::left_join(y = ll_get_lau_nuts_concordance(lau_year = 2019,
                                                   nuts_year = 2016) %>% 
                     dplyr::rename(lau_name = lau_name_national),
                   by = "gisco_id") %>% 
  dplyr::filter(is.na(lau_name))

It appears that 1 130 LAUs are missing from the official concordance tables for 2019.
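
A natural next step is to see how these missing LAUs are distributed across countries. Rows without a match have no value in the columns coming from the concordance table, so the country has to be derived from the ID itself. The sketch below assumes, as in the IDs shown earlier on this page (`NO_0101`, `CH_CH9040`), that the country code is the part of `gisco_id` before the first underscore. `missing_toy` is a made-up stand-in for `missing_df`.

```r
library(dplyr)

# Hypothetical stand-in for missing_df
missing_toy <- tibble::tibble(
  gisco_id = c("NO_0101", "NO_0104", "CH_CH9040", "SI_001")
)

missing_by_country <- missing_toy %>%
  # drop everything from the first underscore onwards to get the country code
  mutate(country = sub("_.*$", "", gisco_id)) %>%
  count(country, name = "missing_lau", sort = TRUE)
```

Sorting by count puts the countries with the largest gaps in the concordance table first.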