1 change: 1 addition & 0 deletions NAMESPACE
Original file line number Diff line number Diff line change
@@ -8,6 +8,7 @@ export(create_project)
export(fetch_countries)
export(fetch_lads)
export(fetch_las)
export(fetch_lsip)
export(fetch_mayoral)
export(fetch_pcons)
export(fetch_regions)
26 changes: 26 additions & 0 deletions R/datasets_documentation.R
@@ -198,3 +198,29 @@
#' https://get-information-schools.service.gov.uk/Guidance/LaNameCodes and
#' https://tinyurl.com/EESScreenerLAs
"wd_pcon_lad_la_rgn_ctry"

#' Local Skills Improvement Plan (LSIP) areas to
#' Local Authority District (LAD) Lookup
#'
#' A lookup table mapping Local Skills Improvement Plan (LSIP)
#' areas to Local Authority Districts (LADs) in England. This dataset provides
#' a mapping between LSIP areas and LADs as provided by
#' the ONS Geography Portal.
#'
#' @details
#' - Each LAD is assigned to a single LSIP area.
#' - Mappings may change over time and can be tracked using the
#' `most_recent_year_included` and `first_available_year_included`
#' columns.
#' @format ## `lsip_lad`
#' A data frame with one row per LAD in England and the following columns:
#' \describe{
#' \item{lad_code}{9-character code for the Local Authority District}
#' \item{lad_name}{Name of the Local Authority District}
#' \item{lsip_code}{9-character code for the LSIP area}
#' \item{lsip_name}{Name of the Local Skills Improvement Plan (LSIP) area}
#' \item{most_recent_year_included}{The most recent year in which this
#'   location appears in the lookup}

#' \item{first_available_year_included}{The first year in which this
#'   location appears in the lookup}

#' }
#' @source https://geoportal.statistics.gov.uk/search?q=lad%20lsip
"lsip_lad"
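For illustration, once documented like this the lookup can be filtered like any data frame. A quick sketch (the LSIP name below is hypothetical, used purely as an example):

```r
# Sketch of typical use of the lsip_lad lookup
# "West of England" is a hypothetical LSIP name for illustration only
library(dplyr)

# LADs belonging to one LSIP area
dfeR::lsip_lad |>
  filter(lsip_name == "West of England") |>
  select(lad_code, lad_name)

# Rows whose mapping does not span every year of the lookup
dfeR::lsip_lad |>
  filter(first_available_year_included != most_recent_year_included)
```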
78 changes: 78 additions & 0 deletions R/datasets_utils.R
@@ -586,3 +586,81 @@ get_cauth_lad <- function(year) {
# Tidy up the output file (defined earlier in this script)
tidy_raw_lookup(output)
}

#' Fetch and combine LSIP-LAD lookup data for multiple years
#'
#' Downloads, binds, and tidies LSIP-LAD lookup data from ONS Geography portal
#' API for multiple years.
#' The function constructs the correct URLs for each year, fetches the data,
#' adds a year column, and combines all years into a single data frame.
#' It then collapses the time series to add `first_available_year_included`
#' and `most_recent_year_included` columns, and removes duplicates.
#' Currently supports data for the years 2023 and 2025.
#' To add support for additional years, update the `yr_specific_url` list
#' with the appropriate year and URL segment.
#'
#' @return A data frame containing the combined LSIP-LAD lookup for all years,
#' with columns for codes, names, year, and operational period.
#' @keywords internal
#' @noRd
get_lsip_lad <- function() {
# Base URL components
url_prefix_1 <- "https://services1.arcgis.com/"
url_prefix_2 <- "ESMARspQHYMw9BZ9/arcgis/rest/services/"
url_suffix <- "/FeatureServer/0/query?outFields=*&where=1%3D1&f=json"

# Year-specific URL segments
yr_specific_url <- list(
"2023" = "LAD23_LSIP23_EN_LU",
"2025" = "LAD25_LSIP25_EN_LU"
)
# Create an empty list to store data frames
data_frames <- list()
# Loop through each year and fetch data
for (year in names(yr_specific_url)) {
# Construct the full URL
full_url <- paste0(
url_prefix_1,
url_prefix_2,
yr_specific_url[[year]],
url_suffix
)

# Make the GET request and parse the JSON response
response <- httr::GET(full_url)
# Get the content and convert from JSON
data <- jsonlite::fromJSON(httr::content(response, "text"))

# Extract the attributes and convert to a data frame
df <- as.data.frame(data$features$attributes) |>
# Create a year column
dplyr::mutate(year = as.integer(year)) |>
# Rename columns based on position so binding works
dplyr::select(
year,
lad_code = 1,
lad_name = 2,
lsip_code = 3,
lsip_name = 4
)

# Put the data frame into the list
data_frames[[year]] <- df
}
# Combine all data frames into one
combined_df <- do.call(rbind, data_frames)
# Get first_available and most_recent year columns
combined_df <- combined_df |>
collapse_timeseries() |>
# Strip extra whitespace from all columns
dplyr::mutate(
dplyr::across(
dplyr::everything(),
~ trimws(.x)
)
) |>
# Make sure we remove duplicates
dplyr::distinct()
Comment on lines +607 to +663
Contributor:
This is functioning perfectly well, so no need to change this unless you want to. While I'm here I wanted to highlight a couple of things in case they're helpful though!

  1. httr has been overtaken by httr2, so moving forwards it's generally better to use that for making http requests / API queries

  2. You could also use the get_ons_api_data() helper that's already in dfeR to do this, which would follow the approach I took for other functions. If you do this, you'd need to do a couple of extra steps:

a) Add additional column shorthands for LSIP into the ons_geog_shorthands table (and then rerun that script / update that data object), e.g.

## code to prepare `ons_geog_shorthands` data set goes here

ons_level_shorthands <- c(
  "WD",
  "PCON",
  "LAD",
  "UTLA",
  "CTYUA",
  "LSIP",
  "CAUTH",
  "GOR",
  "RGN",
  "CTRY"
)
name_column <- paste0(
  c(
    "ward",
    "pcon",
    "lad",
    "la",
    "la",
    "lsip",
    "cauth",
    "region",
    "region",
    "country"
  ),
  "_name"
)
code_column <- paste0(
  c(
    "ward",
    "pcon",
    "lad",
    "new_la",
    "new_la",
    "lsip",
    "cauth",
    "region",
    "region",
    "country"
  ),
  "_code"
)

ons_geog_shorthands <- data.frame(
  ons_level_shorthands,
  name_column,
  code_column
)

usethis::use_data(ons_geog_shorthands, overwrite = TRUE)

b) Update the get_lsip_lad() function to use get_ons_api_data(), e.g. something like

#' Fetch and combine LSIP-LAD lookup data for multiple years
#'
#' Helper function to extract data from the LSIP-LAD lookups
#'
#' @param year four digit year of the lookup
#'
#' @return data.frame for the individual year of the lookup
#'
#' @keywords internal
#' @noRd
get_lsip_lad <- function(year) {
  year_end <- year %% 100

  data_id <- paste0("LAD", year_end, "_LSIP", year_end, "_EN_LU")

  fields <- paste0(
    "LSIP",
    year_end,
    "CD,LSIP",
    year_end,
    "NM,LAD",
    year_end,
    "CD,LAD",
    year_end,
    "NM"
  )

  output <- get_ons_api_data(
    data_id = data_id,
    params = list(
      where = "1=1",
      outFields = fields,
      outSR = 4326,
      f = "json"
    )
  )

  tidy_raw_lookup(output)
}

c) update the data-raw/lsip_lad.R script to lapply the get_lsip_lad() function over every year you want the lookup for (then this is the place you come back to edit and update when new lookups are published), e.g.

# First boundaries published in 2023, ONS didn't publish a 2024 set
lsip_lad <- lapply(c(2023, 2025), get_lsip_lad) |>
  create_time_series_lookup()

# Save the data to the package's data directory
usethis::use_data(lsip_lad, overwrite = TRUE)

I think for lsip_lad as it is now, you could leave your code as it is if you didn't want to make these changes, as there's very few rows and the logic you've written seems to return everything as expected (the years come back as character instead of numeric, but that's the only difference I could spot). This is mostly a suggestion for how I'd have written this / how I intended for the helper functions to be used as I know you wanted to use this to learn more about the code in here so far!

One of the reasons for the way I wrote the other code is that there's a limit for the amount of rows you can get in a single query so the approach you've used wouldn't work for larger tables (and therefore you need to use some kind of batching logic like I've put in get_ons_api_data() to send multiple queries to a dataset on the Open Geography Portal to get all the rows).
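As a rough sketch of the kind of paging loop that batching implies (resultOffset and resultRecordCount are standard ArcGIS FeatureServer query parameters; the actual logic inside get_ons_api_data() may well differ):

```r
# Hypothetical paging loop for an ArcGIS FeatureServer endpoint,
# fetching rows in batches until the server returns none
fetch_all_rows <- function(base_url, batch_size = 1000) {
  rows <- list()
  offset <- 0
  repeat {
    url <- paste0(
      base_url,
      "&resultOffset=", offset,
      "&resultRecordCount=", batch_size
    )
    batch <- jsonlite::fromJSON(url)$features$attributes
    if (is.null(batch) || nrow(batch) == 0) break
    rows[[length(rows) + 1]] <- batch
    offset <- offset + nrow(batch)
  }
  do.call(rbind, rows)
}
```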

Contributor Author:

thanks for this @cjrace ! definitely interested in learning more about how these helper functions work so will try to use them before getting you to re-review. also completely understand the point about the queries getting cut off!


combined_df
}
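As a standalone illustration of the URL construction above (the dataset IDs are those in the `yr_specific_url` list; treat this as a sketch, not part of the package):

```r
# Rebuild the per-year query URLs used by get_lsip_lad()
yr_specific_url <- list(
  "2023" = "LAD23_LSIP23_EN_LU",
  "2025" = "LAD25_LSIP25_EN_LU"
)
urls <- vapply(
  yr_specific_url,
  function(id) {
    paste0(
      "https://services1.arcgis.com/",
      "ESMARspQHYMw9BZ9/arcgis/rest/services/",
      id,
      "/FeatureServer/0/query?outFields=*&where=1%3D1&f=json"
    )
  },
  character(1)
)
# One fully formed endpoint URL per year, named by year
urls[["2023"]]
```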
44 changes: 44 additions & 0 deletions R/fetch.R
@@ -28,6 +28,10 @@
#'
#' head(fetch_mayoral())
#'
#' head(fetch_lsip())
#'
#' head(fetch_lsip(2024))
#'
#' fetch_lads(2024, "Wales")
#'
#' fetch_las(2022, "Northern Ireland")
@@ -192,3 +196,43 @@ fetch_regions <- function() {
fetch_countries <- function() {
dfeR::countries
}

#' Fetch Local Skills Improvement Plan (LSIP) areas lookup
#'
#' Fetch a data frame of Local Skills Improvement Plan (LSIP) areas
#' for a given year based on `dfeR::lsip_lad`.
#'
#' @param year Year to filter the lookup to, default is "All".
#' @family fetch_locations
#' @return data frame of LSIP for a given year.
#' @export
#' @inherit fetch examples
fetch_lsip <- function(year = "All") {
Contributor:

I've generally used the fetch functions for getting a list of locations of one specific type. So from the way the other fetch_* functions work I'd have expected fetch_lsip() to only return lsip_name and lsip_code, not the LADs too (Regions / Countries are exceptions as they have data frames that only have one kind of location in)

Users can get the full lookup using dfeR::lsip_lad and easily filter that, so for consistency with other functions I'd drop LADs from this one?

Contributor Author:

I think this was only producing the LSIP but the documentation was wrong - fixed and will push

# Convert the year input to numeric if possible
if (is.character(year) && year != "All") {
year_num <- suppressWarnings(as.numeric(year))
if (!is.na(year_num)) {
year <- year_num
}
}

# Check the year input is a whole number within the available range
min_year <- min(dfeR::lsip_lad$first_available_year_included)
max_year <- max(dfeR::lsip_lad$most_recent_year_included)
if (
!(identical(year, "All") ||
(is.numeric(year) && year %% 1 == 0 &&
year >= min_year && year <= max_year))
) {
stop(
paste0(
"year must either be 'All' or a valid year between ",
min_year,
" and ",
max_year
),
call. = FALSE
)
}
lookup_data <- dfeR::lsip_lad
cols <- c("lsip_code", "lsip_name")
summarise_locations_by_year(lookup_data, cols, year)
}
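For reference, a few hedged usage sketches of the function above (the exact year coverage depends on the data actually in `dfeR::lsip_lad`):

```r
# Assuming the lookup currently covers 2023 to 2025:
head(fetch_lsip())       # all LSIP areas across all years
head(fetch_lsip(2024))   # areas operational in 2024
head(fetch_lsip("2024")) # character years are coerced to numeric
fetch_lsip(1999)         # errors: outside the range of the lookup
```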
88 changes: 47 additions & 41 deletions R/fetch_utils.R
Contributor:

The way Git shows the changes in this script takes a bit of getting your head around but had a dig through this and it's a nice breaking out of the logic into smaller parts - I like it (good typo spot too in one of the comments)!

@@ -27,35 +27,26 @@ check_fetch_location_inputs <- function(year_input, country_input) {
}
}

#' Fetch locations for a given lookup
#' Summarise and filter locations by operational years
#'
#' Helper function for the fetch_xxx() functions to save repeating code
#' Shared logic for summarising location lookups and filtering
#' by operational years.
#' Used by fetch_locations and fetch_lsip.
#'
#' @param lookup_data lookup data to use to extract locations from
#' @param cols columns to extract from the main lookup table
#' @param year year of locations to extract, "All" will skip any filtering and
#' return all possible locations
#' @param countries countries for locations to be take from, "All" will skip
#' any filtering and return all
#'
#' @return a data frame of location names and codes
#' @param lookup_data The lookup data frame.
#' @param cols Character vector of columns to keep and group by.
#' @param year Year to filter to, or "All" for no filtering.
#' @return A data frame summarised and filtered by operational years.
#' @keywords internal
#' @noRd
fetch_locations <- function(lookup_data, cols, year, countries) {
# Return only the cols we specified
# We know their position from the dplyr selection of the lookup
# This is used wherever this function returns an output
summarise_locations_by_year <- function(lookup_data, cols, year = "All") {
cols_to_return <- seq_along(cols)

# Pull in main lookup data
lookup <- dplyr::select(
lookup_data,
dplyr::all_of(
c(cols, "first_available_year_included", "most_recent_year_included")
)
)

# Resummarise the years to each unique location
resummarised_lookup <- lookup |>
dplyr::summarise(
"first_available_year_included" = min(
@@ -64,34 +55,49 @@ fetch_locations <- function(lookup_data, cols, year, countries) {
"most_recent_year_included" = max(.data$most_recent_year_included),
.by = dplyr::all_of(cols)
)

# Return early without filtering if defaults are used
if (all(year == "All", countries == "All")) {
if (year == "All") {
return(dplyr::distinct(resummarised_lookup[, cols_to_return]))
}

# Filter based on year selection if specified
if (year != "All") {
# Flag the rows that are in the year asked for
resummarised_lookup <- resummarised_lookup |>
dplyr::mutate(
"in_specified_year" = ifelse(
as.numeric(.data$most_recent_year_included) >= year &
as.numeric(.data$first_available_year_included) <= year,
TRUE,
FALSE
)
resummarised_lookup <- resummarised_lookup |>
dplyr::mutate(
"in_specified_year" = ifelse(
as.numeric(.data$most_recent_year_included) >= year &
as.numeric(.data$first_available_year_included) <= year,
TRUE,
FALSE
)
)
resummarised_lookup <- with(
resummarised_lookup,
subset(resummarised_lookup, in_specified_year == TRUE)
) |>
dplyr::select(-c("in_specified_year"))
dplyr::distinct(resummarised_lookup[, cols_to_return])
}
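To make the year-window logic in this helper concrete, here is a tiny self-contained sketch (the codes and names are made up):

```r
# A location counts as operational in `year` when
# first_available_year_included <= year <= most_recent_year_included
toy <- data.frame(
  lsip_name = c("Area A", "Area B"),
  first_available_year_included = c(2023, 2025),
  most_recent_year_included = c(2025, 2025)
)
year <- 2024
subset(
  toy,
  first_available_year_included <= year &
    most_recent_year_included >= year
)
# keeps only "Area A"
```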

#' Fetch locations for a given lookup
#'
#' Helper function for the fetch_xxx() functions to save repeating code
#'
#' @param lookup_data lookup data to use to extract locations from
#' @param cols columns to extract from the main lookup table
#' @param year year of locations to extract, "All" will skip any filtering and
#' return all possible locations
#' @param countries countries for locations to be take from, "All" will skip
#' any filtering and return all
#'
#' @return a data frame of location names and codes
#' @keywords internal
#' @noRd
fetch_locations <- function(lookup_data, cols, year, countries) {
resummarised_lookup <- summarise_locations_by_year(lookup_data, cols, year)

# Filter to only those locations
resummarised_lookup <- with(
resummarised_lookup,
subset(resummarised_lookup, in_specified_year == TRUE)
) |>
dplyr::select(-c("in_specified_year")) # remove temp column
# Return early without filtering if defaults are used
if (all(year == "All", countries == "All")) {
return(resummarised_lookup)
}

# Filter based on country selcetion if specified
# Filter based on country selection if specified
if (paste0(countries, collapse = "") != "All") {
# Get the code column
# Take new_la_code if present (as sometimes there may also be old_la code)
@@ -119,5 +125,5 @@ fetch_locations <- function(lookup_data, cols, year, countries) {
)
}

dplyr::distinct(resummarised_lookup[, cols_to_return])
resummarised_lookup
}
1 change: 1 addition & 0 deletions _pkgdown.yml
@@ -23,6 +23,7 @@ reference:
- regions
- geog_time_identifiers
- wd_pcon_lad_la_rgn_ctry
- lsip_lad

- title: Fetch geography lists
desc: Pull geography lookups from the ONS Geography Portal
38 changes: 38 additions & 0 deletions data-raw/lsip_lad.R
@@ -0,0 +1,38 @@
# ------------------------------------------------------------------------------
# Script to create the LSIP-LAD lookup dataset for the dfeR package
#
# This script fetches, processes, and saves the Local Authority District (LAD)
# to Local Skills Improvement Plan (LSIP) lookup for multiple years.
#
# Data Source:
# - ONS Open Geography Portal API (https://geoportal.statistics.gov.uk/)
# - Each year's data is accessed via a unique ArcGIS REST API endpoint from
# the ONS Geography Portal, constructed from a common prefix, a
# year-specific path, and a query suffix. See the get_lsip_lad() function
# in R/datasets_utils.R for details on URL construction and year coverage.
# What this script does:
# 1. Calls get_lsip_lad() to fetch and combine LSIP-LAD data
# for all available years.
# 2. The resulting data frame includes columns for LAD and LSIP codes
# and names, the year, and operational period columns.
# 3. Saves the processed lookup as an internal package dataset.
#
# Usage:
# - Run this code when new LSIP-LAD data is available or to refresh the lookup
# - Ensure get_lsip_lad() is up to date with the correct endpoints for all
# years required.
# How to update the data:
# 1. Check the ONS Open Geography Portal for new or updated LSIP-LAD datasets
# and note the new URLs.
# 2. Update the `yr_specific_url` list in get_lsip_lad() (R/datasets_utils.R)
# to include the section of the URL that corresponds to that year.
# 3. Run this script to fetch, process, and save the latest data.
# 4. Re-document and test the package as needed.
#
# ------------------------------------------------------------------------------

# use get_lsip_lad to get the data from ONS
lsip_lad <- get_lsip_lad()

# Save the data to the package's data directory
usethis::use_data(lsip_lad, overwrite = TRUE)
Binary file added data/lsip_lad.rda
Binary file not shown.
3 changes: 3 additions & 0 deletions inst/WORDLIST
@@ -6,6 +6,8 @@ DfE
EESScreenerLAs
GOR
JBLOGGS
LADs
LSIP
LUP
LaNameCodes
Lifecycle
@@ -41,6 +43,7 @@ gov
las
lauraselby
lockfile
lsip
lup
num
odbc
4 changes: 4 additions & 0 deletions man/fetch.Rd

Some generated files are not rendered by default.

5 changes: 5 additions & 0 deletions man/fetch_countries.Rd

