Created lsip_lad data and fetch_lsip() function #133
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -586,3 +586,81 @@ get_cauth_lad <- function(year) { | |
| # Tidy up the output file (defined earlier in this script) | ||
| tidy_raw_lookup(output) | ||
| } | ||
|
|
||
| #' Fetch and combine LSIP-LAD lookup data for multiple years | ||
| #' | ||
| #' Downloads, binds, and tidies LSIP-LAD lookup data from the ONS Open | ||
| #' Geography Portal API for multiple years. | ||
| #' The function constructs the correct URLs for each year, fetches the data, | ||
| #' adds a year column, and combines all years into a single data frame. | ||
| #' It then collapses the time series to add `first_available_year_included` | ||
| #' and `most_recent_year_included` columns, and removes duplicates. | ||
| #' Currently supports data for the years 2023 and 2025. | ||
| #' To add support for additional years, update the `yr_specific_url` list | ||
| #' with the appropriate year and URL segment. | ||
| #' | ||
| #' @return A data frame containing the combined LSIP-LAD lookup for all years, | ||
| #' with columns for codes, names, year, and operational period. | ||
| #' @keywords internal | ||
| #' @noRd | ||
| get_lsip_lad <- function() { | ||
| # Base URL components | ||
| url_prefix_1 <- "https://services1.arcgis.com/" | ||
| url_prefix_2 <- "ESMARspQHYMw9BZ9/arcgis/rest/services/" | ||
| url_suffix <- "/FeatureServer/0/query?outFields=*&where=1%3D1&f=json" | ||
|
|
||
| # Year-specific URL segments | ||
| yr_specific_url <- list( | ||
| "2023" = "LAD23_LSIP23_EN_LU", | ||
| "2025" = "LAD25_LSIP25_EN_LU" | ||
| ) | ||
| #Create an empty list to store data frames | ||
| data_frames <- list() | ||
| #Loop through each year and fetch data | ||
| for (year in names(yr_specific_url)) { | ||
| #Construct the full URL | ||
| full_url <- paste0( | ||
| url_prefix_1, | ||
| url_prefix_2, | ||
| yr_specific_url[[year]], | ||
| url_suffix | ||
| ) | ||
|
|
||
| #Make the GET request and parse the JSON response | ||
| response <- httr::GET(full_url) | ||
| # get the content and convert from json | ||
| data <- jsonlite::fromJSON(httr::content(response, "text")) | ||
|
|
||
| #Extract the attributes and convert to data frame | ||
| df <- as.data.frame(data$features$attributes) |> | ||
| #create a year column | ||
| dplyr::mutate(year = as.integer(year)) |> | ||
| #rename columns based on position so binding works | ||
| dplyr::select( | ||
| year, | ||
| lad_code = 1, | ||
| lad_name = 2, | ||
| lsip_code = 3, | ||
| lsip_name = 4 | ||
| ) | ||
|
|
||
| #put the data frame into the list | ||
| data_frames[[year]] <- df | ||
| } | ||
| #Combine all data frames into one | ||
| combined_df <- do.call(rbind, data_frames) | ||
| #get first_available and most_recent year columns | ||
| combined_df <- combined_df |> | ||
| collapse_timeseries() |> | ||
| # strip extra whitespace from all columns | ||
| dplyr::mutate( | ||
| dplyr::across( | ||
| dplyr::everything(), | ||
| ~ trimws(.x) | ||
| ) | ||
| ) |> | ||
| #make sure we remove duplicates | ||
| dplyr::distinct() | ||
|
Comment on lines +607 to +663
Contributor
This is functioning perfectly well, so no need to change this unless you want to. While I'm here I wanted to highlight a couple of things in case they're helpful though!
a) Add additional column shorthands for LSIP into the [...]
b) Update the [...]
c) Update the [...]

I think for lsip_lad as it is now, you could leave your code as it is if you didn't want to make these changes, as there are very few rows and the logic you've written seems to return everything as expected (the years come back as character instead of numeric, but that's the only difference I could spot). This is mostly a suggestion for how I'd have written this / how I intended for the helper functions to be used, as I know you wanted to use this to learn more about the code in here so far!

One of the reasons for the way I wrote the other code is that there's a limit on the number of rows you can get in a single query, so the approach you've used wouldn't work for larger tables (and therefore you need to use some kind of batching logic like I've put in [...]
Contributor
Author
thanks for this @cjrace! definitely interested in learning more about how these helper functions work so will try to use them before getting you to re-review. also completely understand the point about the queries getting cut off!
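The remark in this thread that the years come back as character is consistent with how `trimws()` behaves: it returns a character vector, so applying it across every column converts the year columns too. The base R sketch below illustrates this on a toy data frame and shows one possible alternative, trimming only the character columns. The data frame and its values are illustrative stand-ins, not the real lookup, and this is not necessarily how the package should handle it.

```r
# Toy stand-in for the combined lookup; values are illustrative only
lookup <- data.frame(
  lad_code = c(" E06000001", "E06000002 "),
  lad_name = c("Hartlepool ", " Middlesbrough"),
  first_available_year_included = c(2023L, 2023L),
  stringsAsFactors = FALSE
)

# trimws() returns character, so trimming every column converts the years
trim_all <- lookup
trim_all[] <- lapply(trim_all, trimws)

# Restricting the trim to character columns leaves the integer column as-is
trim_chr <- lookup
is_chr <- vapply(trim_chr, is.character, logical(1))
trim_chr[is_chr] <- lapply(trim_chr[is_chr], trimws)

class(trim_all$first_available_year_included)
class(trim_chr$first_available_year_included)
```

In a dplyr pipeline the same idea would mean restricting `across()` to character columns rather than using `everything()`.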
||
|
|
||
| combined_df | ||
| } | ||
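The review thread on this function notes that a single query can only return a limited number of rows, so larger tables need some form of batching. The sketch below shows one generic way to page through a source until it is exhausted. It is not the package's existing helper: `fetch_all_pages()`, `fetch_page`, and `fake_fetch()` are hypothetical names, and the example runs offline against a fake table. For a real ArcGIS REST query, `fetch_page` would presumably add the `resultOffset` and `resultRecordCount` parameters to the URL, assuming the service has pagination enabled.

```r
# Collect every page from a paged source.
# `fetch_page` is a hypothetical function(offset, n) that returns a data frame
# of at most `n` rows, skipping the first `offset` rows of the source.
fetch_all_pages <- function(fetch_page, page_size = 1000L) {
  pages <- list()
  offset <- 0L
  repeat {
    page <- fetch_page(offset, page_size)
    if (nrow(page) == 0L) break
    pages[[length(pages) + 1L]] <- page
    # A short page means the source has no more rows to give
    if (nrow(page) < page_size) break
    offset <- offset + nrow(page)
  }
  if (length(pages) == 0L) return(data.frame())
  do.call(rbind, pages)
}

# Offline stand-in for the API: serves slices of a 25-row table
fake_table <- data.frame(
  id = seq_len(25),
  code = sprintf("X%02d", seq_len(25)),
  stringsAsFactors = FALSE
)
fake_fetch <- function(offset, n) {
  if (offset >= nrow(fake_table)) return(fake_table[0, , drop = FALSE])
  rows <- seq(offset + 1L, min(offset + n, nrow(fake_table)))
  fake_table[rows, , drop = FALSE]
}

all_rows <- fetch_all_pages(fake_fetch, page_size = 10L)
nrow(all_rows)
```

Passing the fetcher in as an argument keeps the paging loop testable without network access; only the fetcher needs to know about URLs and JSON.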
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -28,6 +28,10 @@ | |
| #' | ||
| #' head(fetch_mayoral()) | ||
| #' | ||
| #' head(fetch_lsip()) | ||
| #' | ||
| #' head(fetch_lsip(2024)) | ||
| #' | ||
| #' fetch_lads(2024, "Wales") | ||
| #' | ||
| #' fetch_las(2022, "Northern Ireland") | ||
|
|
@@ -192,3 +196,43 @@ fetch_regions <- function() { | |
| fetch_countries <- function() { | ||
| dfeR::countries | ||
| } | ||
|
|
||
| #' Fetch Local Skills Improvement Plan (LSIP) areas lookup | ||
| #' | ||
| #' Fetch a data frame of Local Skills Improvement Plan (LSIP) areas | ||
| #' for a given year based on `dfeR::lsip_lad`. | ||
| #' | ||
| #' @param year Year to filter the lookup to, default is "All". | ||
| #' @family fetch_locations | ||
| #' @return data frame of LSIP for a given year. | ||
| #' @export | ||
| #' @inherit fetch examples | ||
| fetch_lsip <- function(year = "All") { | ||
|
Contributor
I've generally used the fetch functions for getting a list of locations of one specific type. So from the way the other [...] Users can get the full lookup using [...]
Contributor
Author
I think this was only producing the LSIP but the documentation was wrong - fixed and will push
||
| #convert year input to numeric if possible | ||
| if (is.character(year) && year != "All") { | ||
| year_num <- suppressWarnings(as.numeric(year)) | ||
| if (!is.na(year_num)) { | ||
| year <- year_num | ||
| } | ||
| } | ||
|
|
||
| #add a check for year input to see if it's in range | ||
| min_year <- min(dfeR::lsip_lad$first_available_year_included) | ||
| max_year <- max(dfeR::lsip_lad$most_recent_year_included) | ||
| if ( | ||
| !(year == "All" || (year %% 1 == 0 && year >= min_year && year <= max_year)) | ||
| ) { | ||
| stop( | ||
| paste0( | ||
| "year must either be 'All' or a valid year between ", | ||
| min_year, | ||
| " and ", | ||
| max_year | ||
| ), | ||
| call. = FALSE | ||
| ) | ||
| } | ||
| lookup_data <- dfeR::lsip_lad | ||
| cols <- c("lsip_code", "lsip_name") | ||
| summarise_locations_by_year(lookup_data, cols, year) | ||
| } | ||
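One thing to watch in the year check above: as written, a non-numeric string such as "abc" appears to stay a character value after the conversion step and then reach `year %% 1`, which R rejects with its own error before the custom message is shown. The sketch below is one possible way to fold the conversion and the range check into a single predicate. `is_valid_year()` is a hypothetical helper, not part of dfeR, and the 2023 to 2025 bounds are used only as example inputs.

```r
# Hypothetical helper, not part of dfeR: TRUE when `year` is "All" or a
# whole number within [min_year, max_year], FALSE for anything else
is_valid_year <- function(year, min_year, max_year) {
  # Reject NULL, vectors longer than one, and missing values up front
  if (length(year) != 1L || is.na(year)) return(FALSE)
  if (identical(year, "All")) return(TRUE)
  # Non-numeric strings become NA here instead of raising an error later
  year_num <- suppressWarnings(as.numeric(year))
  !is.na(year_num) &&
    year_num %% 1 == 0 &&
    year_num >= min_year &&
    year_num <= max_year
}

is_valid_year("All", 2023, 2025)
is_valid_year("2024", 2023, 2025)
is_valid_year("abc", 2023, 2025)
is_valid_year(2026, 2023, 2025)
```

With a predicate like this, the function could call `stop()` with the existing message whenever the result is `FALSE`.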
|
Contributor
The way Git shows the changes in this script takes a bit of getting your head around, but I had a dig through this and it's a nice breaking out of the logic into smaller parts - I like it (good typo spot too in one of the comments)!
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,38 @@ | ||
| # ------------------------------------------------------------------------------ | ||
| # Script to create the LSIP-LAD lookup dataset for the dfeR package | ||
| # | ||
| # This script fetches, processes, and saves the Local Authority District (LAD) | ||
| # to Local Skills Improvement Plan (LSIP) lookup for multiple years. | ||
| # | ||
| # Data Source: | ||
| # - ONS Open Geography Portal API (https://geoportal.statistics.gov.uk/) | ||
| # - Each year's data is accessed via a unique ArcGIS REST API endpoint from | ||
| # the ONS Geography portal, constructed from a common prefix, a | ||
| # year-specific path, and a query suffix. See the get_lsip_lad() function | ||
| # in R/datasets_utils.R for details on URL construction and year coverage. | ||
| # What this script does: | ||
| # 1. Calls get_lsip_lad() to fetch and combine LSIP-LAD data | ||
| # for all available years. | ||
| # 2. The resulting data frame includes columns for LAD and LSIP codes | ||
| # and names, the year, and operational period columns. | ||
| # 3. Saves the processed lookup as an internal package dataset. | ||
| # | ||
| # Usage: | ||
| # - Run this code when new LSIP-LAD data is available or to refresh the lookup | ||
| # - Ensure get_lsip_lad() is up to date with the correct endpoints for all | ||
| # years required. | ||
| # How to update the data: | ||
| # 1. Check the ONS Open Geography Portal for new or updated LSIP-LAD datasets | ||
| # and note the new URLs. | ||
| # 2. Update the `yr_specific_url` list in get_lsip_lad() (R/datasets_utils.R) | ||
| # to include the section of the URL that corresponds to that year. | ||
| # 3. Run this script to fetch, process, and save the latest data. | ||
| # 4. Re-document and test the package as needed. | ||
| # | ||
| # ------------------------------------------------------------------------------ | ||
|
|
||
| # use get_lsip_lad to get the data from ONS | ||
| lsip_lad <- get_lsip_lad() | ||
|
|
||
| # Save the data to the package's data directory | ||
| usethis::use_data(lsip_lad, overwrite = TRUE) |
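The update steps in the script header rely on each year's URL being assembled from a common prefix, a year-specific segment, and a query suffix. The sketch below shows that assembly in isolation, using only the URL parts and the two year segments that appear in this pull request. `build_lsip_url()` is a hypothetical name introduced for illustration; the actual function builds the URL inline inside its loop.

```r
# Build the query URL for one year's LAD-to-LSIP lookup.
# build_lsip_url() is a hypothetical name; the URL parts are those in the diff.
build_lsip_url <- function(year, segments) {
  year <- as.character(year)
  # Fail early if no segment has been registered for the requested year
  if (!year %in% names(segments)) {
    stop("No URL segment registered for year ", year, call. = FALSE)
  }
  paste0(
    "https://services1.arcgis.com/",
    "ESMARspQHYMw9BZ9/arcgis/rest/services/",
    segments[[year]],
    "/FeatureServer/0/query?outFields=*&where=1%3D1&f=json"
  )
}

# Year-specific segments as listed in get_lsip_lad()
yr_specific_url <- list(
  "2023" = "LAD23_LSIP23_EN_LU",
  "2025" = "LAD25_LSIP25_EN_LU"
)

build_lsip_url(2025, yr_specific_url)
```

Supporting another year would then mean adding one entry to `yr_specific_url` once the corresponding dataset name is confirmed on the portal.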