Created lsip_lad data and fetch_lsip() function (#133)
Conversation
Codecov Report ❌

```
@@            Coverage Diff             @@
##             main     #133      +/-   ##
===========================================
- Coverage   68.80%   56.40%   -12.41%
===========================================
  Files          15       18        +3
  Lines        1106     1569      +463
===========================================
+ Hits          761      885      +124
- Misses        345      684      +339
```
cjrace left a comment:
Thanks @mzayeddfe, this is great, some nice improvements in here!
Some of the approach you've used looks like you figured it out yourself rather than reusing what we already have, so I've detailed an alternative approach for get_lsip_lad() that I'd have taken. Mostly for interest / future awareness, as I know you wanted to use this as a chance to get stuck into the code, and looking back at it I'm aware the logic I was using isn't the most obvious (partly as geo_hierarchy has become a bit of a b-EES-t!). Happy to have a chat through it if helpful 😄
For code cov I wouldn't worry; that's just showing how many lines of code are run in tests versus not. For a PR like this, which adds a lot of internal code bespoke to querying the ONS API, I'd expect coverage to go down slightly, as not all of those lines of code are worth testing.
```r
#' @return data frame of LSIP-LAD relationships
#' @export
#' @inherit fetch examples
fetch_lsip <- function(year = "All") {
```
I've generally used the fetch functions for getting a list of locations of one specific type. So from the way the other fetch_* functions work I'd have expected fetch_lsip() to only return lsip_name and lsip_code, not the LADs too (Regions / Countries are exceptions as they have data frames that only have one kind of location in)
Users can get the full lookup using dfeR::lsip_lad and easily filter that, so for consistency with other functions I'd drop LADs from this one?
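To make that concrete, here's a toy sketch of what an LSIP-only fetch_lsip() could return by deduplicating the lookup. The data frame below is an illustrative stand-in for dfeR::lsip_lad (the codes are made up; column names follow the PR):

```r
# Illustrative stand-in for dfeR::lsip_lad (codes invented for the example)
lsip_lad <- data.frame(
  lad_code  = c("E06000001", "E06000002", "E06000003"),
  lad_name  = c("Hartlepool", "Middlesbrough", "Redcar and Cleveland"),
  lsip_code = c("E00000001", "E00000001", "E00000001"),
  lsip_name = c("Tees Valley", "Tees Valley", "Tees Valley")
)

# fetch_lsip()-style output: LSIP columns only, one row per LSIP
unique(lsip_lad[, c("lsip_code", "lsip_name")])
```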
I think this was only producing the LSIP but the documentation was wrong - fixed and will push
```r
# Base URL components
url_prefix_1 <- "https://services1.arcgis.com/"
url_prefix_2 <- "ESMARspQHYMw9BZ9/arcgis/rest/services/"
url_suffix <- "/FeatureServer/0/query?outFields=*&where=1%3D1&f=json"

# Year-specific URL segments
yr_specific_url <- list(
  "2023" = "LAD23_LSIP23_EN_LU",
  "2025" = "LAD25_LSIP25_EN_LU"
)

# Create an empty list to store data frames
data_frames <- list()

# Loop through each year and fetch data
for (year in names(yr_specific_url)) {
  # Construct the full URL
  full_url <- paste0(
    url_prefix_1,
    url_prefix_2,
    yr_specific_url[[year]],
    url_suffix
  )

  # Make the GET request and parse the JSON response
  response <- httr::GET(full_url)
  # Get the content and convert from JSON
  data <- jsonlite::fromJSON(httr::content(response, "text"))

  # Extract the attributes and convert to a data frame
  df <- as.data.frame(data$features$attributes) |>
    # Create a year column
    dplyr::mutate(year = as.integer(year)) |>
    # Rename columns based on position so binding works
    dplyr::select(
      year,
      lad_code = 1,
      lad_name = 2,
      lsip_code = 3,
      lsip_name = 4
    )

  # Put the data frame into the list
  data_frames[[year]] <- df
}

# Combine all data frames into one
combined_df <- do.call(rbind, data_frames)

# Get first_available and most_recent year columns
combined_df <- combined_df |>
  collapse_timeseries() |>
  # Strip extra whitespace from all columns
  dplyr::mutate(
    dplyr::across(
      dplyr::everything(),
      ~ trimws(.x)
    )
  ) |>
  # Make sure we remove duplicates
  dplyr::distinct()
```
This is functioning perfectly well, so no need to change this unless you want to. While I'm here I wanted to highlight a couple of things in case they're helpful though!

1. httr has been overtaken by httr2, so moving forwards it's generally better to use that for making HTTP requests / API queries.
2. You could also use the get_ons_api_data() helper that's already in dfeR to do this, which would follow the approach I took for other functions. If you do this, you'd need to do a couple of extra steps:
a) Add additional column shorthands for LSIP into the ons_geog_shorthands table (and then rerun that script / update that data object), e.g.
```r
## code to prepare `ons_geog_shorthands` data set goes here
ons_level_shorthands <- c(
  "WD",
  "PCON",
  "LAD",
  "UTLA",
  "CTYUA",
  "LSIP",
  "CAUTH",
  "GOR",
  "RGN",
  "CTRY"
)

name_column <- paste0(
  c(
    "ward",
    "pcon",
    "lad",
    "la",
    "la",
    "lsip",
    "cauth",
    "region",
    "region",
    "country"
  ),
  "_name"
)

code_column <- paste0(
  c(
    "ward",
    "pcon",
    "lad",
    "new_la",
    "new_la",
    "lsip",
    "cauth",
    "region",
    "region",
    "country"
  ),
  "_code"
)

ons_geog_shorthands <- data.frame(
  ons_level_shorthands,
  name_column,
  code_column
)

usethis::use_data(ons_geog_shorthands, overwrite = TRUE)
```
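For illustration only, here's a hypothetical sketch of how a shorthand table like this can drive renaming of raw ONS columns such as "LSIP25CD". rename_ons_columns() is an invented demo helper, not the real tidy_raw_lookup() internals:

```r
# Small demo lookup in the same shape as ons_geog_shorthands
shorthands_demo <- data.frame(
  ons_level_shorthands = c("LAD", "LSIP"),
  name_column = c("lad_name", "lsip_name"),
  code_column = c("lad_code", "lsip_code")
)

# Hypothetical helper: map raw ONS column names to tidy package names
rename_ons_columns <- function(raw_names, shorthands) {
  # Strip the two-digit year and CD/NM suffix: "LSIP25CD" -> "LSIP"
  level <- sub("[0-9]{2}(CD|NM)$", "", raw_names)
  # Keep just the CD/NM suffix: "LSIP25CD" -> "CD"
  suffix <- sub("^.*[0-9]{2}", "", raw_names)
  row <- match(level, shorthands$ons_level_shorthands)
  ifelse(
    suffix == "CD",
    shorthands$code_column[row],
    shorthands$name_column[row]
  )
}

rename_ons_columns(
  c("LAD25CD", "LAD25NM", "LSIP25CD", "LSIP25NM"),
  shorthands_demo
)
#> [1] "lad_code"  "lad_name"  "lsip_code" "lsip_name"
```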
b) Update the get_lsip_lad() function to use get_ons_api_data(), e.g. something like
```r
#' Fetch and combine LSIP-LAD lookup data for multiple years
#'
#' Helper function to extract data from the LSIP-LAD lookups
#'
#' @param year four digit year of the lookup
#'
#' @return data.frame for the individual year of the lookup
#'
#' @keywords internal
#' @noRd
get_lsip_lad <- function(year) {
  year_end <- year %% 100
  data_id <- paste0("LAD", year_end, "_LSIP", year_end, "_EN_LU")
  fields <- paste0(
    "LSIP", year_end, "CD,",
    "LSIP", year_end, "NM,",
    "LAD", year_end, "CD,",
    "LAD", year_end, "NM"
  )

  output <- get_ons_api_data(
    data_id = data_id,
    params = list(
      where = "1=1",
      outFields = fields,
      outSR = 4326,
      f = "json"
    )
  )

  tidy_raw_lookup(output)
}
```
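As a quick sanity check, the paste0() constructions in that sketch resolve like this for 2025 (standalone snippet, no API call involved):

```r
year <- 2025
year_end <- year %% 100

data_id <- paste0("LAD", year_end, "_LSIP", year_end, "_EN_LU")
fields <- paste0(
  "LSIP", year_end, "CD,",
  "LSIP", year_end, "NM,",
  "LAD", year_end, "CD,",
  "LAD", year_end, "NM"
)

data_id # "LAD25_LSIP25_EN_LU"
fields  # "LSIP25CD,LSIP25NM,LAD25CD,LAD25NM"
```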
c) Update the data-raw/lsip_lad.R script to lapply() the get_lsip_lad() function over every year you want the lookup for (then this is the place to come back to and edit when new lookups are published), e.g.
```r
# First boundaries published in 2023; ONS didn't publish a 2024 set
lsip_lad <- lapply(c(2023, 2025), get_lsip_lad) |>
  create_time_series_lookup()

# Save the data to the package's data directory
usethis::use_data(lsip_lad, overwrite = TRUE)
```
I think for lsip_lad as it is now, you could leave your code as it is if you didn't want to make these changes, as there are very few rows and the logic you've written seems to return everything as expected (the years come back as character instead of numeric, but that's the only difference I could spot). This is mostly a suggestion for how I'd have written it / how I intended the helper functions to be used, as I know you wanted to use this to learn more about the code in here so far!
One of the reasons I wrote the other code the way I did is that there's a limit on the number of rows you can get in a single query, so the approach you've used wouldn't work for larger tables. You therefore need some kind of batching logic, like the one I've put in get_ons_api_data(), to send multiple queries to a dataset on the Open Geography Portal and get all the rows.
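A minimal sketch of that batching idea, with a mock fetch_page() standing in for one HTTP request. This is an assumed shape, not the actual get_ons_api_data() implementation (the real helper builds Open Geography Portal URLs and parses JSON):

```r
# Pretend the remote dataset has 2500 rows but only 1000 come back per query
all_rows <- data.frame(id = 1:2500)
page_size <- 1000

# Mock for one HTTP request; a real version would send
# resultOffset / resultRecordCount query parameters to the API
fetch_page <- function(offset, n) {
  all_rows[seq(offset + 1, min(offset + n, nrow(all_rows))), , drop = FALSE]
}

# Keep requesting pages until a short (or empty) page signals the end
pages <- list()
offset <- 0
repeat {
  page <- fetch_page(offset, page_size)
  if (nrow(page) == 0) break
  pages[[length(pages) + 1]] <- page
  if (nrow(page) < page_size) break # last (partial) page
  offset <- offset + page_size
}

result <- do.call(rbind, pages)
nrow(result) # 2500
```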
thanks for this @cjrace! definitely interested in learning more about how these helper functions work so will try to use them before getting you to re-review. also completely understand the point about the queries getting cut off!
The way Git shows the changes in this script takes a bit of getting your head around but had a dig through this and it's a nice breaking out of the logic into smaller parts - I like it (good typo spot too in one of the comments)!
Brief overview of changes
This PR introduces support for LSIP-LAD lookups and adds a new fetch_lsip_lad function, along with documentation, tests and refactoring for shared fetch filtering logic.
NOTE: I'm not sure how to test for the things outlined by the code coverage report. If you have suggestions, please let me know!
Why are these changes being made?
Detailed description of changes
Still for me to do after initial review:
Essential Checklist
- I have run the tests locally (devtools::test())
- I have updated the documentation (devtools::document())

Consider (as applicable)
- I have updated the NEWS.md file with a summary of my changes
- I have updated the DESCRIPTION
- I have checked for styling (styler::style_pkg()) and lintr issues (lintr::lint_package())

Issue ticket number/s and link
Closes #121