# Fixed-width field widths, named via the package's internal field list
fwfpos <-
  c(1, 15, 1, 20, 3, 60, 20, 2, 10, 10, 10, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 4, 100) |>
  setNames(migbirdHIP:::REF_FIELDS_ALL)

# Read the raw file through DuckDB's read_csv_auto, treating the first line as data
raw_data <-
  duckplyr::read_file_duckdb(
    path = my_path,
    table_function = "read_csv_auto",
    options = list(header = FALSE)
  )

# Split each record into fixed-width fields, then materialize the result
season_data <-
  raw_data |>
  duckplyr::as_duckdb_tibble(prudence = "lavish") |>
  separate_wider_position(
    cols = everything(),
    widths = fwfpos,
    too_few = "align_start",
    too_many = "drop"
  ) |>
  duckplyr::collect()

Reading in HIP data is slow. Switching to a data lake model with {duckplyr} will allow better inter-download (#46) and inter-state deduplication, and may also make reading faster. Running migbirdHIP::read_hip() on the 3.2 million records of 2025-2026 season data takes ~5.4 minutes. In comparison, running the {duckplyr} reading and column-splitting code above (which does not include several of the data checks that read_hip() performs) takes only 40 seconds.
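
To make the column-splitting step easier to reason about, here is a minimal sketch of how tidyr's separate_wider_position() behaves with too_few = "align_start" and too_many = "drop". The field names, widths, and records below are hypothetical toy values, not the real HIP layout, and the example assumes tidyr >= 1.3.0 and R >= 4.1.

```r
library(tidyr)
library(tibble)

# Hypothetical field names and widths (NOT the real HIP layout)
toy_widths <- c(title = 1, firstname = 5, state = 2)

# Toy fixed-width records, one per row
toy_lines <- tibble(
  line = c(
    "1ALICEME",   # exactly the expected width
    "2BOB  VT",   # short name padded with spaces
    "3CAROL",     # too short: the state field is missing
    "4DAVIDNHXX"  # too long: trailing characters beyond the last field
  )
)

toy_split <-
  toy_lines |>
  separate_wider_position(
    cols = line,
    widths = toy_widths,
    too_few = "align_start",  # pad missing trailing fields with NA
    too_many = "drop"         # discard characters past the last field
  )

print(toy_split)
```

With these settings, short records keep their leading fields and get NA for anything missing at the end, while overlong records are silently truncated. Both cases would otherwise raise an error, so any validation of record length has to happen as a separate check.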