
hydrogen_demand_export: footnote rows not filtered from parsed data #19

@ymiftah

Description

Bug

isp2025.hydrogen_demand_export() returns rows where isp_subregion contains multi-sentence footnote strings from the source Excel workbook (e.g. "This load is estimated based on ACIL Allen 2025..."), and rows where region = 'Region' (repeated header rows).

Impact

Downstream consumers must apply their own filter (e.g. WHERE region IN ('NSW','QLD','SA','TAS','VIC')) to get clean data. Without such a filter, aggregations and dimension rendering are corrupted, because footnote text and the literal header value 'Region' show up as dimension values.

Expected behaviour

hydrogen_demand_export() should return only valid data rows. Footnote rows and repeated header rows should be stripped during parsing (before the DataFrame is returned), consistent with how other read_timeseries-based functions behave.

Suggested fix

In _parse_timeseries_block (or specifically in hydrogen_demand_export), add a post-parse filter:

data = data.filter(pl.col("region").is_in(["NSW", "QLD", "SA", "TAS", "VIC"]))
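Outside polars, the same allowlist idea reduces to the stand-alone sketch below, which makes the behaviour easy to check in isolation. It assumes rows are plain dicts with a `region` key; the function name `filter_valid_regions`, the `value` field, and the sample rows are hypothetical and not part of isp2025.

```python
# Hypothetical stand-alone version of the allowlist filter. isp2025 works on
# polars DataFrames; plain dict rows are used here only for illustration.
VALID_REGIONS = ("NSW", "QLD", "SA", "TAS", "VIC")


def filter_valid_regions(rows):
    """Keep only rows whose 'region' value is one of the known region codes.

    Repeated header rows (region == 'Region') and footnote rows whose region
    is not a valid code are dropped.
    """
    return [row for row in rows if row.get("region") in VALID_REGIONS]


rows = [
    {"region": "NSW", "value": 1.0},
    {"region": "Region", "value": None},  # repeated header row
    {"region": "VIC", "value": 2.5},
]
clean = filter_valid_regions(rows)
print([r["region"] for r in clean])  # ['NSW', 'VIC']
```

An allowlist is strict by design: any region code added in a future workbook would be dropped silently until the list is updated, which is the trade-off against the generic approach.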

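Or, more generically, filter out rows where the first id column contains a string longer than a reasonable maximum for a region/subregion name. A length threshold alone would not catch the repeated header rows (region = 'Region' is only six characters), so those need a separate check.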
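A generic variant that needs no region list might combine a length threshold with a "cell repeats its own column name" test for header rows. This is a sketch under assumptions: `MAX_ID_LENGTH = 40`, the header-label normalisation, the helper names `is_junk_row` and `drop_junk_rows`, and the sample rows (including the "ISP Subregion" header text) are all hypothetical. It also checks every id column rather than only the first, since the footnote text was observed in `isp_subregion`.

```python
# Hypothetical generic filter: no region allowlist required.
MAX_ID_LENGTH = 40  # assumed upper bound for a region/subregion name


def _normalise(label: str) -> str:
    # Assumption: treat "ISP Subregion" and "isp_subregion" as the same label.
    return label.strip().lower().replace(" ", "_")


def is_junk_row(row: dict, id_columns: list, max_len: int = MAX_ID_LENGTH) -> bool:
    """Return True if the row looks like a footnote or a repeated header row."""
    for col in id_columns:
        value = row.get(col)
        if not isinstance(value, str):
            continue  # nulls and non-strings are left to other checks
        # Footnote rows: implausibly long text in an id column.
        if len(value) > max_len:
            return True
        # Repeated header rows: the cell repeats its own column name.
        if _normalise(value) == _normalise(col):
            return True
    return False


def drop_junk_rows(rows: list, id_columns: list) -> list:
    """Keep only rows that do not look like footnotes or repeated headers."""
    return [row for row in rows if not is_junk_row(row, id_columns)]


rows = [
    {"region": "QLD", "isp_subregion": "SQ"},  # illustrative data row
    {"region": "Region", "isp_subregion": "ISP Subregion"},  # header row
    {"region": None, "isp_subregion": "This load is estimated based on ACIL Allen 2025..."},
]
clean = drop_junk_rows(rows, ["region", "isp_subregion"])
print(clean)  # [{'region': 'QLD', 'isp_subregion': 'SQ'}]
```

Unlike the allowlist, this does not need updating when region codes change, but the threshold has to sit above the longest legitimate subregion name in the workbook.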

Metadata

Assignees

No one assigned

Labels

No labels

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests