# perf: add batch scope resolution API to eliminate redundant validate+explode per (property × timespan × category) #451
## Problem
`prop_for_scope` is called once per (property × timespan × category) combination by downstream consumers such as cube_wrangler's `split_properties_by_time_period_and_category`. On a MetCouncil-scale network this means 35 independent calls (3 simple properties × 5 time periods + `price` with 4 categories × 5 time periods = 35 combinations).
Each call independently executes:
```python
# network_wrangler/roadway/links/scopes.py — prop_for_scope()
links_df = validate_df_to_model(links_df, RoadLinksTable) # (1) full schema validation
candidate = _create_exploded_df_for_scoped_prop(links_df, prop_name) # (2) explode + json_normalize + datetime conversion
filtered = _filter_exploded_df_to_scope(candidate, timespan, category) # (3) cheap filter — varies per call
```
### Redundancy 1: validation called 35× on an already-valid DataFrame
`validate_df_to_model` does significant work on every call:
| Step | Cost |
|---|---|
| `copy.deepcopy(df.attrs)` | full deep copy of attrs |
| `_convert_string_dtype_to_object` | full DataFrame copy + column iteration |
| `model.validate(df, lazy=True)` | Pandera checks all 28+ columns of `RoadLinksTable` |
| `fill_df_with_defaults_from_model` | fills NaN defaults across columns |
When called from `split_properties_by_time_period_and_category`, the input is always `roadway_net.links_df` — a DataFrame that was already validated and coerced when the `RoadwayNetwork` was loaded. There is no caching or memoization, so the full validation is repeated 35 times on the same object.
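For intuition, the redundancy is purely per-object. A hypothetical identity-based guard (not network_wrangler code, and not the fix proposed in this issue, shown only to illustrate that 34 of the 35 validations add no information) would already collapse the cost:

```python
import pandas as pd

# Illustrative sketch: skip re-validation when the exact same DataFrame
# object has already been validated. The cache and validate_once are
# hypothetical names, not part of the network_wrangler API.
_validated: set[int] = set()

def validate_once(df: pd.DataFrame, validate) -> pd.DataFrame:
    if id(df) in _validated:
        return df            # already validated this exact object; skip
    out = validate(df)       # heavy path: schema checks, coercion, defaults
    _validated.add(id(out))
    return out
```

With this guard in place of a bare `validate_df_to_model` call, 35 invocations on the same `links_df` would run the heavy path only once.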
### Redundancy 2: explode called 35× when the result is the same per property
`_create_exploded_df_for_scoped_prop` builds a tidy exploded DataFrame from the `sc_{prop}` list column. Its output depends only on the property name — not on the timespan or category being queried. It is therefore identical for every call on the same property, yet it runs once per (timespan, category) combination.
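The scope-independence is easy to see on a toy frame (stand-in data and column names; the real `_create_exploded_df_for_scoped_prop` also normalizes nested fields and converts datetimes):

```python
import pandas as pd

# Toy data: two links with a default "lanes" value and scoped overrides
# in a list-of-dicts column (stand-in for the real sc_lanes structure).
links = pd.DataFrame({
    "model_link_id": [1, 2],
    "lanes": [2, 3],
    "sc_lanes": [
        [{"timespan": ["6:00", "9:00"], "value": 3}],
        [{"timespan": ["6:00", "9:00"], "value": 2},
         {"timespan": ["15:00", "19:00"], "value": 4}],
    ],
})

# Explode once: one row per scoped value, indexed by the original link row.
exploded = links.explode("sc_lanes").dropna(subset=["sc_lanes"])
tidy = pd.json_normalize(exploded["sc_lanes"].tolist()).set_index(exploded.index)
# tidy is identical no matter which timespan/category is queried afterwards;
# only the cheap filter over it varies per scope.
```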
### Measured impact (stpaul network, 66,253 links)
| Step | Time per call | × calls | Total |
|---|---|---|---|
| validate + explode | ~0.3–0.6s | 35 | ~10–15s |
| filter (step 3 only) | ~0.01s | 35 | ~0.4s |
On a 200k-link production network: ~30–45s for `split_properties` alone.
## Proposed fix
Add a public `props_for_scopes` function that:
- Validates once
- Explodes once per property
- Filters cheaply for each (timespan, category) scope
```python
def props_for_scopes(
    links_df: pd.DataFrame,
    prop_name: str,
    scopes: list[dict],
    strict_timespan_match: bool = False,
    min_overlap_minutes: int = 60,
    allow_default: bool = True,
) -> dict[str, pd.Series]:
    """Resolve one property for multiple (timespan, category) scopes in a single pass.

    Validates and explodes links_df once; filters once per scope.

    Args:
        links_df: RoadLinksTable DataFrame.
        prop_name: Name of the property to resolve.
        scopes: List of dicts, each with keys:
            - "label": str — key in the returned dict
            - "timespan": list[TimeString]
            - "category": str | int | None
        strict_timespan_match: passed to _filter_exploded_df_to_scope.
        min_overlap_minutes: passed to _filter_exploded_df_to_scope.
        allow_default: if True, return the default column when no scoped values exist.

    Returns:
        dict mapping each scope label to a pd.Series of resolved values.
    """
    links_df = validate_df_to_model(links_df, RoadLinksTable)  # validate ONCE
    base = links_df[prop_name].copy()
    sc_col = f"sc_{prop_name}"
    if sc_col not in links_df.columns or links_df[sc_col].isna().all():
        if not allow_default:
            raise ValueError(f"{prop_name} has no scoped values and allow_default=False")
        return {s["label"]: base.copy() for s in scopes}
    exploded = _create_exploded_df_for_scoped_prop(links_df, prop_name)  # explode ONCE
    result = {}
    for scope in scopes:
        filtered = _filter_exploded_df_to_scope(
            exploded,
            timespan=scope["timespan"],
            category=scope.get("category"),
            strict_timespan_match=strict_timespan_match,
            min_overlap_minutes=min_overlap_minutes,
        )
        col = base.copy()
        col.loc[filtered.index] = filtered["scoped"]
        result[scope["label"]] = col
    return result
```
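A self-contained toy of the same pattern, under simplifying assumptions (exact-match timespan equality stands in for `_filter_exploded_df_to_scope`, which also handles partial overlap, categories, and validation):

```python
import pandas as pd

def toy_props_for_scopes(links: pd.DataFrame, prop: str, scopes: list[dict]) -> dict:
    """Toy sketch: explode once, then apply one cheap filter per scope."""
    base = links[prop]
    exploded = links.explode(f"sc_{prop}").dropna(subset=[f"sc_{prop}"])
    tidy = pd.json_normalize(exploded[f"sc_{prop}"].tolist()).set_index(exploded.index)
    out = {}
    for s in scopes:
        # Simplified matching: exact timespan equality only.
        hit = tidy[tidy["timespan"].apply(lambda t: t == s["timespan"])]
        col = base.copy()
        col.loc[hit.index] = hit["value"]   # scoped value overrides the default
        out[s["label"]] = col
    return out

links = pd.DataFrame({
    "lanes": [2, 3],
    "sc_lanes": [
        [{"timespan": ["6:00", "9:00"], "value": 3}],
        [{"timespan": ["6:00", "9:00"], "value": 2},
         {"timespan": ["15:00", "19:00"], "value": 4}],
    ],
})
scopes = [{"label": "AM", "timespan": ["6:00", "9:00"]},
          {"label": "PM", "timespan": ["15:00", "19:00"]}]
resolved = toy_props_for_scopes(links, "lanes", scopes)
# resolved["AM"] -> [3, 2]; resolved["PM"] -> [2, 4] (default 2 kept for link 1)
```

The explode runs once no matter how many scopes are requested, which is exactly the saving the proposed `props_for_scopes` delivers on the real tables.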
## Caller change in cube_wrangler
```python
# cube_wrangler/roadway.py

# Before: one prop_for_scope call per combination
for time_suffix, category_suffix in itertools.product(time_periods, categories):
    roadway_net.links_df[out_var + "_" + ...] = prop_for_scope(
        roadway_net.links_df, params["v"], category=..., timespan=...
    )[params["v"]]

# After: one props_for_scopes call per property
scopes = [
    {"label": f"{out_var}{cat_sfx}{ts_sfx}",
     "timespan": params["time_periods"][ts_sfx],
     "category": params["categories"][cat_sfx]}
    for ts_sfx, cat_sfx in itertools.product(params["time_periods"], params["categories"])
]
resolved = props_for_scopes(roadway_net.links_df, params["v"], scopes)
for label, series in resolved.items():
    roadway_net.links_df[label] = series
```
## Expected speedup
| Operation | Before | After | Reduction |
|---|---|---|---|
| `validate_df_to_model` | 35× | 1× | 34 avoided |
| `_create_exploded_df_for_scoped_prop` | 35× | 7× (one per property) | 28 avoided |
| `_filter_exploded_df_to_scope` | 35× | 35× | unchanged (cheap) |
| Total split_properties (stpaul, 66k links) | ~12s | ~1s | ~10–30× |
| Total split_properties (200k links) | ~40s | ~3s | ~13× |
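The call-count reductions in the table reduce to simple arithmetic (counts only; the timings are measured, not derived):

```python
# Scope combinations from the Problem section: 3 simple properties × 5 time
# periods, plus price with 4 categories × 5 time periods.
combos = 3 * 5 + 4 * 5          # 35 (property × timespan × category) scopes
validate_avoided = combos - 1   # validate once instead of 35×
explode_avoided = combos - 7    # explode once per property, per the table
# 34 + 28 = 62 heavy operations eliminated; only the 35 cheap filters,
# one validation, and the per-property explodes remain.
```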
## Implementation notes
- `props_for_scopes` should be exported from `network_wrangler.roadway.links.scopes` alongside `prop_for_scope`
- `_create_exploded_df_for_scoped_prop` is already well-factored; no changes needed to it
- The existing `prop_for_scope` should remain unchanged for backwards compatibility
- A benchmark test can be added using a pre-loaded `RoadwayNetwork` fixture (the stpaul test network), calling `props_for_scopes` with a realistic set of MetCouncil-style scopes
## Context
Identified by profiling `cube_wrangler`'s log → project card pipeline. Full analysis in `cube_wrangler/PERFORMANCE_ANALYSIS.md` on branch `feature/perf-link-changes`. The `_process_link_changes` bottleneck (O(N_network × N_changes) boolean mask scan) has already been fixed; this is the next highest-impact item.