
LOTC-1523: walk nested JSON pointers in shifter and freshness validator#197

Open
kevinborkman-hub wants to merge 2 commits into main from LOTC-1523-pointer-walking-shifter


kevinborkman-hub (Collaborator) commented Apr 29, 2026

Summary

Fixes LOTC-1523. Both the configurator's stale-timestamp auto-shifter and the Rust freshness validator silently skipped transforms whose primary timestamp column is sourced from a multi-segment JSON pointer (e.g. /httpMessage/start). Because both code paths looked up sample_data by the post-transform output column name (a flat key) instead of walking the JSON pointer to the nested location, SIEM-shaped bundles passed validation with arbitrarily stale fixtures. The bug was uncovered by LOTC-691 (trafficpeak/siem), where the fixture's primary value had sat at 1491303422 (2017-04-04) without anyone noticing.
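The failure mode can be sketched in a few lines of Python. This is a minimal illustration, not the configurator's code: flat_lookup, pointer_lookup and check are hypothetical names, the sample row only mirrors the SIEM shape described above, and the 30-day MAX_AGE and fixed NOW are assumed values chosen for the example.

```python
from datetime import datetime, timezone
from typing import Any, Optional

NOW = 1777420800          # assumed "current time" (2026-04-29T00:00:00Z)
MAX_AGE = 30 * 86400      # assumed freshness window: 30 days


def flat_lookup(sample: dict, column: str) -> Optional[Any]:
    # Pre-fix shape: treat the output column name as a top-level key.
    return sample.get(column)


def pointer_lookup(sample: dict, pointer: str) -> Optional[Any]:
    # Post-fix shape: follow each RFC 6901 segment into nested objects.
    # Objects only; array indices are out of scope for this sketch.
    if not pointer.startswith("/"):
        return None
    node: Any = sample
    for token in pointer[1:].split("/"):
        key = token.replace("~1", "/").replace("~0", "~")  # RFC 6901 unescaping
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def check(value: Optional[Any], now: int, max_age: int) -> str:
    if value is None:
        return "skipped"  # nothing found, so no warning is ever raised
    epoch = int(value)    # accepts an int or a numeric string epoch
    return "stale" if now - epoch > max_age else "fresh"


sample = {"httpMessage": {"start": "1491303422"}}
print(check(flat_lookup(sample, "start"), NOW, MAX_AGE))                  # skipped
print(check(pointer_lookup(sample, "/httpMessage/start"), NOW, MAX_AGE))  # stale
print(datetime.fromtimestamp(1491303422, tz=timezone.utc))                # 2017-04-04 10:57:02+00:00
```

The flat lookup never fails loudly; it just returns nothing, which is why the stale fixture went unnoticed.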

Changes

  • scripts/configurator/transform_organizer.py: replaced the internal _resolve_sample_key(col, sample) -> str | None with _resolve_sample_path(col, sample) -> tuple[str, ...] | None, and added the _get_at_path and _set_at_path helpers. Both _shift_stale_timestamps and _shift_stale_datetime_primary now read and write through path tuples (single- or multi-segment).
  • src/validate/sample_data_freshness.rs: added a resolve_primary_value helper that mirrors the Python algorithm (output_name → from_json_pointers via serde_json::Value::pointer → from_input_field). The primary-epoch lookup now also accepts a Value::String containing a numeric epoch (the actual SIEM fixture stores "1491303422" as a JSON string).
  • tests/test_timestamp_freshness.py: rewrote TestResolveSampleKey as TestResolveSamplePath for the new tuple semantics and added 4 nested-pointer tests covering numeric, string, fresh, and datetime cases. Existing single-segment tests are preserved as regression coverage.
  • Rust tests: test_stale_nested_pointer_numeric_epoch_warns, test_stale_nested_pointer_string_epoch_warns, test_fresh_nested_pointer_epoch_passes.
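A rough Python sketch of reading and writing through a path tuple, including the string-epoch case, follows. get_at_path, set_at_path, as_epoch and shift_if_stale are hypothetical stand-ins, not the PR's _get_at_path/_set_at_path or either _shift_stale_* function. The policy of moving a stale value to `now` (and returning the delta so sibling columns could be shifted by the same amount) is an assumption made for illustration.

```python
import re
from typing import Any, Optional

Path = tuple[str, ...]

_DIGITS = re.compile(r"[0-9]+")


def get_at_path(doc: dict, path: Path) -> Optional[Any]:
    """Walk nested dicts along `path`; None if any key is missing."""
    node: Any = doc
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def set_at_path(doc: dict, path: Path, value: Any) -> None:
    """Overwrite the value at `path`; assumes every parent dict already exists."""
    node = doc
    for key in path[:-1]:
        node = node[key]
    node[path[-1]] = value


def as_epoch(value: Any) -> Optional[int]:
    """Accept an int epoch, or a string of ASCII digits such as "1491303422"."""
    if isinstance(value, bool):  # bool subclasses int; never treat it as an epoch
        return None
    if isinstance(value, int):
        return value
    if isinstance(value, str) and _DIGITS.fullmatch(value):
        return int(value)
    return None


def shift_if_stale(doc: dict, path: Path, now: int, max_age: int) -> int:
    """Move a stale epoch at `path` to `now`, keeping its int-vs-string type.

    Returns the shift in seconds, or 0 if the value was fresh, absent, or not an epoch.
    """
    original = get_at_path(doc, path)
    epoch = as_epoch(original)
    if epoch is None or now - epoch <= max_age:
        return 0
    # Preserve the fixture's representation: a string epoch stays a string.
    set_at_path(doc, path, str(now) if isinstance(original, str) else now)
    return now - epoch


row = {"httpMessage": {"start": "1491303422"}}
delta = shift_if_stale(row, ("httpMessage", "start"), now=1777420800, max_age=30 * 86400)
print(delta, row)  # 286117378 {'httpMessage': {'start': '1777420800'}}
```

A single-segment column is just a one-element tuple, so the same helpers cover both the old flat keys and the new nested pointers.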

Resolver priority (output_name > from_json_pointers > from_input_field > null fallback) is preserved across the refactor. Backward compatibility for cdn-insights / bot-insights / akamai_ds2 (single-segment pointers) is confirmed by the existing tests, which still pass.
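The four-tier priority can be written out as a small Python sketch. It assumes a hypothetical column shape (a dict with output_name, a list-valued from_json_pointers, and from_input_field) and is not the actual _resolve_sample_path; it only shows how the tiers fall through and how single- and multi-segment pointers both yield a path tuple.

```python
from typing import Any, Optional

Path = tuple[str, ...]


def pointer_to_path(pointer: str) -> Path:
    """RFC 6901: drop the leading '/', split, then unescape ~1 -> '/' and ~0 -> '~'."""
    if not pointer.startswith("/"):
        return ()  # "" (whole document) or malformed: not a usable column location
    return tuple(t.replace("~1", "/").replace("~0", "~") for t in pointer[1:].split("/"))


def has_path(sample: dict, path: Path) -> bool:
    node: Any = sample
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return False
        node = node[key]
    return True


def resolve_sample_path(col: dict, sample: dict) -> Optional[Path]:
    """Return where the column's value lives in `sample`, by fixed priority."""
    # 1. output_name as a flat key (the pre-refactor behavior).
    name = col.get("output_name")
    if name and name in sample:
        return (name,)
    # 2. Each from_json_pointers entry, walked through nested objects.
    for pointer in col.get("from_json_pointers") or []:
        path = pointer_to_path(pointer)
        if path and has_path(sample, path):
            return path
    # 3. from_input_field as a flat key.
    field = col.get("from_input_field")
    if field and field in sample:
        return (field,)
    # 4. Null fallback: the column cannot be located in this sample.
    return None


siem_col = {"output_name": "start", "from_json_pointers": ["/httpMessage/start"]}
print(resolve_sample_path(siem_col, {"httpMessage": {"start": "1491303422"}}))
```

Because tier 1 is still tried first, samples that already store values under the output column name resolve exactly as before.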

Test plan

  • python3 -m pytest tests/ — 135/135 pass
  • cargo test --bin bundle-validator — 60/60 pass
  • cargo fmt clean
  • cargo clippy --bin bundle-validator -- -D warnings clean (4 pre-existing --all-targets clippy errors in verify.rs and summary_table_references.rs are unrelated)
  • CI green on this branch
  • Stale trafficpeak/siem fixture (current value 1491303422) is auto-refreshed when the configurator pipeline re-runs on its .originals/

Out of scope

  • Re-touching the LOTC-691 SIEM fixture in this branch — the next configurator run on that bundle's .originals/ will auto-refresh it.

🤖 Generated with Claude Code

kevinborkman-hub and others added 2 commits April 29, 2026 10:39
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tetime tests

Two related gaps surfaced by trafficpeak/apicontext (datetime primary `startTime`
sourced from single-segment `/start_time`, format `2006-01-02T15:04:05.999999Z`):

1. Pointer walking from the prior commit already covered the column-name vs
   raw-key mismatch (`startTime` ≠ `start_time`), but the format string was
   silently failing to parse — the Go-layout translator didn't recognize
   fractional-second tokens (`.999999`, `.000000`, etc.), so chrono/strptime
   tried to match the literal characters and `continue`d on parse failure.
   Added all six common variants (`.999999999`/`.000000000` down to
   `.999`/`.000`) to both the Rust and Python translators, mapped to chrono's
   `%.f` and Python's `.%f`.

2. Test coverage didn't include a datetime + single-segment + renamed-column
   shape end-to-end. Added `test_stale_renamed_datetime_warns` and
   `test_fresh_renamed_datetime_passes` (Rust) and
   `test_datetime_renamed_column_microseconds_shifted` (Python) to lock in
   the apicontext shape as a regression case.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
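The longest-match-first translation of Go fractional-second tokens described in the second commit could look like the Python sketch below. go_layout_to_strptime is a hypothetical helper that handles only the numeric Go tokens needed for the apicontext layout plus the six fractional variants; the real translators likely cover more tokens. Longer tokens are tried first so `.999999999` is never consumed as `.999` followed by literal nines.

```python
from datetime import datetime

# Fractional tokens longest-first; none of the date/time tokens start with ".",
# so they cannot shadow each other.
_GO_TOKENS = [
    (".999999999", ".%f"), (".000000000", ".%f"),
    (".999999", ".%f"), (".000000", ".%f"),
    (".999", ".%f"), (".000", ".%f"),
    ("2006", "%Y"), ("01", "%m"), ("02", "%d"),
    ("15", "%H"), ("04", "%M"), ("05", "%S"),
]


def go_layout_to_strptime(layout: str) -> str:
    """Translate a (numeric-only) Go time layout into a strptime format string."""
    out = []
    i = 0
    while i < len(layout):
        for go_tok, py_tok in _GO_TOKENS:
            if layout.startswith(go_tok, i):
                out.append(py_tok)
                i += len(go_tok)
                break
        else:
            # Anything else is a literal; escape '%' so strptime treats it literally.
            ch = layout[i]
            out.append("%%" if ch == "%" else ch)
            i += 1
    return "".join(out)


fmt = go_layout_to_strptime("2006-01-02T15:04:05.999999Z")
print(fmt)                                                   # %Y-%m-%dT%H:%M:%S.%fZ
print(datetime.strptime("2017-04-04T10:57:02.123456Z", fmt))  # 2017-04-04 10:57:02.123456
```

One caveat on the Python side: strptime's `%f` accepts one to six digits when parsing, so a value carrying nine fractional digits still fails against `.%f`. Go's `.999…` layouts also omit the fraction and its dot entirely when it is zero, which `.%f` rejects as well.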