LOTC-1502: validator error on non-object sample_data#187
Open
kevinborkman-hub wants to merge 1 commit into
Add src/validate/sample_data_shape.rs — a new hard-failing validator
that rejects settings.sample_data when it is a JSON array, scalar, or
empty object. Previously get_sample_data_as_json in src/deploy/default.rs
filtered through .as_object() and silently returned null for arrays/
scalars, causing insert_sample_data_if_present and verify_rows_ingested
to no-op; deploy reported success with no data inserted. The new
validator runs on every CI invocation (Track 1 and Track 2) so arrays
and scalars are caught uniformly before the deploy path runs.
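The silent no-op described above can be sketched roughly as follows. This is a std-only illustration, not the code in src/deploy/default.rs: the Json enum is a hypothetical stand-in for the project's real JSON value type, and get_sample_data_lossy and rows_to_insert are invented names that only mimic the behaviour the message describes.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a JSON value type (not the project's real type).
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Null,
    Str(String),
    Array(Vec<Json>),
    Object(BTreeMap<String, Json>),
}

impl Json {
    /// Mirrors the usual `as_object` accessor: Some only for objects.
    fn as_object(&self) -> Option<&BTreeMap<String, Json>> {
        match self {
            Json::Object(map) => Some(map),
            _ => None,
        }
    }
}

/// Assumed shape of the old lookup: anything that is not an object
/// collapses to Null instead of raising an error.
fn get_sample_data_lossy(sample_data: &Json) -> Json {
    match sample_data.as_object() {
        Some(map) => Json::Object(map.clone()),
        None => Json::Null,
    }
}

/// Assumed shape of the downstream step: Null means "nothing to insert",
/// so the caller sees zero rows and no failure.
fn rows_to_insert(sample: &Json) -> usize {
    match sample {
        Json::Null => 0,
        _ => 1,
    }
}
```

Under these assumptions, an embedded array passes through the lookup as Null, the insert step does nothing, and nothing signals that the sample was dropped.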
Contract: settings.sample_data must be a non-empty JSON object when
present. Missing sample_data is allowed (legitimate no-op for
transforms with no sample). Wired in src/main.rs after
sample_data_exists and before sample_data_freshness.
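A minimal sketch of that contract, assuming a hypothetical Json enum in place of the real JSON value type. The function name check_sample_data_shape and the error strings are illustrative only; the actual validator in src/validate/sample_data_shape.rs may be structured differently.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a JSON value type (not the project's real type).
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Null,
    Bool(bool),
    Number(f64),
    Str(String),
    Array(Vec<Json>),
    Object(BTreeMap<String, Json>),
}

/// Checks the stated contract: a missing value is allowed, otherwise the
/// value must be a non-empty object. Error strings are illustrative.
fn check_sample_data_shape(sample_data: Option<&Json>) -> Result<(), String> {
    match sample_data {
        // Missing sample_data is a legitimate no-op.
        None => Ok(()),
        // The only accepted shape when present.
        Some(Json::Object(map)) if !map.is_empty() => Ok(()),
        Some(Json::Object(_)) => {
            Err("settings.sample_data is an empty object".to_string())
        }
        Some(Json::Array(_)) => {
            Err("settings.sample_data is an array; expected a single object".to_string())
        }
        // Everything else (string, number, bool, null) is treated as a scalar here.
        Some(_) => Err("settings.sample_data is a scalar; expected an object".to_string()),
    }
}
```

The five match arms line up with the five cases the unit tests cover: non-empty object, missing, array, scalar, and empty object.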
Normalize 9 existing offenders so the validator lands green:
- aws/bot-insights: 5 transforms reduced from embedded array to the
first element; default's rich 23-field separate sample_data.json
promoted into the embedded copy so test ingests land a realistic
row rather than a {timestamp} stub. Separate sample_data.json files
normalized to match embedded for 4 of these transforms.
- aws/cdn-insights: 3 transforms reduced from embedded array to the
first element (separate files were already single objects).
- aws/cloudfront-to-kinesis: embedded raw tab-separated log string
replaced with the wrapper object {data_type, tsv} from the separate
sample_data_template.json.
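The array-to-first-element reduction applied above could look something like the sketch below. It is an assumption-laden illustration: the Json enum and normalize_sample_data are hypothetical names, and this is not a rendering of scripts/configure_bundle.py or of how the bundle files were actually edited.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a JSON value type (not the project's real type).
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Str(String),
    Array(Vec<Json>),
    Object(BTreeMap<String, Json>),
}

/// Returns a single-object sample where one can be derived mechanically:
/// objects pass through unchanged, arrays are reduced to their first
/// element if that element is an object, and anything else yields None.
fn normalize_sample_data(value: Json) -> Option<Json> {
    match value {
        obj @ Json::Object(_) => Some(obj),
        Json::Array(items) => match items.into_iter().next() {
            Some(first @ Json::Object(_)) => Some(first),
            _ => None,
        },
        // A raw string (such as a tab-separated log line) cannot be
        // normalized mechanically; it needs a hand-written wrapper object.
        Json::Str(_) => None,
    }
}
```

The None branch corresponds to cases like aws/cloudfront-to-kinesis, where the replacement object came from a separate template file rather than from the embedded value.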
Covered by 5 unit tests: non-empty object passes, missing passes,
array fails with configure_bundle.py hint, scalar fails, empty object
fails. All 62 tests in the main bin pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
kcorbett-hdx
requested changes
Apr 30, 2026
Collaborator
Can you remove the bot-insights bundle from this? Or are these changes necessary?
9143e4b to 18b6aa3
18b6aa3 to 9143e4b
Summary
- src/validate/sample_data_shape.rs hard-fails when a transform's embedded settings.sample_data is a JSON array, scalar, or empty object. Non-empty object passes; missing passes (legitimate no-op).
- Closes the silent-failure path at src/deploy/default.rs:536 (get_sample_data_as_json), where .as_object() returned null for arrays/scalars and caused insert_sample_data_if_present + verify_rows_ingested to no-op while deploy reported success.
- Wired into src/main.rs after sample_data_exists and before the freshness warnings.
Jira
LOTC-1502
Changes
New validator (src/validate/sample_data_shape.rs)
- Contract: settings.sample_data must be a non-empty JSON object when present.
- Array failures carry a hint pointing to scripts/configure_bundle.py for normalization.
Bundle normalizations (13 files)
- aws/bot-insights (4 transforms: akamai_ds2, cloudflare, cloudfront_firehose, fastly): embedded settings.sample_data reduced from array to first element; separate sample_data.json files likewise normalized.
- aws/bot-insights/default: separate sample_data.json promoted into embedded copy (was a {timestamp}-only stub).
- aws/cdn-insights (3 transforms: akamai_ds2, cloudflare, cloudfront_firehose): embedded settings.sample_data reduced from array to first element (separate files were already single objects).
- aws/cloudfront-to-kinesis: embedded raw tab-separated log string replaced with the wrapper object {data_type, tsv} from separate sample_data_template.json.
Overlap with PR #184
The bot-insights changes overlap with commit 23e3e44 on the fix-bot-insights-ai-category branch (PR #184). If #184 merges first, this PR rebases cleanly on bot-insights. The AI-category transform fix (554569c) from that branch is not included here and remains for #184.
Test plan
- cargo test: all 62 tests in main binary pass (57 pre-existing + 5 new)
- cargo run: every bundle returns SUCCESS
- Scan of sample_data across aws/ and trafficpeak/ returns 0 offenders
Risk
- sample_data_exists (runs first) permits empty objects; sample_data_shape catches them. The division of responsibility is intentional but worth documenting in a follow-up comment.
🤖 Generated with Claude Code