LOTC-646: [TrafficPeak] Bot Insights CDN #175

Open
kevinborkman-hub wants to merge 56 commits into main from LOTC-646-traffic-peak-bot-insights-cdn

@kevinborkman-hub

Collaborator

Summary

Contents

  • Raw bundle: 1 dashboard (Bot Insights (CDN).json), 1 transform (akamai.json), 3 summaries (bot_summary_day/hour/month.sql), 14 UDFs under functions/.
  • .originals/trafficpeak/bot-insights-cdn/ contains the preserved raw assets for pipeline re-runs.
  • bundle-config.json: data_category: security, table_name: logs, version: 1.0.0.

CI expectations

  • Routes to Track 1 (full pipeline) — verified via scripts/detect_track.py.
  • Pipeline will regenerate bundle.json, sample_data.json, rename the transform, inject template variables into the dashboard/summaries, and produce portables at portables/trafficpeak/bot_insights_cdn/1.0.0/.

Links

Test plan

  • CI runs the full pipeline successfully
  • Portables are generated at portables/trafficpeak/bot_insights_cdn/1.0.0/
  • Rust validator passes
  • Dashboards render with expected panels

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

kevinborkman-hub and others added 30 commits February 4, 2026 17:27
Reorganized transformations into provider subdirectories (akamai, cloudflare, cloudfront_firehose, default, fastly), cleaned metadata fields, extracted sample data, and updated SQL prefixes from reference_ to akamai_ for trafficpeak bundle. Created bundle.json with multi_stream method configuration, updated summary SQL files with template variables, and fixed dashboard structure with proper template variable patterns for primary dashboard.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ydrolix/integration-deployment-templates into LOTC-646-traffic-peak-bot-insights-cdn
github-actions Bot and others added 26 commits April 8, 2026 19:13
Fix originals with manual corrections from branch history:
- ai_category → user_agent_category in dashboard and summaries
- cacheStatus suppress: true → false in transform
- Summary field names updated to match transform output columns
- Remove bundle.json so CI regenerates it from bundle-config.json

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Update primary_url to Akamai techdocs URL
- Remove old portables/bot_insights_cdn/ (CI generates portables/trafficpeak/)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Shift reqTimeSec values to Apr 1 2026 UTC to pass the 183-day
freshness threshold. The pipeline's _shift_stale_timestamps() doesn't
handle transforms where the raw JSON key differs from the output
column name (reqTimeSec vs timestamp) — bug ticket to follow.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
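
The manual shift described in this commit can be sketched as follows. This is a minimal illustration, not the pipeline's `_shift_stale_timestamps()`: the function name `shift_stale_epochs`, its parameters, and the choice to test only the newest value against the threshold are assumptions. Only the 183-day threshold and the epoch-seconds `reqTimeSec` field come from the commit message.

```python
from datetime import datetime, timedelta, timezone

FRESHNESS_DAYS = 183  # threshold quoted in the commit message


def shift_stale_epochs(epochs, now, target=None, max_age_days=FRESHNESS_DAYS):
    """Return epoch-second values shifted by one common offset.

    Assumption: the sample counts as stale when its newest value is older
    than `max_age_days`. If so, every value moves by the same offset so the
    newest lands on `target` (default: `now`) and relative spacing is kept.
    """
    if not epochs:
        return []
    newest = max(epochs)
    cutoff = (now - timedelta(days=max_age_days)).timestamp()
    if newest >= cutoff:
        return list(epochs)  # already fresh, leave untouched
    anchor = (target if target is not None else now).timestamp()
    offset = int(anchor - newest)
    return [value + offset for value in epochs]
```

For example, calling it with `target=datetime(2026, 4, 1, tzinfo=timezone.utc)` moves the newest record to Apr 1 2026 UTC and keeps the gaps between records unchanged.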
Previous commit only patched akamai.json on top of processed files.
This properly replaces the bundle dir with raw originals including
all fixes (ai_category, cacheStatus suppress, summary field names,
fresh timestamps) and removes stale portables for full pipeline re-run.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Resolve conflicts from prior CI run by keeping deletions — bundle dir
has only raw originals with all fixes (ai_category, cacheStatus suppress,
summary field names, fresh timestamps) plus bundle-config.json.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Restore original stale reqTimeSec values (188-510 days old) in originals
and bundle dir to verify the _shift_stale_timestamps() fix from LOTC-1412
can now resolve from_json_pointers and freshen them automatically.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
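
The gap this commit exercises is that the raw JSON key (`reqTimeSec`) differs from the output column name (`timestamp`). A rough sketch of resolving one from the other through `from_json_pointers` is shown below. The transform layout used here (`output_columns`, `source`) is a simplified assumption for illustration, and both helper names are hypothetical; the actual LOTC-1412 fix may look quite different.

```python
def resolve_pointer(doc, pointer):
    """Resolve an RFC 6901 JSON pointer against parsed JSON; None if absent."""
    if pointer == "":
        return doc
    if not pointer.startswith("/"):
        raise ValueError(f"not a JSON pointer: {pointer!r}")
    node = doc
    for raw in pointer[1:].split("/"):
        # RFC 6901 unescaping: "~1" means "/", "~0" means "~" (in that order)
        token = raw.replace("~1", "/").replace("~0", "~")
        if isinstance(node, dict) and token in node:
            node = node[token]
        elif isinstance(node, list) and token.isdigit() and int(token) < len(node):
            node = node[int(token)]
        else:
            return None
    return node


def raw_key_for_column(transform, column_name):
    """Return the first JSON pointer feeding `column_name`, or None.

    Assumed layout: transform["output_columns"] is a list of dicts with
    "name" and "source": {"from_json_pointers": [...]}.
    """
    for col in transform.get("output_columns", []):
        if col.get("name") == column_name:
            pointers = col.get("source", {}).get("from_json_pointers", [])
            return pointers[0] if pointers else None
    return None
```

With a column named `timestamp` whose pointer list is `["/reqTimeSec"]`, `raw_key_for_column` returns `"/reqTimeSec"`, which `resolve_pointer` can then look up in each raw sample record.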
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Replace ${VAR_SUMMARY_HOUR/DAY/MONTH} with direct __PROJECT_NAME__.bot_summary_* refs to bypass the configurator's self-reference misrouting (tracked in LOTC-1435)
- Replace `count()` backticked column name with cnt_all in 3 panel queries (the summary's count() is aliased as cnt_all)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
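
The first substitution in this commit amounts to a plain text rewrite of the panel queries. A minimal sketch, assuming the variables appear literally as `${VAR_SUMMARY_HOUR}`, `${VAR_SUMMARY_DAY}`, and `${VAR_SUMMARY_MONTH}`; the helper name `inline_summary_refs` is hypothetical and this is not the configurator's own code:

```python
import re

# Variable name -> direct table reference, as described in the commit message.
SUMMARY_REFS = {
    "VAR_SUMMARY_HOUR": "__PROJECT_NAME__.bot_summary_hour",
    "VAR_SUMMARY_DAY": "__PROJECT_NAME__.bot_summary_day",
    "VAR_SUMMARY_MONTH": "__PROJECT_NAME__.bot_summary_month",
}

_SUMMARY_VAR = re.compile(r"\$\{(VAR_SUMMARY_(?:HOUR|DAY|MONTH))\}")


def inline_summary_refs(sql):
    """Replace ${VAR_SUMMARY_*} placeholders with direct table references."""
    return _SUMMARY_VAR.sub(lambda m: SUMMARY_REFS[m.group(1)], sql)
```

Any other `${...}` placeholder is left alone, so the remaining template variables still pass through the configurator as before.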
The summary column is stored as AggregateFunction(count) and when referenced
by its alias cnt_all, ClickHouse resolves it back to countMerge(`count()`),
which nests inside the outer countMergeIf() and fails with error 184.
Using `count()` directly (matching aws/bot-insights) avoids the resolution
round-trip.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
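
The revert described in this commit can be pictured as swapping the alias back to the backticked raw column name inside each panel query. The helper below is a hypothetical sketch of that text rewrite only; it does not talk to ClickHouse, and the `is_bot = 1` predicate in the usage note is invented for illustration.

```python
import re

# Match the alias only as a whole identifier, not as part of a longer name.
_ALIAS = re.compile(r"\bcnt_all\b")


def use_raw_count_column(sql):
    """Reference the aggregate-state column as `count()` instead of cnt_all."""
    return _ALIAS.sub("`count()`", sql)
```

Applied to ``countMergeIf(cnt_all, is_bot = 1)``, this yields ``countMergeIf(`count()`, is_bot = 1)``, so the outer merge operates on the stored state directly rather than on an alias that expands to another merge.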
…eline re-run

CI Stage 2 will auto-freshen stale primary timestamps via transform_organizer._shift_stale_timestamps.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
__extend__: dashboards/bot_insights_cdn.json
folderUid: hdx-security-folder
inputs:
  DS_HYDROLIX-HYDROLIX-DATASOURCE: hdx-hydrolix-datasource

Collaborator

The summary tables are missing. Were they not generated automatically?

Collaborator

You can delete this file since this uses summary tables explicitly.
