DO NOT MERGE: TECH-6484 collector PR-env validation (throwaway) #1443
Closed
chong-techops wants to merge 8 commits into
Standalone single-replica service that serves the DB-sourced Prometheus gauges (the former /api/metrics/db scrape) off the request-serving pods. Executor-style: build-context = repo root, imports lib/metrics verbatim via tsx, so the exposed gauge families are identical to today's endpoint.
- index.ts boots a node:http server (PORT, default 9090) with graceful shutdown
- server.ts serves GET /metrics (updateDbMetrics -> getDbMetrics, TTL- and pool-gated in lib/) and GET /health
- server.test.ts mocks the metrics module and exercises the HTTP wiring on an ephemeral port (200 + gauges, /health, 500 on refresh failure, 404)
Containerization, CI, and deploy wiring follow in later commits (TECH-6484).
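For readers without access to the diff, the HTTP wiring described in this commit could look roughly like the sketch below. It is not the PR's actual server.ts or index.ts: the `MetricsSource` interface, the `handle`, `resolvePort`, and `start` names, and the response bodies are all hypothetical, and `refresh`/`render` merely stand in for `updateDbMetrics` and `getDbMetrics`.

```typescript
import { createServer, type Server } from "node:http";

// Hypothetical stand-in for the two lib/metrics functions named in the commit.
export interface MetricsSource {
  refresh(): Promise<void>; // plays the role of updateDbMetrics
  render(): Promise<string>; // plays the role of getDbMetrics
}

export interface Reply {
  status: number;
  body: string;
}

// Routing logic kept apart from node:http so it can be tested without a socket.
export async function handle(
  method: string | undefined,
  url: string | undefined,
  source: MetricsSource,
): Promise<Reply> {
  const path = (url ?? "/").split("?")[0];
  if (method === "GET" && path === "/health") {
    return { status: 200, body: "ok\n" };
  }
  if (method === "GET" && path === "/metrics") {
    try {
      await source.refresh();
      return { status: 200, body: await source.render() };
    } catch {
      // A failed refresh surfaces as a 500 instead of serving stale output.
      return { status: 500, body: "metrics refresh failed\n" };
    }
  }
  return { status: 404, body: "not found\n" };
}

// PORT from the environment, falling back to 9090 when unset or invalid.
export function resolvePort(env: Record<string, string | undefined>): number {
  const parsed = Number.parseInt(env["PORT"] ?? "", 10);
  return Number.isInteger(parsed) && parsed > 0 && parsed < 65536 ? parsed : 9090;
}

export function start(source: MetricsSource): Server {
  const server = createServer((req, res) => {
    void handle(req.method, req.url, source).then((reply) => {
      res.writeHead(reply.status, { "Content-Type": "text/plain; charset=utf-8" });
      res.end(reply.body);
    });
  });
  server.listen(resolvePort(process.env));
  // Graceful shutdown: stop accepting connections, then exit once they drain.
  const shutdown = () => server.close(() => process.exit(0));
  process.once("SIGTERM", shutdown);
  process.once("SIGINT", shutdown);
  return server;
}
```

Keeping `handle` free of sockets is one way to cover the 200, 500, and 404 cases cheaply; the commit's own test instead exercises a real server on an ephemeral port.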
Dockerfile metrics-collector stage (executor-style: reuses lib/ + root
node_modules via tsx, shims server-only, serves on :9090) and a
docker-bake.hcl target/group + METRICS_COLLECTOR_ECR_REPO var. Bake config
validated with 'buildx bake --print'.
Image build/push runs once the ECR repo (keeperhub-metrics-collector-{env})
exists and a deploy workflow wires the bake target -- following commits.
…(Stage 3)
- deploy/metrics-collector/{staging,prod}/values.yaml: replicaCount 1, serves
/metrics on :9090, ServiceMonitor scrapes the DB gauges from this one pod
(deterministic, no hashmod). Minimal env -- only DATABASE_URL; no SQS/Turnkey.
- .github/workflows/deploy-metrics-collector.yaml: standalone (events-style)
trigger on staging/prod + dispatch; bakes the metrics-collector target and
helm-deploys via the shared techops-services/common chart, release name
keeperhub-metrics-collector, namespace keeperhub.
Does NOT cut over: the app's db-metrics ServiceMonitor and /api/metrics/db
route stay in place until the collector is verified in staging (Stage 4).
Requires the ECR repo + TFC workspace (terraform drafted in
techops_infrastructure, applied by infra) before the first deploy.
Validated by building the image: the metrics-collector stage now (1) is included in the Docker source stage (COPY keeperhub-metrics-collector/) so the build doesn't fail, and (2) drops the COPY --from=builder generated-file lines -- the metrics import graph references no builder-generated file at runtime (lib/db/schema only uses lib/types/integration via import type, which tsx erases). The stage now depends on source only, so the image builds without the expensive Next builder stage. Confirmed: image builds, prometheus-api + db-metrics import cleanly, /health 200, /metrics runs the real queries.
…ing (TECH-6484)
Stage 4 (cutover):
- Gate the app's /api/metrics/db route to 404 via METRICS_DB_OFFLOADED so the
heavy aggregate scan never runs on the request-serving pods.
- Remove the db-metrics ServiceMonitor from deploy/keeperhub/{staging,prod} and
set METRICS_DB_OFFLOADED=true. /api/metrics/api is unchanged.
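The route gate in Stage 4 can be pictured with a small sketch. This is an illustration under assumptions, not the PR's route handler: the function names are hypothetical, `collect` stands in for whatever runs the aggregate queries, and treating only the literal string "true" as enabled is a guess at how the flag is parsed.

```typescript
type Env = Record<string, string | undefined>;

// Assumption: only the exact value "true" turns the offload on.
export function isDbMetricsOffloaded(env: Env): boolean {
  return env["METRICS_DB_OFFLOADED"] === "true";
}

export interface RouteResult {
  status: number;
  body: string;
}

// Hypothetical handler for /api/metrics/db on the request-serving pods.
export async function dbMetricsRoute(
  env: Env,
  collect: () => Promise<string>, // stands in for the heavy aggregate scan
): Promise<RouteResult> {
  if (isDbMetricsOffloaded(env)) {
    // Return before calling collect, so the scan never runs on these pods.
    return { status: 404, body: "Not Found" };
  }
  return { status: 200, body: await collect() };
}
```

The check sits ahead of the `collect` call on purpose: the aim stated above is that the scan never starts, not merely that its result is hidden.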
Stage 5 (PR-env wiring + docs):
- deploy/pr-environment/metrics-collector.template.yaml (single replica, PR DB,
ServiceMonitor off).
- deploy-pr-environment.yaml: opt-in deploy-pr-metrics label -> build-collector-image
job + a gated deploy step. Default off, so existing PR envs are unaffected.
- METRICS_REFERENCE.md note on the collector + offload.
Depends on the collector being live + verified in staging (PR #1439). Cutover
must merge only after that; otherwise there will be a gap in DB metrics.
…uild-check CI
Review feedback on PR #1439 (suisuss):
- #3 Startup defined twice: drop the helm command/args override from deploy/metrics-collector/{staging,prod}/values.yaml; rely on the Dockerfile CMD (tsx keeperhub-metrics-collector/index.ts) as the single source of truth.
- #4 Nothing builds this stage in CI: add .github/workflows/collector-build-check.yml -- a PR smoke job that builds the metrics-collector target and runs keeperhub-metrics-collector/import-check.ts inside the image, asserting the runtime import graph resolves. Catches a future value-import of a builder-generated module (server.test.ts mocks the module, so it can't). Validated locally: image builds, import-check prints IMPORTS_OK, exit 0.
- #1 ServiceMonitor port name: matches the executor pattern; will verify the actual Prometheus target in staging (replied on the PR).
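An import smoke check of the kind described in #4 might be shaped like the sketch below. The actual import-check.ts is not shown on this page, so the function names, the `IMPORT_FAILED` prefix, and the injectable `load` parameter are hypothetical; only the `IMPORTS_OK` line and the zero exit code come from the commit message.

```typescript
type Loader = (specifier: string) => Promise<unknown>;

// Default loader: a real dynamic import, so a module whose import graph
// fails to resolve inside the image rejects here.
const dynamicImport: Loader = (specifier) => import(specifier);

// Try every specifier and collect failures instead of stopping at the first.
export async function checkImports(
  specifiers: string[],
  load: Loader = dynamicImport,
): Promise<string[]> {
  const failures: string[] = [];
  for (const specifier of specifiers) {
    try {
      await load(specifier);
    } catch (error) {
      const reason = error instanceof Error ? error.message : String(error);
      failures.push(`${specifier}: ${reason}`);
    }
  }
  return failures;
}

// Returns the exit code the CI job would act on.
export async function main(specifiers: string[], load: Loader = dynamicImport): Promise<number> {
  const failures = await checkImports(specifiers, load);
  if (failures.length > 0) {
    for (const failure of failures) console.error(`IMPORT_FAILED ${failure}`);
    return 1;
  }
  console.log("IMPORTS_OK");
  return 0;
}
```

A script entry point would call something like `main([...]).then((code) => process.exit(code))` with the collector's real module paths, which this page does not list. Because the default loader performs a real import, a value-import of a builder-generated file would fail here even though a mocked unit test passes.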
…-collector-validate
PR Environment Deployed
Your PR environment has been deployed!
Environment Details:
Components:
The environment will be automatically cleaned up when this PR is closed or merged.
Author
Validation complete (B-now): collector image built from PR sha, deployed as a real pod (Running 1/1), /health 200 and /metrics 200 with gauge families against the PR DB. Combined with the chart-render port-binding check (Jacob #1) and validation A (48 families on the populated staging DB), the collector is verified end-to-end. Throwaway PR - closing + tearing down.
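A family count like the one quoted above can be reproduced from a /metrics response body. The sketch below is one possible way to do it, not necessarily how the validation was run: it counts names declared on `# TYPE` lines of the Prometheus text exposition format, so a family exposed without a `# TYPE` line would be missed. The function name is hypothetical.

```typescript
// Collect the distinct family names declared on "# TYPE <name> <type>" lines.
export function metricFamilies(exposition: string): Set<string> {
  const families = new Set<string>();
  for (const line of exposition.split("\n")) {
    const match = /^# TYPE (\S+) (\S+)\s*$/.exec(line);
    const name = match?.[1];
    if (name) families.add(name);
  }
  return families;
}
```

With the body of a /metrics response in a string `body`, `metricFamilies(body).size` gives the number of declared families, which can then be compared against the expected count.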
🧹 PR Environment Cleaned Up
The PR environment has been successfully deleted.
Deleted Resources:
All resources have been cleaned up and will no longer incur costs.
Throwaway PR to validate the real metrics-collector pod + the deploy-pr-metrics PR-env wiring before #1442 merges. Combines #1439 (service+image) and #1442 (PR-env wiring). Will be closed + torn down after validation. Do not review/merge.