fix(release-gate): enable DTrack in test-full and wire API key into security-tests (closes #200)#202

Open: brandonrc wants to merge 1 commit into main from fix/200-dtrack-in-test-full

@brandonrc

Contributor

Summary

The security-tests DTrack subsuite was hard-failing under RELEASE_GATE=1 because DEPENDENCY_TRACK_API_KEY and DEPENDENCY_TRACK_URL were never set in the test cluster. The fail-on-skip behavior is correct (it catches silent-success regressions); the actual gap was that DTrack was deliberately omitted from values-test-full.yaml with the comment "requires complex bootstrapping, handled separately".

This PR closes that gap. DTrack is a 3,190 LoC backend integration (services/dependency_track_service.rs + handler + chart subchart); leaving it without release-gate coverage means SBOM upload + findings sync regressions ship undetected.

Changes

helm/values-test-full.yaml

Enable DependencyTrack with the same sizing as values-smoke-with-deps.yaml (250m/1Gi requests, 2 CPU/4Gi limits). The chart's bootstrap init container in artifact-keeper-iac already handles the admin force-password-change + team API key provisioning and writes the key to /shared/dtrack-api-key on the backend pod. The only missing input is a non-empty adminPassword, which unblocks that flow.

.github/workflows/release-gate.yml security-tests job

New step "Wait for Dependency-Track bootstrap and extract API key" runs between the existing runner-pod-IP resolution and "Run security tests":

  1. Wait for the DTrack deployment to become Available (5 min budget; DTrack JVM startup is slow).
  2. Poll the backend pod for /shared/dtrack-api-key via kubectl exec (up to 3 min).
  3. Export DEPENDENCY_TRACK_URL and DEPENDENCY_TRACK_API_KEY into the job env. The key is masked with ::add-mask:: so it stays out of logs.

On bootstrap failure the step warns and exits 0 (the rest of the security suite still runs). The DTrack subsuite then surfaces the bootstrap issue as a clear skip-fail rather than masking it.
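
The poll-and-export part of the step could be sketched roughly as below. This is a minimal Python illustration, not the workflow's actual script: the function names, the `namespace` and `backend` arguments, and the 5 s polling interval are hypothetical. Only the key path, the two variable names, the 3 min budget, and the warn-and-exit-0 behavior come from the description above. Step 1 (waiting for the deployment to become Available) is left out.

```python
import subprocess
import time
from typing import Callable, Optional

KEY_PATH = "/shared/dtrack-api-key"  # path the bootstrap init container writes to


def read_key_via_kubectl(namespace: str, backend: str) -> Optional[str]:
    """Read the key file from the backend pod; None if missing or empty.

    `namespace` and `backend` (for example "deploy/backend") are hypothetical.
    """
    proc = subprocess.run(
        ["kubectl", "exec", "-n", namespace, backend, "--", "cat", KEY_PATH],
        capture_output=True,
        text=True,
    )
    key = proc.stdout.strip()
    return key if proc.returncode == 0 and key else None


def poll_for_key(
    fetch: Callable[[], Optional[str]],
    timeout_s: float = 180.0,  # the "up to 3 min" budget
    interval_s: float = 5.0,   # assumed polling interval
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[str]:
    """Call fetch() until it returns a key or the budget runs out."""
    deadline = clock() + timeout_s
    while True:
        key = fetch()
        if key:
            return key
        if clock() >= deadline:
            return None
        sleep(interval_s)


def export_step(key: Optional[str], url: str, env_file: str) -> int:
    """Mask and export the key; warn and still return 0 when bootstrap failed."""
    if key is None:
        print("::warning::Dependency-Track API key not found; DTrack subsuite will skip-fail")
        return 0
    print(f"::add-mask::{key}")  # keep the key out of job logs
    with open(env_file, "a", encoding="utf-8") as fh:  # env_file stands in for $GITHUB_ENV
        fh.write(f"DEPENDENCY_TRACK_URL={url}\n")
        fh.write(f"DEPENDENCY_TRACK_API_KEY={key}\n")
    return 0
```

Passing `fetch`, `sleep`, and `clock` as parameters keeps the timeout logic checkable without a cluster. In a real job the fetcher would be something like `lambda: read_key_via_kubectl(ns, "deploy/backend")` and `env_file` would be the path in `GITHUB_ENV`.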

Resource impact

Adds ~2 CPU / 4 Gi memory to the test-full deploy (limits). Combined stack ~7 CPU / 10 Gi. The ARC namespace has no enforced ResourceQuota (verified: create-test-namespace.sh does not create one, and no chart template defines one). Cluster headroom on Rocky is adequate; if this ever changes, the right answer is to route security-tests to the ak-beefy-runners pool already configured in artifact-keeper-iac/argocd/arc-beefy-runners-values.yaml.

Why not just remove the DTrack subsuite

It would leave a 3,190 LoC backend integration with zero gate coverage. clean-install-smoke-with-deps covers chart-wiring (does DTrack reach Ready), not actual SBOM upload + findings sync. The latter is what regresses silently when DTrack integration code drifts.

Test plan

  • YAML validates for both files
  • Manual workflow_dispatch of Release Gate after merge; verify the new step extracts the API key and the DTrack subsuite either passes or fails on real DTrack integration findings (not on missing env vars)
  • Cluster headroom holds (no DTrack pod stuck in Pending due to insufficient resources)

Closes #200

…ck API key into security-tests

The security-tests DTrack subsuite asserts on real BOM upload + findings
sync. values-test-full.yaml previously omitted DTrack with a comment
"requires complex bootstrapping, handled separately", which left the
subsuite hard-failing under RELEASE_GATE=1 (the gate exists precisely
to catch silent-success skips on integrations).

Two changes:

1. helm/values-test-full.yaml: enable dependencyTrack with sizing from
   helm/values-smoke-with-deps.yaml (250m / 1Gi requests, 2 CPU / 4Gi
   limits). The chart's bootstrap init container handles the admin
   force-password-change + team API key provisioning and writes the
   key to /shared/dtrack-api-key on the backend pod.

2. .github/workflows/release-gate.yml security-tests job: add a step
   between the runner-pod-IP resolution and "Run security tests" that:
     - Waits for the DTrack deployment to become Available (5 min budget)
     - Polls the backend pod for /shared/dtrack-api-key via kubectl exec
     - Exports DEPENDENCY_TRACK_URL + DEPENDENCY_TRACK_API_KEY into the
       job env, masking the key
   On bootstrap failure the step warns and exits 0 so the rest of the
   security suite still runs; the DTrack subsuite will then skip-fail
   as before, surfacing the bootstrap-side issue rather than masking it.

The 3,190 LoC DTrack integration in the backend now gets real release-
gate coverage, not just chart-wiring coverage from
clean-install-smoke-with-deps.

Closes #200
@brandonrc

Contributor Author

Investigation update from a focused triage on the self-test failures:

The DTrack standup is NOT the cause of the self-test failures

The agent that dug into this confirmed DTrack does not write scan_results rows. It only consumes SBOMs as a one-way push (backend/src/services/scanner_service.rs:2965, submit_sbom_to_dependency_track). Enabling DTrack adds zero findings to the backend's scan-completion query.

The actual scanners that produce findings are hardcoded at scanner_service.rs:2312-2389:

  • DependencyScanner (always on, queries OSV + GitHub Advisory independently)
  • ImageScanner + TrivyFsScanner + IncusScanner (when TRIVY_URL is set)
  • GrypeScanner (always on, CLI-based, parses package-lock.json locally)
  • OpenScapScanner (when OPENSCAP_URL is set)

For fixture A (Trivy scaled to zero): DependencyScanner + GrypeScanner still run unconditionally and find CVE-2019-10744 on lodash 4.17.4 via OSV/GHSA without needing the Trivy pod. The gate's findings_count >= 1 assertion is satisfied.

For fixture B (clean lodash 4.17.21): same scanners fire. The gate polls GET /security/artifacts/{id}/scans, which returns rows from all scanners ordered by created_at DESC, and then takes .items[0]. Whichever scanner finished most recently wins, and at least one of them is reporting findings the test doesn't expect.
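
To make the "most recent scanner wins" behavior concrete, here is a small Python sketch. The rows are invented for illustration (which scanner actually reports the unexpected findings has not been identified), and the `scan_type` values are assumptions. Only the field names `scan_type`, `findings_count`, and `created_at` and the newest-first `.items[0]` selection come from the discussion above.

```python
from typing import Dict, List


def gate_view(items: List[Dict]) -> Dict:
    """Mimic the gate: order rows newest first, then take the first one."""
    return sorted(items, key=lambda row: row["created_at"], reverse=True)[0]


# Hypothetical rows for the clean fixture; timestamps and counts are made up.
rows = [
    {"scan_type": "trivy", "findings_count": 0, "created_at": "2024-01-01T00:00:05Z"},
    {"scan_type": "grype", "findings_count": 1, "created_at": "2024-01-01T00:00:09Z"},
]

picked = gate_view(rows)
print(picked["scan_type"], picked["findings_count"])  # prints: grype 1
```

The Trivy row is clean, but the gate never looks at it because another scanner finished four seconds later.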

So the self-test contract was broken BEFORE my DTrack changes. The failures here surface because of background scanner behavior, not DTrack.

The right fix (separate from this PR)

The self-tests need to either:

  1. Pin the gate assertion to a specific scan_type=trivy so non-Trivy findings don't satisfy it
  2. Disable DependencyScanner + GrypeScanner in the fixture overlays (env vars or chart toggles)
  3. Use a fresh in-house fixture that has no known CVEs in any of OSV/GHSA/Trivy/Grype
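
Option 1 could look something like the sketch below: filter the scan rows by `scan_type` before picking the newest one. This is a Python illustration under assumptions; the helper name is hypothetical, and the real gate may be structured differently.

```python
from typing import Dict, List, Optional


def latest_scan_of_type(items: List[Dict], scan_type: str) -> Optional[Dict]:
    """Return the newest row for one scanner, or None if it has not reported."""
    matching = [row for row in items if row.get("scan_type") == scan_type]
    if not matching:
        return None
    # ISO-8601 timestamps in one format sort correctly as strings.
    return max(matching, key=lambda row: row["created_at"])
```

Returning None when the pinned scanner has no rows leaves it to the caller to decide between polling again and failing, instead of silently falling back to another scanner's result.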

This is independent from this PR's scope.

Proposed path for THIS PR

Two options:

Option A (recommended): split DTrack out of values-test-full.yaml. Move the DTrack enablement to a new helm/values-test-dtrack.yaml and have only the security-tests job apply it via --values (or helm upgrade --reuse-values --values). The self-tests stay on the lean values-test-full.yaml and remain green.
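
Option A relies on Helm's values layering: when several values files are passed, later files take precedence and nested maps are merged key by key. The Python sketch below imitates that merge to show why the lean file stays untouched. The key names `enabled`, `resources`, and `backend.replicas` are assumptions for illustration; the sizing numbers are the ones quoted in this PR.

```python
from typing import Any, Dict


def merge_values(base: Dict[str, Any], overlay: Dict[str, Any]) -> Dict[str, Any]:
    """Deep-merge two values mappings; overlay wins, maps merge key by key."""
    merged = dict(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_values(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical stand-in for values-test-full.yaml (DTrack left off).
test_full = {"dependencyTrack": {"enabled": False}, "backend": {"replicas": 1}}

# Hypothetical stand-in for the proposed values-test-dtrack.yaml overlay.
test_dtrack = {
    "dependencyTrack": {
        "enabled": True,
        "resources": {
            "requests": {"cpu": "250m", "memory": "1Gi"},
            "limits": {"cpu": "2", "memory": "4Gi"},
        },
    }
}

security_tests_values = merge_values(test_full, test_dtrack)
print(security_tests_values["dependencyTrack"]["enabled"])  # prints: True
print(test_full["dependencyTrack"]["enabled"])              # prints: False
```

Only the job that applies the overlay sees DTrack enabled; jobs that use the base file alone keep the lean stack.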

Option B: keep this PR as-is, file a separate issue for the self-test contract. Self-tests stay red until that lands, but the gate-side benefit of this PR (real DTrack release-gate coverage) materializes immediately.

I'd vote A for cleaner separation. Will rework if you concur.

Development

Successfully merging this pull request may close these issues.

[gate-fix] DependencyTrack subsuite hard-fails security-tests because DTrack not deployed in test-full
