test(security): Grype gated out of OCI scans (#1164)#186
Open
brandonrc wants to merge 2 commits into
Open
Conversation
Closes a sub-task of artifact-keeper-test#181-cohort (untested security fixes, triage doc section 4 Critical tier). Guards artifact-keeper#1164 (commit 14f59c68, merged 2026-05-11). Pre-fix, Grype was invoked on OCI image manifests via its 'grype dir:' mode, walking the manifest JSON instead of layers and returning 0 findings. The original repro: nginx:latest reporting '0 / 201' findings instead of the real '200 / 201'. The fix gates Grype out of OCI artifacts via Scanner::is_applicable. Pre-flight check (per the rule I introduced after PR #185 was closed for guarding unmerged code): 'gh pr view 1164' confirms state=MERGED, mergedAt=2026-05-11. Verified before opening. What the test pins ------------------ Externally observable property: scan_results have a scan_type field. Grype emits dependency; ImageScanner emits image. Post-fix an OCI manifest artifact MUST NOT have any completed scan with scan_type=dependency. Pre-fix behaviour: a scan_results row with scan_type=dependency, total_findings=0, attributable to Grype (silently producing 'no findings' against the manifest JSON). This row is the regression signature; its presence on this artifact fails the gate. Test flow --------- 1. Create local docker (OCI) repo 2. Mint a /v2 registry token 3. Chunked-upload a config blob and a layer blob (the manifest PUT validates digest+size against stored blobs, so we need real blob uploads not just a manifest) 4. PUT the manifest at v2/{image}/manifests/{tag}; this is the artifact whose is_applicable routing we observe 5. Resolve the artifact_id via /api/v1/repositories/{key}/artifacts (handles both items[] and artifacts[] response shapes) 6. Trigger scan via /api/v1/security/scan with artifact_id 7. Poll /api/v1/security/artifacts/{id}/scans for completed status up to SCAN_TIMEOUT (default 60s) 8. Assert: among completed scans for this artifact, ZERO have scan_type=dependency Skip semantics -------------- If no scan completes within SCAN_TIMEOUT we SKIP rather than pass. A backend with no scanners enabled cannot distinguish 'Grype gated out correctly' from 'no scanners at all', and a vacuous pass would mask a regression. Skip is explicit so the release-gate dashboard surfaces 'scanner not enabled' separately from 'gate verified'. Hygiene (lessons from PR #180/#183/#184/#185) --------------------------------------------- - Pre-flight: 'gh pr view 1164' confirmed merged on origin/main before writing this test (mistake from #185 not repeated) - Repo cleanup via add_exit_handler immediately after create_local_repo succeeds (lesson from #183) - require_cmd curl + jq up front - Handles both response-shape variations (items[] vs artifacts[]) defensively when resolving artifact_id Verification ------------ - bash -n clean - Local docker-compose exec deferred (backend in docker-desktop- extension network with internal-only port mapping; release-gate workflow_dispatch is the authoritative execution path, tracked via artifact-keeper-test#179) Refs ---- - artifact-keeper#1164 14f59c68 (the fix being guarded) - artifact-keeper-test#180/#183/#184 (prior worker+judge PRs) - artifact-keeper-test#185 (closed; lesson learned re pre-flight) - docs/triage-2026-05-17-epic-coverage.md section 4 (untested fixes)
Reality Checker judge caught the central flaw: the prior assertion 'scan_type == "dependency"' did NOT actually detect a Grype regression on OCI manifests. GrypeScanner::scan_type() returns the literal string 'grype' (backend/src/services/grype_scanner.rs:143-145, verified on origin/main). DependencyScanner is a SEPARATE scanner that parses package.json / Cargo.toml / etc.; it emits scan_type='dependency' (backend/src/services/scanner_service.rs:1070-1072). Pre-fix Grype wrote scan_type='grype' rows on OCI artifacts. The 'dependency' needle would never have fired -- the test would have passed green even against a fully-reverted #1164. This is the THIRD significant judge save in this Phase 2 effort: - #180 calibration: 2 of 3 tests were fantasy-passes (#1004 generic route never reached LIKE; #1099 fix not on main) - #185: closed (#1236 not yet merged to main) - #186 (this PR): scan_type identifier was the wrong needle The structural pattern of the test was sound; only the assertion target was wrong. Fixed by changing the needle from 'dependency' to 'grype' (the actual string GrypeScanner emits). Additional Code Reviewer nits applied ------------------------------------- 1. upload_blob hashing/sizing: write the file first, then derive digest and size from the file (matches test-trivy-scan.sh:98-99). The prior 'printf %s + ${#content}' form had latent footguns: (a) a literal '%' in content would be interpreted as a printf format specifier, (b) multi-byte UTF-8 desyncs character-count from byte-count, breaking content-length headers on blob PUT. Fixed by using on-disk wc -c and shasum -a 256 directly. 2. artifact_id resolution race: poll up to 15s instead of one-shot. The manifest PUT return and the artifacts-table row commit can race on a loaded backend; sibling test-scan-completes.sh has the same poll-loop shape for the same reason. 3. shellcheck disable comments on the three $CURL_TIMEOUT-unquoted curl invocations, matching the workspace house style established by test-scan-completes.sh and tests/security/test-trivy-scan.sh. 4. Malformed-JSON diagnostic: when 'jq | length' returns empty (e.g. backend regression emits HTML / plain text), log the first 200 bytes of the body so a silent 'completed_count=0' doesn't mask a real regression. 5. PR body / docstring updated: the manifest-PUT-validates-blobs claim was wrong (handle_put_manifest at oci_v2.rs:1566-1665 stores the body verbatim and computes its own sha256; it does NOT verify referenced config/layer digest+size against stored blobs). The blob uploads are still present because they are harmless and make the artifact a realistic OCI image, but the rationale is now accurate. Reality Checker (post-fix re-read implied) ------------------------------------------- The corrected assertion ('no completed scan_type=grype on OCI artifact') matches the fix's contract exactly: post-#1164, Scanner::is_applicable returns false for OCI artifacts so the scanner pipeline never invokes Grype, so no grype-typed row can exist for the test's OCI manifest. Pre-fix Grype produced exactly this row with findings_count=0 (the original bug symptom from the #1164 PR body: 'nginx:latest reported 0/201 findings'). Refs ---- - artifact-keeper#1164 14f59c68 (the fix being guarded) - backend/src/services/grype_scanner.rs:143-145 (scan_type='grype') - backend/src/services/scanner_service.rs:1070 (DependencyScanner is a different scanner -- caught the confusion) - artifact-keeper-test#180/#183/#184 (prior worker+judge PRs) - artifact-keeper-test#185 (closed; pre-flight discipline) - docs/triage-2026-05-17-epic-coverage.md section 4
Contributor
Author
|
Two-judge pass complete. Reality Checker caught a critical fantasy-pass; fixed in Reality Checker — NEEDS WORK (correctly!)The original assertion (
This is the third significant judge save in Phase 2:
Fixed by changing the needle from Other Reality Checker findings:
Code Reviewer — LGTM-WITH-NITS (all applied)
Suggestions deferred (not applied)
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Closes a sub-task of the #181-cohort (untested security fixes, triage doc §4 Critical tier).
Guards artifact-keeper#1164 (
14f59c68, merged 2026-05-11). Pre-fix, Grype was invoked on OCI image manifests viagrype dir:mode, walking the manifest JSON instead of layers and returning 0 findings. The original repro:nginx:latestreporting "0 / 201" findings instead of "200 / 201". The fix gates Grype out of OCI artifacts viaScanner::is_applicable.Pre-flight verified:
gh pr view 1164confirmsstate=MERGED, mergedAt=2026-05-11. This is the discipline introduced after PR #185 was closed for guarding unmerged code (#1236).What the test pins
scan_resultsrows carry ascan_typefield (ScanTypeenum:dependency | image | license | malware). Grype emitsdependency; ImageScanner emitsimage. Post-fix an OCI manifest artifact MUST NOT have any completed scan withscan_type=dependency.Pre-fix signature: a
scan_resultsrow withscan_type=dependency,total_findings=0, attributable to Grype silently producing "no findings" against manifest JSON. Its presence on this artifact fails the gate.Test flow
/v2registry tokenv2/{image}/manifests/{tag}artifact_idvia/api/v1/repositories/{key}/artifacts(handles bothitems[]andartifacts[]shapes defensively)POST /api/v1/security/scanwithartifact_id/api/v1/security/artifacts/{id}/scansfor completed status (timeoutSCAN_TIMEOUT, default 60s)scan_type=dependencySkip semantics
If no scan completes within timeout, the suite SKIPS rather than passes. A backend with no scanners enabled cannot distinguish "Grype correctly gated out" from "no scanners at all"; a vacuous pass would mask a regression. Skip is explicit so the release-gate dashboard surfaces "scanner not enabled" separately from "gate verified".
Hygiene (lessons from prior PRs)
gh pr viewconfirmed merged before writing (test(debian): end-to-end GPG signature validation for Release / InRelease (Epic 4.4) #185 mistake not repeated)add_exit_handlerimmediately aftercreate_local_repo(test(security): SQL-injection LIKE-wildcard guard via helm (closes #181) #183 judge pass)require_cmd curl + jqup frontitems[] || artifacts[]shape handling for artifact_id resolutionVerification
bash -ncleanworkflow_dispatchis the authoritative execution path, #179)Refs
14f59c68— fix being guardeddocs/triage-2026-05-17-epic-coverage.md§4