Fix Scorecard SARIF upload warnings#198
Conversation
📝 WalkthroughSummary by CodeRabbit릴리스 노트
Walkthrough기본 브랜치 전용 Changes
Sequence Diagram(s)sequenceDiagram
participant GH as "GitHub Actions"
participant Art as "Artifact\n(ossf-scorecard-results)"
participant Script as "normalize_scorecard_sarif.py"
participant Upload as "upload-sarif action"
participant Verifier as "verify_supply_chain.py"
GH->>Art: 다운로드 `results.sarif`
GH->>Script: 실행 (input: results.sarif)
Script-->>Script: runs→results 순회\n플레이스홀더 URI 대체\nregion.startLine 보정\nproperties 추가\n파일 쓰기: normalized-scorecard-results.sarif
GH->>Upload: normalized-scorecard-results.sarif 업로드
Verifier->>GH: 워크플로우 검사\n(upload-sarif 존재 시 정규화 스크립트 참조 + sarif_file 확인)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Review rate limit: 4/5 reviews remaining, refill in 12 minutes. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/checks/normalize_scorecard_sarif.py`:
- Around line 18-37: The loop assumes runs/results/locations are dicts and can
call .get() which can raise AttributeError on malformed SARIF; add type guards
to skip non-dict entries: check isinstance(run, dict) before iterating
run.get("results", []), check isinstance(result, dict) before iterating
result.get("locations", []), and check isinstance(location, dict) before
accessing location.get("physicalLocation"); retain the existing isinstance
checks for physical_location and artifact_location and only mutate
artifact_location["uri"], physical_location["region"], and properties when those
checks pass (use the same variable names sarif, run, result, location,
physical_location, artifact_location, properties as in the diff).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: b96bfbd6-c3af-4e23-81ba-941f50b48c8c
📒 Files selected for processing (4)
.github/workflows/ossf-scorecard.ymlscripts/checks/normalize_scorecard_sarif.pyscripts/checks/verify_supply_chain.pyservices/analysis-engine/tests/test_supply_chain_policy.py
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
scripts/checks/normalize_scorecard_sarif.py (1)
37-37:⚠️ Potential issue | 🟠 Major | ⚡ Quick win
region.startLine보장이 현재 구현으로는 완전하지 않습니다.
region딕셔너리가 이미 존재하지만startLine이 없는 경우, 현재 코드는 값을 채우지 못해 업로드 경고가 재발할 수 있습니다.🔧 제안 수정안
- physical_location.setdefault("region", {"startLine": 1}) + region = physical_location.get("region") + if not isinstance(region, dict): + region = {} + physical_location["region"] = region + region.setdefault("startLine", 1)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/checks/normalize_scorecard_sarif.py` at line 37, The current setdefault call only creates a region when missing but doesn't add startLine if region exists without it; update the logic around the physical_location handling so that after ensuring physical_location has a "region" dict you also ensure that region contains "startLine" with a default of 1 (e.g., obtain or create physical_location["region"] and set region["startLine"]=1 only if missing). Target the physical_location / "region" / "startLine" keys in normalize_scorecard_sarif.py when normalizing SARIF entries.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/checks/normalize_scorecard_sarif.py`:
- Around line 18-26: The loop assumes runs/results/locations are lists and will
raise TypeError for unexpected container types; update the iteration guards to
check container types before iterating: ensure sarif.get("runs") yields a list
(e.g., verify isinstance(runs, list) before the for runs loop) and similarly
verify isinstance(run.get("results"), list) before iterating results and
isinstance(result.get("locations"), list) before iterating locations so non-list
values are safely skipped; reference the variables sarif, runs, run, result, and
location when applying these type guards.
---
Duplicate comments:
In `@scripts/checks/normalize_scorecard_sarif.py`:
- Line 37: The current setdefault call only creates a region when missing but
doesn't add startLine if region exists without it; update the logic around the
physical_location handling so that after ensuring physical_location has a
"region" dict you also ensure that region contains "startLine" with a default of
1 (e.g., obtain or create physical_location["region"] and set
region["startLine"]=1 only if missing). Target the physical_location / "region"
/ "startLine" keys in normalize_scorecard_sarif.py when normalizing SARIF
entries.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 0ae4be42-3b7e-4ff3-934a-749103fc8fc1
📒 Files selected for processing (2)
scripts/checks/normalize_scorecard_sarif.pyservices/analysis-engine/tests/test_supply_chain_policy.py
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/checks/normalize_scorecard_sarif.py`:
- Around line 37-41: physical_location.region.startLine must be normalized to a
valid positive integer (not just left as-is when present); in the block handling
physical_location and region (variables physical_location and region in
normalize_scorecard_sarif.py) replace the region.setdefault("startLine", 1) with
logic that: ensures region exists, reads region.get("startLine"),
coerces/validates it as an int >= 1, and if missing/None/not-an-int/<1 then sets
region["startLine"] = 1 (otherwise keep the validated int); this prevents
leaving values like 0, null, or strings that trigger SARIF upload warnings.
In `@services/analysis-engine/tests/test_supply_chain_policy.py`:
- Around line 464-468: The test only asserts bandscopeOriginalUri but misses the
normalized metadata flag; update the assertions after the existing checks (which
reference rewritten and location) to also validate the repository-level flag by
asserting that location["properties"]["bandscopeRepositoryLevelFinding"] is the
expected boolean (e.g., True) so regressions for bandscopeRepositoryLevelFinding
are caught; locate the assertions using the variables rewritten and location in
test_supply_chain_policy.py and add the single boolean assertion for that
properties key.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: ba7227a5-3d34-4e7f-8113-0169d1488f7e
📒 Files selected for processing (2)
scripts/checks/normalize_scorecard_sarif.pyservices/analysis-engine/tests/test_supply_chain_policy.py
There was a problem hiding this comment.
♻️ Duplicate comments (1)
scripts/checks/normalize_scorecard_sarif.py (1)
18-26:⚠️ Potential issue | 🟠 Major | ⚡ Quick win
runs/results/locations컨테이너 타입 가드가 없어 비정상 SARIF에서 정규화가 중단될 수 있습니다.Line 18, Line 22, Line 25에서
.get(..., [])결과를 바로 순회해None/int/dict같은 값이 들어오면TypeError또는 비의도 순회가 발생합니다. 리스트 타입 검증 후 순회하세요.🔧 제안 수정안
- runs = sarif.get("runs", []) if isinstance(sarif, dict) else [] + runs_raw = sarif.get("runs", []) if isinstance(sarif, dict) else [] + runs = runs_raw if isinstance(runs_raw, list) else [] for run in runs: if not isinstance(run, dict): continue - for result in run.get("results", []): + results = run.get("results", []) + if not isinstance(results, list): + continue + for result in results: if not isinstance(result, dict): continue - for location in result.get("locations", []): + locations = result.get("locations", []) + if not isinstance(locations, list): + continue + for location in locations: if not isinstance(location, dict): continueAs per coding guidelines, “Treat files, URLs, metadata, model artifacts, and project files as untrusted input”.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/checks/normalize_scorecard_sarif.py` around lines 18 - 26, The code iterates over .get(..., []) for runs/results/locations without verifying those values are lists, which can raise TypeError on malformed SARIF; update the loops that use runs, run.get("results", []), and result.get("locations", []) by first assigning each .get(...) result to a local variable (e.g., runs_list, results_list, locations_list), check isinstance(..., list) before iterating, and skip or default to [] if not a list; keep the existing dict guards for run/result/location (run/result/location variables) and only enter the inner loops when the container type checks pass.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@scripts/checks/normalize_scorecard_sarif.py`:
- Around line 18-26: The code iterates over .get(..., []) for
runs/results/locations without verifying those values are lists, which can raise
TypeError on malformed SARIF; update the loops that use runs, run.get("results",
[]), and result.get("locations", []) by first assigning each .get(...) result to
a local variable (e.g., runs_list, results_list, locations_list), check
isinstance(..., list) before iterating, and skip or default to [] if not a list;
keep the existing dict guards for run/result/location (run/result/location
variables) and only enter the inner loops when the container type checks pass.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: e636df77-2d34-40eb-959b-5a5591f36721
📒 Files selected for processing (2)
scripts/checks/normalize_scorecard_sarif.pyservices/analysis-engine/tests/test_supply_chain_policy.py
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/checks/normalize_scorecard_sarif.py`:
- Around line 52-57: The code uses physical_location.setdefault("properties",
{}) but leaves non-dict existing values untouched, causing missing
repository-level metadata; change normalization to replace non-dict values with
a fresh dict before setting keys: retrieve properties =
physical_location.get("properties"), if not isinstance(properties, dict) then
assign a new dict to physical_location["properties"], and then set
bandscopeOriginalUri and bandscopeRepositoryLevelFinding on that dict (mirroring
how region is normalized) so untrusted or malformed inputs are sanitized.
In `@services/analysis-engine/tests/test_supply_chain_policy.py`:
- Around line 634-677: Add a new malformed-properties case to the
test_scorecard_sarif_normalizer_skips_malformed_locations test: update the SARIF
input written in the test so one of the location objects contains a
physicalLocation whose properties field is an invalid type (e.g., null or a
string) in addition to the existing malformed entries; ensure the assertion
still expects normalization to succeed (rewritten == 1) and that the subsequent
normalized["runs"][...]["results"][...]["locations"][...]["physicalLocation"]
access is valid (so the normalizer function normalize_scorecard_sarif must
tolerate and sanitize physicalLocation.properties being null or a string and
produce a proper object or omit it without raising).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: d17be6cc-295b-4126-af72-4d84477a29db
📒 Files selected for processing (2)
scripts/checks/normalize_scorecard_sarif.pyservices/analysis-engine/tests/test_supply_chain_policy.py
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/checks/verify_supply_chain.py`:
- Around line 459-466: The current check that looks for
"github/codeql-action/upload-sarif" only inspects a single file string and can
be fooled by unrelated occurrences; update the logic so that instead of testing
the whole repository text, you iterate over workflows (like the existing
workflow-level loop at lines ~467-475) and for each workflow ensure that when a
step uses "github/codeql-action/upload-sarif" in that workflow's steps, the same
workflow contains OSSF_SARIF_NORMALIZER and OSSF_NORMALIZED_SARIF_UPLOAD (i.e.,
check step-level entries inside the same workflow object), and only append the
missing message if any of those two are absent within that same workflow
context.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 3605d6aa-7fee-4559-9327-90984d3b05e3
📒 Files selected for processing (4)
.github/workflows/ossf-scorecard.ymlscripts/checks/normalize_scorecard_sarif.pyscripts/checks/verify_supply_chain.pyservices/analysis-engine/tests/test_supply_chain_policy.py
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
|
@coderabbitai resume |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/checks/verify_supply_chain.py`:
- Around line 364-388: The check in verify_supply_chain.py currently searches
the whole upload-sarif step string (step_content) for sarif_file tokens which
can be spoofed in env or comments; change the logic to extract only the 'with:'
block inside the found step_lines and parse the 'sarif_file:' key/value there
(e.g., find the line index of a line that stripped == "with:" and collect
subsequent more-indented lines until dedent, then match the sarif_file value)
and compare that value against OSSF_NORMALIZED_SARIF_UPLOAD and the allowed
filenames instead of scanning step_content; update the branch that uses
has_normalizer_step and OSSF_NORMALIZED_SARIF_UPLOAD to use the parsed
sarif_file value, and add a regression unit test that constructs a step where
sarif_file appears in env but with.sarif_file is results.sarif to assert the
check fails.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 09c094f1-7977-402c-9a6f-223b5305798d
📒 Files selected for processing (4)
.github/workflows/ossf-scorecard.ymlscripts/checks/normalize_scorecard_sarif.pyscripts/checks/verify_supply_chain.pyservices/analysis-engine/tests/test_supply_chain_policy.py
✅ Actions performedReviews resumed. |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/checks/verify_supply_chain.py`:
- Around line 420-454: The current detection uses a workflow-wide
has_normalizer_step and string heuristics to mark Scorecard SARIF uploads;
instead, make normalizer/dowload-artifact/with.sarif_file checks job-local:
inside the loop over step_blocks (using index, step_indent, step_lines) compute
whether that job contains a download-artifact step, a normalizer step that
references OSSF_SARIF_NORMALIZER or produces OSSF_NORMALIZED_SARIF, and whether
the upload step’s sarif_file (via upload_step_sarif_file) matches the
normalizer’s output; replace the global has_normalizer_step usage with this
per-job check (references: has_normalizer_step, upload_step_sarif_file,
OSSF_NORMALIZED_SARIF, workflow_job_content, scorecard_sarif_upload), and add
regression tests covering (a) a separate upload job that uploads generic
results.sarif and (b) a job that runs normalizer but writes a different filename
to ensure only uploads from the same job with matching normalized output pass.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 80f5fc8f-dd09-4cab-a75d-f6215d479b2c
📒 Files selected for processing (2)
scripts/checks/verify_supply_chain.pyservices/analysis-engine/tests/test_supply_chain_policy.py
There was a problem hiding this comment.
♻️ Duplicate comments (1)
scripts/checks/verify_supply_chain.py (1)
495-504:⚠️ Potential issue | 🟠 Major | ⚡ Quick win분리된 업로드 job에서 raw
results.sarif우회가 아직 가능합니다.Line 500-501이
results.sarif를 Scorecard 업로드로 분류할 때 동일 job의ossf/scorecard-action만 허용 조건으로 봐서,actions/download-artifact로ossf-scorecard-results를 받는 분리 job의 raw 업로드는 정규화 검사 자체를 건너뜁니다.🔧 제안 diff
- scorecard_sarif_upload = sarif_file == OSSF_NORMALIZED_SARIF or ( - sarif_file is not None - and ( - "scorecard" in sarif_file - or ( - sarif_file == "results.sarif" - and "ossf/scorecard-action" in job_content_without_comments - ) - ) - ) + scorecard_sarif_upload = sarif_file == OSSF_NORMALIZED_SARIF or ( + sarif_file is not None + and ( + "scorecard" in sarif_file + or (sarif_file == "results.sarif" and job_has_scorecard_artifact_source) + ) + )같은 패턴(분리 upload job + raw
results.sarif) 회귀 테스트도 함께 추가해 두는 것이 안전합니다.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/checks/verify_supply_chain.py` around lines 495 - 504, The current logic that sets scorecard_sarif_upload allows raw "results.sarif" to be treated as Scorecard output only when "ossf/scorecard-action" exists in the same job, which lets a separate upload job that downloads an artifact (e.g., via actions/download-artifact or artifact name ossf-scorecard-results) bypass normalization checks; update the condition that computes scorecard_sarif_upload (the expression using sarif_file, OSSF_NORMALIZED_SARIF, and job_content_without_comments) to reject raw "results.sarif" when the job is an upload/download-only job by also detecting download-artifact usage or the artifact name (e.g., check for "actions/download-artifact" or "ossf-scorecard-results" in job_content_without_comments) and treat those as not Scorecard-generated, and add a regression test covering the split upload job + raw results.sarif case to ensure the bypass no longer occurs.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@scripts/checks/verify_supply_chain.py`:
- Around line 495-504: The current logic that sets scorecard_sarif_upload allows
raw "results.sarif" to be treated as Scorecard output only when
"ossf/scorecard-action" exists in the same job, which lets a separate upload job
that downloads an artifact (e.g., via actions/download-artifact or artifact name
ossf-scorecard-results) bypass normalization checks; update the condition that
computes scorecard_sarif_upload (the expression using sarif_file,
OSSF_NORMALIZED_SARIF, and job_content_without_comments) to reject raw
"results.sarif" when the job is an upload/download-only job by also detecting
download-artifact usage or the artifact name (e.g., check for
"actions/download-artifact" or "ossf-scorecard-results" in
job_content_without_comments) and treat those as not Scorecard-generated, and
add a regression test covering the split upload job + raw results.sarif case to
ensure the bypass no longer occurs.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 64eb95ac-1d4c-405d-bc37-b9f63dbb8454
📒 Files selected for processing (2)
scripts/checks/verify_supply_chain.pyservices/analysis-engine/tests/test_supply_chain_policy.py
Summary
upload-sarifso GitHub no longer emits invalid URI warnings.Verification
python3 scripts/checks/verify_supply_chain.pyuv run --project services/analysis-engine pytest services/analysis-engine/tests/test_supply_chain_policy.py -quv run --project services/analysis-engine ruff check services/analysis-engine/tests/test_supply_chain_policy.py scripts/checks/verify_supply_chain.py scripts/checks/normalize_scorecard_sarif.pyuv run --project services/analysis-engine ruff format --check services/analysis-engine/tests/test_supply_chain_policy.py scripts/checks/verify_supply_chain.py scripts/checks/normalize_scorecard_sarif.pygit diff --checkSecurity Notes
no file associated with this alert; existing file-associated SARIF locations are preserved.Refs #194.