Claude/add security audit workflow #66
Scheduled GitHub Actions workflow that uses Claude to perform research-driven dependency auditing, static analysis (Semgrep), and dynamic testing, then files labeled GitHub issues for each finding. Includes SHA-pinned actions, prompt injection defenses, concurrency guard, 60-minute timeout, and artifact upload for raw scan output.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add explicit github_token to Claude action step
- Add pip-audit alongside semgrep for Python ecosystem support
- Remove noisy "no findings" issue creation; log to stdout instead
- Drop WebFetch from allowed tools to reduce prompt injection surface
- Fix claude_args block scalar to plain string (avoids trailing newline)
- Use ${{ github.workflow }} for concurrency group (more portable)
- Remove redundant 2>/dev/null || true on gh label create --force
- Bump timeout from 60 to 90 minutes for larger repos
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude finished @fynyky's task in 2m 47s
Claude Code Review
Summary
Nice addition: a scheduled AI-driven audit that complements the existing test/lint CI and gives findings a structured issue format. Pinned action SHAs, sensible permissions ( The main gaps are around fit with this specific repo (Node.js project, Python-only audit setup) and long-term issue-tracker hygiene (no real dedup). Details inline; recap below.
Higher-impact
Lower-impact
Things working well
One small note
I couldn't fully verify exact
- name: Install audit tools
  run: python3 -m pip install --quiet semgrep==1.164.0 pip-audit
Missing Node.js setup — likely blocks several audit steps for this repo.
This repo is a Node.js library (package.json declares parcel, playwright, web-test-runner, etc.), but the workflow only installs Python tooling. Without actions/setup-node and npm ci:
- Step 3 of the prompt (`npm audit --json`) will run against the lockfile but won't have `node_modules`, and may use whatever Node version is preinstalled on the runner (currently fine, but not pinned/reproducible).
- Step 5 ("run the existing test suite to establish a baseline") cannot succeed: `npm test` runs `npm run lint && npx playwright install && web-test-runner`, none of which work without `npm ci` first.
- Semgrep's JS analysis runs on source files only, but the dependency posture differs from CI.
Suggest adding a setup step before line 23:
- uses: actions/setup-node@<sha> # v6, matching test.yml
  with:
    node-version: 24
- run: npm ci

- uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4

- name: Install audit tools
  run: python3 -m pip install --quiet semgrep==1.164.0 pip-audit
Inconsistent pinning: semgrep is pinned, pip-audit is not.
For reproducible scheduled audits, both tools should be pinned to specific versions. As written, a pip-audit release could silently change the workflow's behavior between runs. Either pin both:
Suggested change:
- run: python3 -m pip install --quiet semgrep==1.164.0 pip-audit
+ run: python3 -m pip install --quiet semgrep==1.164.0 pip-audit==2.7.3
…or leave both unpinned (accepting the risk for fresh rulesets). One-sided pinning is the worst of both worlds.
Also worth noting: pinning semgrep==1.164.0 means rulesets via --config p/... are still fetched live and can update independently — the pin only freezes the engine, not the rules. That may or may not be what you want for a security tool that benefits from up-to-date detections.
- name: Ensure security label exists
  env:
    GH_TOKEN: ${{ github.token }}
  run: gh label create security --color d73a4a --description "Security vulnerability" --force
--force will overwrite maintainer customizations every week.
gh label create --force will reset the color and description if someone has already customized the security label (e.g., changed it to a different shade, added an emoji, or expanded the description). On a weekly cron, this silently undoes that customization each Monday.
Safer pattern — only create when missing:
Suggested change:
- run: gh label create security --color d73a4a --description "Security vulnerability" --force
+ run: gh label list --json name -q '.[].name' | grep -qx security || gh label create security --color d73a4a --description "Security vulnerability"
Or simply tolerate the existence error:
run: gh label create security --color d73a4a --description "Security vulnerability" 2>/dev/null || true

**6. Check for duplicate issues**
```
gh issue list --label security --state open --json number,title
```
Duplicate check only filters --state open — closed/triaged findings will be re-reported every week.
If a maintainer closes a generated issue as wontfix, not planned, or after applying an accepted-risk decision, the next Monday's run won't see it (it's not state open) and will create a brand-new duplicate. Over weeks this becomes a tracker-spam problem.
Two reasonable fixes:
1. Check `--state all`:
   gh issue list --label security --state all --json number,title,state
   …and instruct Claude in the prompt to skip findings whose title closely matches an existing issue regardless of state.
2. Use a stable fingerprint (e.g., `CVE-XXXX-XXXXX` or a hash of package@version + rule ID) in the issue title or a hidden HTML comment, and have Claude grep all issues for that fingerprint before filing.
Without one of these, expect ~10 fresh issues per Monday once any have been triaged closed.
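The fingerprint option could look roughly like the sketch below. It assumes `sha256sum` (GNU coreutils) is available on the runner; the function names `finding_fingerprint` and `already_filed` and the `audit-fp:` marker format are hypothetical, chosen only for illustration. Existing issue text is passed in as a plain string so the matching logic can be tried without calling `gh`.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: derive a stable fingerprint for a finding and check
# whether that fingerprint already appears in previously filed issue text.

# Build a short, deterministic ID from package, version, and rule/advisory ID.
finding_fingerprint() {
  local package="$1" version="$2" rule_id="$3"
  printf '%s@%s:%s' "$package" "$version" "$rule_id" | sha256sum | cut -c1-16
}

# Return 0 if the fingerprint marker occurs in the given text, 1 otherwise.
# $1 = fingerprint, $2 = titles/bodies of existing issues (any state).
already_filed() {
  local fp="$1" existing="$2"
  printf '%s\n' "$existing" | grep -qF "audit-fp:${fp}"
}
```

In the workflow, the second argument might come from something like `gh issue list --label security --state all --json title,body`, with the marker written into each new issue body as a hidden HTML comment. Because the fingerprint depends only on package, version, and rule ID, a finding that was closed as wontfix would still match on later runs.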
If no genuine vulnerabilities are found, do not create any issues. Instead,
print a brief summary to stdout of what was scanned and confirm no issues were found.

claude_args: '--allowedTools "Bash,WebSearch"'
Bash is unrestricted, and WebFetch is missing despite the research step depending on it.
Two issues with this allow-list, plus a style note:
1. No `WebFetch`. Step 2 of the prompt directs Claude to "actively research" CVEs from NVD, GitHub Advisory Database, and OWASP, and step 7 asks for advisory links. `WebSearch` returns snippets/URLs but doesn't fetch page bodies, so Claude can't read full advisory details without `WebFetch`. Either add it, or weaken step 2 to acknowledge the limitation.
2. `Bash` is wide open. A scheduled workflow with unrestricted shell on a runner that has `GH_TOKEN` for `issues: write` is a significant trust surface. The prompt itself is hardcoded (good), but step 2 explicitly tells Claude to fetch external content, and step 4 runs Semgrep with rulesets fetched from `p/owasp-top-ten`. Both are routes for prompt injection that could try to talk Claude into executing arbitrary shell. The prompt's "SECURITY NOTICE" mitigates this somewhat but isn't a hard boundary.
   Consider scoping Bash to the commands actually needed, similar to `review.yml`:
   claude_args: |
     --allowedTools "Bash(npm audit:*),Bash(pip-audit:*),Bash(semgrep:*),Bash(gh issue list:*),Bash(gh issue create:*),Bash(gh issue view:*),Bash(tee:*),Bash(jq:*),Read,Grep,Glob,WebSearch,WebFetch"
3. Style consistency. The single-line quoted form `'--allowedTools "Bash,WebSearch"'` works but is fragile. `review.yml` uses the `claude_args: |` block-scalar form; adopting the same here avoids quote-nesting surprises if the list grows.
**4. Static analysis**
Run Semgrep with the OWASP Top 10 and secrets detection rules, plus any
language-specific ruleset appropriate for this repo:
```
semgrep --config p/owasp-top-ten --config p/secrets --json -o audit-semgrep.json .
```
Then manually review the source code for issues not caught by automated tools,
specifically looking for the vulnerability classes identified in step 2.
Consider uploading Semgrep results as SARIF to GitHub code scanning.
The current setup outputs JSON for artifact upload, but artifacts are only useful if someone goes looking for them. Semgrep can emit SARIF, and github/codeql-action/upload-sarif will surface findings as native code scanning alerts on the Security tab — with deduplication across runs, code-anchored line annotations, and a UI for dismissing false positives. That UI also doubles as the de-duplication mechanism this workflow currently lacks (see the comment on step 6).
Rough shape:
- name: Semgrep SARIF
  run: semgrep --config p/owasp-top-ten --config p/secrets --sarif -o semgrep.sarif . || true
- uses: github/codeql-action/upload-sarif@<sha>
  if: always()
  with:
    sarif_file: semgrep.sarif

Requires `security-events: write` permission. The Claude-driven issue creation can then focus on findings code scanning can't surface (dynamic analysis, dependency advisories with no SARIF, manual review hits) rather than competing with the Security tab.
- name: Claude security audit and issue creation
  uses: anthropics/claude-code-action@537ffff2eff706bd7e3e1c3daf2d4b39067a9f85 # v1
  with:
    anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
    github_token: ${{ github.token }}
    track_progress: true
gh issue create invoked by Claude may not have a token in the bash environment.
The "Ensure security label exists" step explicitly sets GH_TOKEN: ${{ github.token }}, but this step doesn't. claude-code-action typically exposes the token to its internal MCP servers, but whether it also exports GH_TOKEN / GITHUB_TOKEN into the Bash shell that Claude invokes is action-version-specific. If it doesn't, every gh issue create call in step 7 of the prompt will fail with an auth error and the audit will produce no issues.
Belt-and-braces fix — add the env to this step too:
- name: Claude security audit and issue creation
  uses: anthropics/claude-code-action@537ffff2eff706bd7e3e1c3daf2d4b39067a9f85 # v1
  env:
    GH_TOKEN: ${{ github.token }}
  with:
    ...

Worth verifying with a manual workflow_dispatch run before relying on the Monday schedule.
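One way to make a missing token fail loudly, rather than surface as scattered `gh` auth errors partway through the audit, is a small preflight check run at the start of the shell session. The sketch below relies on the `gh` CLI reading `GH_TOKEN` or `GITHUB_TOKEN` from the environment; the function name `require_gh_token` is hypothetical, and whether the action's Bash tool inherits step-level `env` remains an assumption to confirm with a test run.

```shell
#!/usr/bin/env bash
# Hypothetical preflight: succeed only if a token that gh can use is present
# in the current environment; otherwise print a diagnostic and fail.
require_gh_token() {
  if [ -n "${GH_TOKEN:-}" ] || [ -n "${GITHUB_TOKEN:-}" ]; then
    return 0
  fi
  echo "No GH_TOKEN or GITHUB_TOKEN in environment; gh issue create would fail" >&2
  return 1
}
```

Running this once during a manual `workflow_dispatch` trial would show directly whether the token reaches the shell, without waiting for an issue-creation attempt to fail.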