Claude/add security audit workflow #66

Merged: fynyky merged 2 commits into master from claude/add-security-audit-workflow on May 28, 2026

fynyky (Owner) commented May 28, 2026

No description provided.

fynyky and others added 2 commits May 28, 2026 09:34
Scheduled GitHub Actions workflow that uses Claude to perform
research-driven dependency auditing, static analysis (Semgrep), and
dynamic testing, then files labeled GitHub issues for each finding.
Includes SHA-pinned actions, prompt injection defenses, concurrency
guard, 60-minute timeout, and artifact upload for raw scan output.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add explicit github_token to Claude action step
- Add pip-audit alongside semgrep for Python ecosystem support
- Remove noisy "no findings" issue creation; log to stdout instead
- Drop WebFetch from allowed tools to reduce prompt injection surface
- Fix claude_args block scalar to plain string (avoids trailing newline)
- Use ${{ github.workflow }} for concurrency group (more portable)
- Remove redundant 2>/dev/null || true on gh label create --force
- Bump timeout from 60 to 90 minutes for larger repos

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

claude (Bot) commented May 28, 2026

Claude finished @fynyky's task in 2m 47s.

Claude Code Review

Todo List:

  • Read PR context and changed files
  • Analyze workflow file for security, correctness, and best practices
  • Read repo context (CLAUDE.md, package.json) to understand project
  • Post inline comments for specific issues
  • Post final review summary

Summary

Nice addition — a scheduled AI-driven audit that complements the existing test/lint CI and gives findings a structured issue format. Pinned action SHAs, sensible permissions (contents: read, issues: write), concurrency with cancel-in-progress: false, and the upfront prompt-injection notice are all the right instincts.

The main gaps are around fit with this specific repo (Node.js project, Python-only audit setup) and long-term issue-tracker hygiene (no real dedup). Details inline; recap below.

Higher-impact

  1. Node.js environment missing. This is a JS/Parcel/Playwright project but the workflow only installs Python tooling. npm audit (step 3) and "run the existing test suite" (step 5) will be impaired or impossible without actions/setup-node + npm ci. See inline on lines 23–24.
  2. Duplicate detection only checks open issues. Closed/triaged findings will be re-reported every Monday, growing the noise floor over time. Either query --state all or add a stable fingerprint. See inline on lines 97–100.
  3. GH_TOKEN may not reach Claude's Bash shell, which would cause every gh issue create to fail with an auth error and leave the audit with no filed issues. Worth setting env: GH_TOKEN on the Claude step explicitly and verifying via workflow_dispatch before relying on Monday. See inline on lines 31–36.
  4. Bash allow-list is unrestricted, WebFetch missing. Scoping Bash tightens the trust surface (semgrep rulesets and advisory pages are prompt-injection vectors), and the research step (step 2 of the prompt) genuinely needs WebFetch to read advisory bodies. See inline on line 136.

Lower-impact

  1. Inconsistent pinning: semgrep is version-pinned, pip-audit is not. Pick one policy. Note semgrep's engine pin doesn't pin rulesets (p/owasp-top-ten), which still update live. See inline on line 24.
  2. gh label create --force silently resets the label's color/description weekly; replace with an existence check or || true. See inline on line 29.
  3. Consider SARIF + code scanning for Semgrep findings instead of (or alongside) JSON artifacts, which gives you a native dedup/dismissal UX on the Security tab. See inline on lines 75–82.

Things working well

  • Action SHA pinning with # v4 / # v1 comments
  • Hardcoded prompt with explicit "untrusted external content" warning
  • Least-privilege permissions for the job
  • concurrency group with cancel-in-progress: false so a manual dispatch doesn't kill the running cron
  • if: always() + if-no-files-found: ignore on the artifact step — robust to partial failure
  • 90-min timeout-minutes cap on a potentially long-running Claude session

One small note

I couldn't fully verify exact claude-code-action@v1 env-var behavior without running it; the GH_TOKEN concern is "verify before relying on it" rather than "definitely broken." A single workflow_dispatch run will tell you everything.
Branch: claude/add-security-audit-workflow

Comment on lines +23 to +24
- name: Install audit tools
  run: python3 -m pip install --quiet semgrep==1.164.0 pip-audit

Missing Node.js setup — likely blocks several audit steps for this repo.

This repo is a Node.js library (package.json declares parcel, playwright, web-test-runner, etc.), but the workflow only installs Python tooling. Without actions/setup-node and npm ci:

  • Step 3 of the prompt (npm audit --json) will run against the lockfile but won't have node_modules, and may use whatever Node version is preinstalled on the runner (currently fine, but not pinned/reproducible).
  • Step 5 ("run the existing test suite to establish a baseline") cannot succeed — npm test runs npm run lint && npx playwright install && web-test-runner, none of which work without npm ci first.
  • Semgrep's JS analysis reads source files only, so it still runs, but the audit environment no longer matches the dependency state that CI tests against.

Suggest adding a setup step before line 23:

- uses: actions/setup-node@<sha>  # v6, matching test.yml
  with:
    node-version: 24
- run: npm ci

- uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4

- name: Install audit tools
  run: python3 -m pip install --quiet semgrep==1.164.0 pip-audit

Inconsistent pinning: semgrep is pinned, pip-audit is not.

For reproducible scheduled audits, both tools should be pinned to specific versions. As written, a pip-audit release could silently change the workflow's behavior between runs. Either pin both:

Suggested change
- run: python3 -m pip install --quiet semgrep==1.164.0 pip-audit
+ run: python3 -m pip install --quiet semgrep==1.164.0 pip-audit==2.7.3

…or leave both unpinned, accepting less reproducible runs in exchange for always getting the latest tool releases. One-sided pinning is the worst of both worlds.

Also worth noting: pinning semgrep==1.164.0 means rulesets via --config p/... are still fetched live and can update independently — the pin only freezes the engine, not the rules. That may or may not be what you want for a security tool that benefits from up-to-date detections.
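
Whichever policy is chosen, a small guard can keep it from drifting again. The following is a minimal sketch and not part of this PR: the function name unpinned_specs is hypothetical, and it only recognises exact == pins (not hashes or version ranges).

```shell
# Hypothetical helper: print each pip requirement spec that lacks an exact "==" pin.
unpinned_specs() {
  for spec in "$@"; do
    case "$spec" in
      *==*) ;;                      # exact version pin: nothing to report
      *) printf '%s\n' "$spec" ;;   # unpinned or range spec: report it
    esac
  done
}

# The install specs from this workflow, as written in the PR:
unpinned_specs semgrep==1.164.0 pip-audit
# prints: pip-audit
```

A workflow step could fail when this prints anything, which would make a one-sided pin visible at review time instead of at the next pip-audit release.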

- name: Ensure security label exists
  env:
    GH_TOKEN: ${{ github.token }}
  run: gh label create security --color d73a4a --description "Security vulnerability" --force

--force will overwrite maintainer customizations every week.

gh label create --force will reset the color and description if someone has already customized the security label (e.g., changed it to a different shade, added an emoji, or expanded the description). On a weekly cron, this silently undoes that customization each Monday.

Safer pattern — only create when missing:

Suggested change
- run: gh label create security --color d73a4a --description "Security vulnerability" --force
+ run: gh label list --json name -q '.[].name' | grep -qx security || gh label create security --color d73a4a --description "Security vulnerability"

Or simply tolerate the existence error:

run: gh label create security --color d73a4a --description "Security vulnerability" 2>/dev/null || true

Comment on lines +97 to +100
**6. Check for duplicate issues**
```
gh issue list --label security --state open --json number,title
```

Duplicate check only filters --state open — closed/triaged findings will be re-reported every week.

If a maintainer closes a generated issue as wontfix, not planned, or after applying an accepted-risk decision, the next Monday's run won't see it (it's not state open) and will create a brand-new duplicate. Over weeks this becomes a tracker-spam problem.

Two reasonable fixes:

  1. Check --state all:

    gh issue list --label security --state all --json number,title,state
    

    …and instruct Claude in the prompt to skip findings whose title closely matches an existing issue regardless of state.

  2. Use a stable fingerprint (e.g., CVE-XXXX-XXXXX or a hash of package@version + rule ID) in the issue title or a hidden HTML comment, and have Claude grep all issues for that fingerprint before filing.

Without one of these, every finding that a maintainer has already triaged and closed will be re-filed as a fresh issue each Monday.
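
The fingerprint option could look roughly like the sketch below. Everything named here is an assumption for illustration: the helper finding_fingerprint, the audit-fingerprint marker format, and the example-pkg / CVE-XXXX-XXXXX placeholders are hypothetical, and sha256sum is assumed to be available (it is part of GNU coreutils).

```shell
# Hypothetical helper: derive a short, stable fingerprint for a finding.
# Arguments: package name, package version, rule or advisory ID.
finding_fingerprint() {
  printf '%s@%s:%s' "$1" "$2" "$3" | sha256sum | cut -c1-12
}

# Placeholder inputs; a real run would take these from the audit output.
fp=$(finding_fingerprint "example-pkg" "1.2.3" "CVE-XXXX-XXXXX")

# Marker to embed in the issue body as a hidden HTML comment.
marker="<!-- audit-fingerprint:${fp} -->"
printf '%s\n' "$marker"

# Before filing, the audit could search issues in every state for the marker, e.g.
#   gh issue list --label security --state all \
#     --search "audit-fingerprint:${fp} in:body" --json number
# and skip the finding when the result is non-empty.
```

The same inputs always produce the same 12-character value, so a closed issue still blocks a re-report. Whether GitHub search matches text inside an HTML comment should be checked on a test issue first; putting the fingerprint in the title is the fallback.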

If no genuine vulnerabilities are found, do not create any issues. Instead,
print a brief summary to stdout of what was scanned and confirm no issues were found.

claude_args: '--allowedTools "Bash,WebSearch"'

Bash is unrestricted, and WebFetch is missing despite the research step depending on it.

Two issues with this allow-list, plus a style note:

  1. No WebFetch. Step 2 of the prompt directs Claude to "actively research" CVEs from NVD, GitHub Advisory Database, and OWASP, and step 7 asks for advisory links. WebSearch returns snippets/URLs but doesn't fetch page bodies — Claude can't read full advisory details without WebFetch. Either add it, or weaken step 2 to acknowledge the limitation.

  2. Bash is wide open. A scheduled workflow with unrestricted shell on a runner that has GH_TOKEN for issues: write is a significant trust surface. The prompt itself is hardcoded (good), but step 2 explicitly tells Claude to fetch external content, and step 4 runs Semgrep with rulesets fetched from p/owasp-top-ten — both routes for prompt injection that could try to talk Claude into executing arbitrary shell. The prompt's "SECURITY NOTICE" mitigates this somewhat but isn't a hard boundary.

    Consider scoping Bash to the commands actually needed, similar to review.yml:

    claude_args: |
      --allowedTools "Bash(npm audit:*),Bash(pip-audit:*),Bash(semgrep:*),Bash(gh issue list:*),Bash(gh issue create:*),Bash(gh issue view:*),Bash(tee:*),Bash(jq:*),Read,Grep,Glob,WebSearch,WebFetch"

  3. Style consistency. The single-line quoted form '--allowedTools "Bash,WebSearch"' works but is fragile. review.yml uses the claude_args: | block-scalar form; adopting the same here avoids quote-nesting surprises if the list grows.

Comment on lines +75 to +82
**4. Static analysis**
Run Semgrep with the OWASP Top 10 and secrets detection rules, plus any
language-specific ruleset appropriate for this repo:
```
semgrep --config p/owasp-top-ten --config p/secrets --json -o audit-semgrep.json .
```
Then manually review the source code for issues not caught by automated tools,
specifically looking for the vulnerability classes identified in step 2.

Consider uploading Semgrep results as SARIF to GitHub code scanning.

The current setup outputs JSON for artifact upload, but artifacts are only useful if someone goes looking for them. Semgrep can emit SARIF, and github/codeql-action/upload-sarif will surface findings as native code scanning alerts on the Security tab — with deduplication across runs, code-anchored line annotations, and a UI for dismissing false positives. That UI also doubles as the de-duplication mechanism this workflow currently lacks (see the comment on step 6).

Rough shape:

- name: Semgrep SARIF
  run: semgrep --config p/owasp-top-ten --config p/secrets --sarif -o semgrep.sarif . || true
- uses: github/codeql-action/upload-sarif@<sha>
  if: always()
  with:
    sarif_file: semgrep.sarif

Requires security-events: write permission. The Claude-driven issue creation can then focus on findings code scanning can't surface (dynamic analysis, dependency advisories with no SARIF, manual review hits) rather than competing with the Security tab.

Comment on lines +31 to +36
- name: Claude security audit and issue creation
  uses: anthropics/claude-code-action@537ffff2eff706bd7e3e1c3daf2d4b39067a9f85 # v1
  with:
    anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
    github_token: ${{ github.token }}
    track_progress: true

gh issue create invoked by Claude may not have a token in the bash environment.

The "Ensure security label exists" step explicitly sets GH_TOKEN: ${{ github.token }}, but this step doesn't. claude-code-action typically exposes the token to its internal MCP servers, but whether it also exports GH_TOKEN / GITHUB_TOKEN into the Bash shell that Claude invokes is action-version-specific. If it doesn't, every gh issue create call in step 7 of the prompt will fail with an auth error and the audit will produce no issues.

Belt-and-braces fix — add the env to this step too:

- name: Claude security audit and issue creation
  uses: anthropics/claude-code-action@537ffff2eff706bd7e3e1c3daf2d4b39067a9f85  # v1
  env:
    GH_TOKEN: ${{ github.token }}
  with:
    ...

Worth verifying with a manual workflow_dispatch run before relying on the Monday schedule.
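
For that manual run, a preflight along these lines would make the result unambiguous. It is a sketch only: check_gh_token is a hypothetical name, and it relies on gh reading its token from the GH_TOKEN or GITHUB_TOKEN environment variables. To test the shell that Claude actually uses, it would have to be executed from inside the Claude session (for example as the first command in the prompt), not as a separate workflow step.

```shell
# Hypothetical preflight: succeed only if gh can find a token in this shell's environment.
check_gh_token() {
  if [ -n "${GH_TOKEN:-}" ] || [ -n "${GITHUB_TOKEN:-}" ]; then
    echo "gh token present in environment"
    return 0
  fi
  echo "neither GH_TOKEN nor GITHUB_TOKEN is set in this shell" >&2
  return 1
}

# Report the likely consequence instead of letting gh fail later in the run.
check_gh_token || echo "gh issue create would likely fail with an auth error"
```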

fynyky merged commit 80fe65d into master May 28, 2026
15 of 16 checks passed
fynyky deleted the claude/add-security-audit-workflow branch May 28, 2026 17:23