Claude/add security audit workflow by fynyky · Pull Request #66 · fynyky/elemental

fynyky · 2026-05-28T17:03:13Z

No description provided.

Scheduled GitHub Actions workflow that uses Claude to perform research-driven dependency auditing, static analysis (Semgrep), and dynamic testing, then files labeled GitHub issues for each finding. Includes SHA-pinned actions, prompt injection defenses, concurrency guard, 60-minute timeout, and artifact upload for raw scan output.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Add explicit github_token to Claude action step
- Add pip-audit alongside semgrep for Python ecosystem support
- Remove noisy "no findings" issue creation; log to stdout instead
- Drop WebFetch from allowed tools to reduce prompt injection surface
- Fix claude_args block scalar to plain string (avoids trailing newline)
- Use ${{ github.workflow }} for concurrency group (more portable)
- Remove redundant 2>/dev/null || true on gh label create --force
- Bump timeout from 60 to 90 minutes for larger repos

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

claude · 2026-05-28T17:03:37Z

Claude finished @fynyky's task in 2m 47s

Claude Code Review

Todo List:

Read PR context and changed files
Analyze workflow file for security, correctness, and best practices
Read repo context (CLAUDE.md, package.json) to understand project
Post inline comments for specific issues
Post final review summary

Summary

Nice addition — a scheduled AI-driven audit that complements the existing test/lint CI and gives findings a structured issue format. Pinned action SHAs, sensible permissions (contents: read, issues: write), concurrency with cancel-in-progress: false, and the upfront prompt-injection notice are all the right instincts.

The main gaps are around fit with this specific repo (Node.js project, Python-only audit setup) and long-term issue-tracker hygiene (no real dedup). Details inline; recap below.

Higher-impact

Node.js environment missing. This is a JS/Parcel/Playwright project but the workflow only installs Python tooling. npm audit (step 3) and "run the existing test suite" (step 5) will be impaired or impossible without actions/setup-node + npm ci. See inline on line 23–24.
Duplicate detection only checks open issues. Closed/triaged findings will be re-reported every Monday, growing the noise floor over time. Either query --state all or add a stable fingerprint. See inline on line 97–100.
GH_TOKEN may not reach Claude's Bash shell, which would cause every gh issue create to fail silently. Worth setting env: GH_TOKEN on the Claude step explicitly and verifying via workflow_dispatch before relying on Monday. See inline on line 31–36.
Bash allow-list is unrestricted, WebFetch missing. Scoping Bash tightens the trust surface (semgrep rulesets and advisory pages are prompt-injection vectors), and the research step (step 2 of the prompt) genuinely needs WebFetch to read advisory bodies. See inline on line 136.

Lower-impact

Inconsistent pinning — semgrep is version-pinned, pip-audit is not. Pick one policy. Note semgrep's engine pin doesn't pin rulesets (p/owasp-top-ten), which still update live. See inline on line 24.
gh label create --force silently resets the label's color/description weekly — replace with an existence check or || true. See inline on line 29.
Consider SARIF + code scanning for Semgrep findings instead of (or alongside) JSON artifacts — gives you a native dedup/dismissal UX on the Security tab. See inline on line 75–82.

Things working well

Action SHA pinning with # v4 / # v1 comments
Hardcoded prompt with explicit "untrusted external content" warning
Least-privilege permissions for the job
concurrency group with cancel-in-progress: false so a manual dispatch doesn't kill the running cron
if: always() + if-no-files-found: ignore on the artifact step — robust to partial failure
90-min timeout-minutes cap on a potentially long-running Claude session

One small note

I couldn't fully verify exact claude-code-action@v1 env-var behavior without running it; the GH_TOKEN concern is "verify before relying on it" rather than "definitely broken." A single workflow_dispatch run will tell you everything.
Branch: claude/add-security-audit-workflow

claude · 2026-05-28T17:06:40Z

+      - name: Install audit tools
+        run: python3 -m pip install --quiet semgrep==1.164.0 pip-audit


Missing Node.js setup — likely blocks several audit steps for this repo.

This repo is a Node.js library (package.json declares parcel, playwright, web-test-runner, etc.), but the workflow only installs Python tooling. Without actions/setup-node and npm ci:

Step 3 of the prompt (npm audit --json) will run against the lockfile but won't have node_modules, and may use whatever Node version is preinstalled on the runner (currently fine, but not pinned/reproducible).

Step 5 ("run the existing test suite to establish a baseline") cannot succeed — npm test runs npm run lint && npx playwright install && web-test-runner, none of which work without npm ci first.

Semgrep's JS analysis runs on source files only, but the dependency posture differs from CI.

Suggest adding a setup step before line 23:

    - uses: actions/setup-node@<sha>  # v6, matching test.yml
      with:
        node-version: 24
    - run: npm ci

claude · 2026-05-28T17:06:41Z

+      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5  # v4
+
+      - name: Install audit tools
+        run: python3 -m pip install --quiet semgrep==1.164.0 pip-audit


Inconsistent pinning: semgrep is pinned, pip-audit is not.

For reproducible scheduled audits, both tools should be pinned to specific versions. As written, a pip-audit release could silently change the workflow's behavior between runs. Either pin both:

Suggested change

-        run: python3 -m pip install --quiet semgrep==1.164.0 pip-audit
+        run: python3 -m pip install --quiet semgrep==1.164.0 pip-audit==2.7.3

…or leave both unpinned, accepting the loss of reproducibility in exchange for always running the latest tool releases. One-sided pinning is the worst of both worlds.

Also worth noting: pinning semgrep==1.164.0 means rulesets via --config p/... are still fetched live and can update independently — the pin only freezes the engine, not the rules. That may or may not be what you want for a security tool that benefits from up-to-date detections.
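
To make the pinning policy checkable rather than a convention, a small guard could flag any requirement without an exact pin. This is a minimal sketch, not part of the workflow: `unpinned_packages` is a hypothetical helper name, and it only recognizes `==` pins (no hashes, ranges, or requirements files).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every requirement that lacks an exact "==" pin.
unpinned_packages() {
  local req
  for req in "$@"; do
    case "$req" in
      *==*) ;;                      # exact pin, e.g. semgrep==1.164.0
      *) printf '%s\n' "$req" ;;    # anything else counts as unpinned
    esac
  done
}

# The requirements from the install line under review.
unpinned_packages semgrep==1.164.0 pip-audit
```

Run against the current install line, this prints `pip-audit`; with both tools pinned it prints nothing, so a step could fail the job whenever the output is non-empty.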

claude · 2026-05-28T17:06:43Z

+      - name: Ensure security label exists
+        env:
+          GH_TOKEN: ${{ github.token }}
+        run: gh label create security --color d73a4a --description "Security vulnerability" --force


--force will overwrite maintainer customizations every week.

gh label create --force will reset the color and description if someone has already customized the security label (e.g., changed it to a different shade, added an emoji, or expanded the description). On a weekly cron, this silently undoes that customization each Monday.

Safer pattern — only create when missing:

Suggested change

-        run: gh label create security --color d73a4a --description "Security vulnerability" --force
+        run: gh label list --json name -q '.[].name' | grep -qx security || gh label create security --color d73a4a --description "Security vulnerability"

Or simply tolerate the existence error:

run: gh label create security --color d73a4a --description "Security vulnerability" 2>/dev/null || true

claude · 2026-05-28T17:06:44Z

+            **6. Check for duplicate issues**
+            ```
+            gh issue list --label security --state open --json number,title
+            ```


Duplicate check only filters --state open — closed/triaged findings will be re-reported every week.

If a maintainer closes a generated issue as wontfix, not planned, or after applying an accepted-risk decision, the next Monday's run won't see it (it's not state open) and will create a brand-new duplicate. Over weeks this becomes a tracker-spam problem.

Two reasonable fixes:

Check --state all:

gh issue list --label security --state all --json number,title,state

…and instruct Claude in the prompt to skip findings whose title closely matches an existing issue regardless of state.

Use a stable fingerprint (e.g., CVE-XXXX-XXXXX or a hash of package@version + rule ID) in the issue title or a hidden HTML comment, and have Claude grep all issues for that fingerprint before filing.

Without one of these, expect ~10 fresh issues per Monday once any have been triaged closed.
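
The fingerprint option can be sketched in shell as follows. Everything here is an assumption for illustration: the helper names, the `audit-fingerprint:` marker format, the 12-character truncation, and the placeholder package and advisory values are not taken from the workflow. In a real run the text on stdin would come from something like `gh issue list --label security --state all --json number,body`.

```shell
#!/usr/bin/env bash
# Sketch only: every name below is hypothetical, not part of the workflow.

# Hash stdin to hex; falls back to shasum where sha256sum is unavailable.
sha256_hex() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum
  else
    shasum -a 256
  fi | cut -d' ' -f1
}

# Stable 12-character fingerprint from package, version, and rule/advisory ID.
finding_fingerprint() {
  printf '%s@%s:%s' "$1" "$2" "$3" | sha256_hex | cut -c1-12
}

# Succeeds if the marker for fingerprint $1 appears in the text on stdin.
already_reported() {
  grep -qF "audit-fingerprint: $1"
}

# Placeholder values; a real run would take these from a scanner finding.
fp=$(finding_fingerprint example-pkg 1.2.3 CVE-XXXX-XXXXX)
existing_bodies="<!-- audit-fingerprint: $fp -->"

if printf '%s\n' "$existing_bodies" | already_reported "$fp"; then
  echo "skip: finding $fp already has an issue"
else
  echo "file new issue with marker: <!-- audit-fingerprint: $fp -->"
fi
```

Because the marker lives in a hidden HTML comment, it survives title edits by maintainers, and matching on it works for closed issues as long as the listing uses `--state all`.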

claude · 2026-05-28T17:06:45Z

+            If no genuine vulnerabilities are found, do not create any issues. Instead,
+            print a brief summary to stdout of what was scanned and confirm no issues were found.
+
+          claude_args: '--allowedTools "Bash,WebSearch"'


Bash is unrestricted, and WebFetch is missing despite the research step depending on it.

Two issues with this allow-list:

No WebFetch. Step 2 of the prompt directs Claude to "actively research" CVEs from NVD, GitHub Advisory Database, and OWASP, and step 7 asks for advisory links. WebSearch returns snippets/URLs but doesn't fetch page bodies — Claude can't read full advisory details without WebFetch. Either add it, or weaken step 2 to acknowledge the limitation.

Bash is wide open. A scheduled workflow with unrestricted shell on a runner that has GH_TOKEN for issues: write is a significant trust surface. The prompt itself is hardcoded (good), but step 2 explicitly tells Claude to fetch external content, and step 4 runs Semgrep with rulesets fetched from p/owasp-top-ten — both routes for prompt injection that could try to talk Claude into executing arbitrary shell. The prompt's "SECURITY NOTICE" mitigates this somewhat but isn't a hard boundary.

Consider scoping Bash to the commands actually needed, similar to review.yml:

    claude_args: |
      --allowedTools "Bash(npm audit:*),Bash(pip-audit:*),Bash(semgrep:*),Bash(gh issue list:*),Bash(gh issue create:*),Bash(gh issue view:*),Bash(tee:*),Bash(jq:*),Read,Grep,Glob,WebSearch,WebFetch"

Style consistency. The single-line quoted form '--allowedTools "Bash,WebSearch"' works but is fragile. review.yml uses the claude_args: | block-scalar form — adopting the same here avoids quote-nesting surprises if the list grows.

claude · 2026-05-28T17:06:46Z

+            **4. Static analysis**
+            Run Semgrep with the OWASP Top 10 and secrets detection rules, plus any
+            language-specific ruleset appropriate for this repo:
+            ```
+            semgrep --config p/owasp-top-ten --config p/secrets --json -o audit-semgrep.json .
+            ```
+            Then manually review the source code for issues not caught by automated tools,
+            specifically looking for the vulnerability classes identified in step 2.


Consider uploading Semgrep results as SARIF to GitHub code scanning.

The current setup outputs JSON for artifact upload, but artifacts are only useful if someone goes looking for them. Semgrep can emit SARIF, and github/codeql-action/upload-sarif will surface findings as native code scanning alerts on the Security tab — with deduplication across runs, code-anchored line annotations, and a UI for dismissing false positives. That UI also doubles as the de-duplication mechanism this workflow currently lacks (see the comment on step 6).

Rough shape:

    - name: Semgrep SARIF
      run: semgrep --config p/owasp-top-ten --config p/secrets --sarif -o semgrep.sarif . || true
    - uses: github/codeql-action/upload-sarif@<sha>
      if: always()
      with:
        sarif_file: semgrep.sarif

Requires security-events: write permission. The Claude-driven issue creation can then focus on findings code scanning can't surface (dynamic analysis, dependency advisories with no SARIF, manual review hits) rather than competing with the Security tab.

claude · 2026-05-28T17:06:47Z

+      - name: Claude security audit and issue creation
+        uses: anthropics/claude-code-action@537ffff2eff706bd7e3e1c3daf2d4b39067a9f85  # v1
+        with:
+          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
+          github_token: ${{ github.token }}
+          track_progress: true


gh issue create invoked by Claude may not have a token in the bash environment.

The "Ensure security label exists" step explicitly sets GH_TOKEN: ${{ github.token }}, but this step doesn't. claude-code-action typically exposes the token to its internal MCP servers, but whether it also exports GH_TOKEN / GITHUB_TOKEN into the Bash shell that Claude invokes is action-version-specific. If it doesn't, every gh issue create call in step 7 of the prompt will fail with an auth error and the audit will produce no issues.

Belt-and-braces fix — add the env to this step too:

    - name: Claude security audit and issue creation
      uses: anthropics/claude-code-action@537ffff2eff706bd7e3e1c3daf2d4b39067a9f85  # v1
      env:
        GH_TOKEN: ${{ github.token }}
      with:
        ...

Worth verifying with a manual workflow_dispatch run before relying on the Monday schedule.
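
For that manual run, a quick preflight inside the shell can show whether a token is visible at all before any `gh issue create` is attempted. This is a rough sketch under assumptions: `gh_token_source` is a hypothetical helper, and it only inspects the two environment variables that `gh` documents (`GH_TOKEN` first, then `GITHUB_TOKEN`); it does not validate the token or its scopes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which token variable, if any, gh would pick up.
# It checks GH_TOKEN before GITHUB_TOKEN, matching gh's documented precedence.
gh_token_source() {
  if [ -n "${GH_TOKEN:-}" ]; then
    echo "GH_TOKEN"
  elif [ -n "${GITHUB_TOKEN:-}" ]; then
    echo "GITHUB_TOKEN"
  else
    echo "none"
  fi
}

# Log the result; only the variable name is printed, never the token itself.
echo "gh token source: $(gh_token_source)"
```

If this reports `none` from within the Claude step, the explicit `env: GH_TOKEN` above is needed; running `gh auth status` afterwards would confirm that the token is actually accepted.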

fynyky and others added 2 commits May 28, 2026 09:34

claude Bot reviewed May 28, 2026

fynyky merged commit 80fe65d into master May 28, 2026
15 of 16 checks passed

fynyky deleted the claude/add-security-audit-workflow branch May 28, 2026 17:23

fynyky merged 2 commits into master from claude/add-security-audit-workflow
