diff --git a/.claude/CLAUDE.md b/.claude/CLAUDE.md
index f7f64082..32b025fd 100644
--- a/.claude/CLAUDE.md
+++ b/.claude/CLAUDE.md
@@ -166,6 +166,7 @@ assets/brand/ # Brand assets: SVGs, animated PNGs, icons (ICO/ICNS/PNG
 - **Brand assets**: All brand files in `assets/brand/` (logo.svg, logotype.svg, icon.svg, brandkit.html + PNG/ICO/ICNS variants). Brand assets are **proprietary** (not MIT) — do not change license headers. Sidebar uses `` tags for animated SVG logo + logotype (CSP-safe, blocks SVG script execution). Colour-blind mode passed via `?mode=cb-{colourBlindMode}` from settings store. Dark mode auto-detected via `@media(prefers-color-scheme)` in SVG CSS. Regenerate with `node scripts/generate-icons.mjs` (icons) and `node scripts/svg-to-apng.mjs` (animated PNGs). Copyright year: `2026` (dynamic via `scripts/update-copyright-year.sh`).
 - **Third-party licence obligations when bundling**: The offline-installer build path (`release.yml` Step 8.5, `bundle_engines=true`) redistributes upstream engines and tools inside our signed installer. MIT/BSD/Unlicense/PSF components only need the upstream copyright + permission notice preserved alongside the binary. **LGPL components** (FFmpeg, MP4Box/GPAC) additionally need a written offer for source (component + exact version + upstream-source URL + fall-back contact, valid for three years) and the binary must remain user-substitutable — subprocess invocation already satisfies this. **GPL components** (get_iplayer, planned for M8) need the complete corresponding source shipped alongside the binary (or a three-year written offer); MeedyaDL's own MIT code is protected from GPL propagation by the "mere aggregation" exception because we subprocess-invoke rather than link. The harvesting + source-offer emission belongs in `release.yml` Step 8.5 itself so it can never be forgotten when a new component lands in `engines.toml` or `tool-versions.toml`. Full matrix + practical "written offer for source" template in `.claude/memory/project_third_party_licence_obligations.md`. Tracked in #802. Notes do **not** apply to the default tiny-installer path (`bundle_engines=false`) which only ships MeedyaDL's own code — engines copy themselves from PyPI / GitHub Releases onto the user's machine at first launch.
 - **Licence compliance is enforced per-PR (#806)**: the `Licences` workflow (`.github/workflows/licences.yml`) runs on every PR to `main` and on `workflow_dispatch`. Three checks: (1) `npm run check:acknowledgements` confirms every direct dep in `Cargo.toml` / `package.json` is named in `ACKNOWLEDGEMENTS.md` (set-coverage drift); (2) `npm run check:upstream-licences` confirms each direct dep's actual upstream-declared licence string matches what `ACKNOWLEDGEMENTS.md` claims, catching upstream re-licensings between MeedyaDL releases (most commonly MIT → MIT/Apache-2.0 dual-licence flips, but also the rare permissive → copyleft change that would be a serious compliance risk); (3) `cargo-deny check licenses` enforces the permissive-only allowlist in `src-tauri/deny.toml` on the full transitive tree. The upstream-string check uses an SPDX-aware normaliser that treats `LGPL-3.0+` ≡ `LGPL-3.0-or-later` and `MIT/Apache-2.0` ≡ `MIT OR Apache-2.0` as equivalent (advisory, not blocking). The `Licences` workflow is in addition to (not a replacement for) the existing cargo-deny step inside `ci.yml::backend` — duplicated here so the compliance picture lives in one workflow a maintainer can read in isolation. Run locally via `npm run check:legal` (umbrella over both Node scripts).
+- **PR security heuristics run per-PR**: the `PR Security Checks` workflow (`.github/workflows/pr-security.yml`) runs on every PR to `main` / `release-candidate` / `beta` / `alpha` and on `workflow_dispatch`. **All checks are non-blocking advisory** — the merge gate stays with `ci.yml` (which already hard-gates `cargo clippy -D warnings`, `cargo test`, `cargo-deny`, `tsc`, `eslint`, and CodeQL for JS/TS + Actions); this workflow adds the *heuristic* layer those gates don't cover and posts a **single upserted PR comment** (edited in place by hidden marker ``, not a new comment per push). Adapted from the WebMS-Intra `pr-security.yml` to MeedyaDL's Rust/TS/Tauri stack. Eight checks: (1) gitleaks CLI secrets scan (working tree + commits since base, `--redact`, findings surfaced in the comment, no SARIF upload so the permission surface stays `contents:read` + `pull-requests:write`); (2) Rust subprocess shell-interpolation grep (`Command::new("sh")` / `.arg("-c")` — enforces the "no `sh -c`" invariant); (3) `unsafe` Rust blocks/fns in non-test changed code; (4) dangerous frontend sinks (`eval` / `new Function` / `dangerouslySetInnerHTML` / `innerHTML=`); (5) hardcoded absolute paths (`/Users/`, `/home//`, `C:\`); (6) unpinned GitHub Actions (`uses: org/repo@` not a 40-hex SHA — handles both `- uses:` and bare `uses:` forms, exempts `./local` and `docker://`); (7) sensitive/proprietary path touches (`assets/brand/` is PROPRIETARY, `src-tauri/capabilities/`, `tauri.conf.json`, `.github/workflows/`, signing/entitlements); (8) cross-source consistency via `tools/audit-checks/`. Checks 2–7 scan only PR-changed files; check 8 validates whole-repo state. The two consistency scripts — `check_ipc_commands.py` (every `#[tauri::command]` is registered in `lib.rs` `generate_handler![]` AND every frontend `invoke('x')` targets a registered command — the runtime "command not found" class) and `check_codec_registry.py` (every meta-codec `resolves_to` target is a real codec section AND every audio `services.gamdl` flag is a kebab-case `SongCodec` variant) — are pure cross-source reference validators (the MeedyaDL analog of WebMS's `check_route_targets.py` / `check_sql_columns.py`), exit 0 by default and 1 under `--strict`, report ` • path:line — msg` bullets the workflow greps for, and **must stay zero-finding on a clean tree** (add a negative test when changing them). Run locally: `python3 tools/audit-checks/check_ipc_commands.py` / `check_codec_registry.py`. The matching **`.github/pull_request_template.md`** carries the manual security-review checklist (subprocess safety, IPC contract registration, keychain/redaction, CSP/capabilities, licensing, proprietary-asset headers) that a grep can't cover.
 - **Git operations**: Do NOT auto-commit or auto-push. Only edit files — let the user control git operations.
 - **Documentation maintenance**: When adding features, modifying settings, changing commands/services, or altering UI — update ALL affected markdown files (README.md, Project_Plan.md, CHANGELOG.md, CLAUDE.md, help/*.md). This includes version numbers, file counts, feature lists, project structure trees, and help topic cross-references. Project_Plan.md serves as both the plan and status tracker (PROJECT_STATUS.md was consolidated into it).
 - **GitHub Issue tracking**: For every task (features, bug fixes, enhancements, security fixes): (1) Create a GitHub Issue if one doesn't exist (`gh issue create`); (2) Close with completion comment when done (`gh issue close --reason completed`); (3) Link parent/child issues in the body (e.g., "Depends on #107", "Part of #100"); (4) Add to "MeedyaDL Development" project (`gh project item-add 6 --owner MWBMPartners`); (5) Create follow-up issues for any future work identified during implementation.
diff --git a/.claude/memory/MEMORY.md b/.claude/memory/MEMORY.md
index eb5d46b3..b7182ba6 100644
--- a/.claude/memory/MEMORY.md
+++ b/.claude/memory/MEMORY.md
@@ -11,3 +11,4 @@
 - [v1.7 bumper bundle](project_v17_bumper_bundle.md) — 22-issue session on `feat/v1.7-bumper-bundle` (2026-05-17/18); 13 closed + 9 deferred-with-plan; 11 new unit tests; demonstrates the defer-with-plan vs close pattern
 - [MeedyaSuite-core online-only](project_meedyasuite_core_online_only.md) — standing rule: when investigating MeedyaSuite-core integrations, always fetch from github.com/MWBMPartners/MeedyaSuite-core via `gh api`, never trust the local cargo checkout (pinned at Cargo.toml's branch ref, may lag main)
 - [MeedyaSuite org migration (planned)](project_meedyasuite_org_migration.md) — pending consolidation of MWBMPartners/* repos under MeedyaSuite org; secrets inventory + transfer order pinned; tauri.conf.json already has multi-endpoint updater fallback (2026-05-22); GitHub does NOT support nested orgs (siblings only)
+- [PR security heuristics + audit-checks](project_pr_security_checks.md) — `pr-security.yml` (8 non-blocking advisory checks: gitleaks, `sh -c`, unsafe, frontend sinks, hardcoded paths, unpinned actions, sensitive-path, consistency) + `tools/audit-checks/` (IPC contract + codec-registry, stdlib-only, zero-finding-on-clean discipline); adapted from WebMS-Intra; landed PR #905 (2026-06-03); note the unresolvable `setup-python` SHA still in `upstream-gamdl-watch.yml`
diff --git a/.claude/memory/project_pr_security_checks.md b/.claude/memory/project_pr_security_checks.md
new file mode 100644
index 00000000..ba8c7471
--- /dev/null
+++ b/.claude/memory/project_pr_security_checks.md
@@ -0,0 +1,43 @@
+---
+name: PR security heuristics + audit-checks (pr-security.yml, tools/audit-checks/)
+description: How MeedyaDL's per-PR security heuristic gate and cross-source consistency scripts work, why they're shaped the way they are, and the gotchas hit standing them up — adapted from WebMS-Intra to the Rust/TS/Tauri stack
+type: project
+---
+Landed 2026-06-03 on PR #905 (`ci: add PR security heuristics workflow + cross-source audit checks`), CI 12/12 green. Adapted from the WebMS-Intra `pr-security.yml` approach at the user's request ("add similar PR heuristics security checks to this project"). WebMS-Intra is PHP; MeedyaDL is Rust + TypeScript + Tauri, so this is an **adaptation, not a port** — the PHP-specific checks (PHP lint hard gate, mysqli SQL-injection, CSRF tokens, Psalm) were dropped and the heuristic layer re-aimed at MeedyaDL's own documented invariants.
+
+## What exists
+
+- **`.github/workflows/pr-security.yml`** — runs on PRs to `main` / `release-candidate` / `beta` / `alpha` + `workflow_dispatch`. **Every check is non-blocking (`continue-on-error`)** — the merge gate stays with `ci.yml` (clippy `-D warnings`, cargo test, cargo-deny, tsc, eslint, CodeQL). This adds only the heuristic layer those gates don't cover.
+- **`tools/audit-checks/check_ipc_commands.py`** + **`check_codec_registry.py`** + **`README.md`** — zero-dependency Python cross-source validators, runnable locally.
+- **`.github/pull_request_template.md`** — manual security-review checklist mapped to MeedyaDL invariants.
+- **`.claude/CLAUDE.md`** — convention bullet under the "Licence compliance is enforced per-PR" neighbour.
+
+## The 8 workflow checks
+
+1. gitleaks CLI secrets scan (working tree + commits since base, `--redact`); findings surfaced in the PR comment, **no SARIF upload** (keeps perms at `contents:read` + `pull-requests:write`). 2. Rust subprocess shell-interpolation (`Command::new("sh")` / `.arg("-c")` — the "no `sh -c`" rule). 3. `unsafe` Rust in non-test changed code. 4. Dangerous frontend sinks (`eval` / `new Function` / `dangerouslySetInnerHTML` / `innerHTML=`). 5. Hardcoded absolute paths (`/Users/`, `/home//`, `C:\`). 6. Unpinned GitHub Actions (`uses: org/repo@` not a 40-hex SHA — handles both `- uses:` and bare `uses:` forms; exempts `./local` and `docker://`). 7. Sensitive/proprietary path touches (`assets/brand/` is PROPRIETARY, `src-tauri/capabilities/`, `tauri.conf.json`, `.github/workflows/`, signing/entitlements). 8. The two consistency scripts.
+
+Checks 2–7 scan only PR-changed files; check 8 validates whole-repo state.
+
+## The two consistency scripts (the WebMS checks-9-11 analog)
+
+Both follow the WebMS pattern: validate that two sources which must agree have no compiler link between them ("code references something that doesn't exist in another source").
+
+- **`check_ipc_commands.py`** — every `#[tauri::command]` under `src-tauri/src/` is registered in `lib.rs`'s `generate_handler![]`, AND every frontend `invoke('x')` literal targets a registered command. Catches the runtime "command not found" class (defined-but-unregistered compiles fine; a typo'd invoke target only fails when a user clicks the button).
+- **`check_codec_registry.py`** — every `codecs.toml` meta-codec `resolves_to` target is a real concrete codec section, AND every audio `services.gamdl` flag is a kebab-case `SongCodec` variant. TOML is parsed with **targeted regex, not `tomllib`** (so the script is Python-version-agnostic and venv-free).
+
+**Conventions to preserve:**
+- Findings print as ` • path:line — message` bullets; the workflow greps for the `•` bullet to decide whether to surface a section. Keep that prefix.
+- **Zero findings on a clean tree is mandatory.** Both were verified clean on the current tree and negative-tested (inject a bad `invoke()`, a dangling `resolves_to`, a bogus `gamdl=` → all caught). Add a negative test when you change a check — a check that cries wolf on day one gets ignored.
+- Default exit 0; `--strict` exits 1 on a high-severity finding (for local pre-push hooks).
+
+The README lists good next candidates: `engines.toml` ↔ `EngineCommandBuilder` impls, `tool-versions.toml` ↔ installed tools, i18n `t('key')` ↔ locale JSON. (A Rust↔TS `AppSettings` drift check is tempting but high-false-positive because of serde renames — validate carefully before adding.)
+
+## Design choices worth remembering
+
+- **One upserted comment, not one per push.** The comment carries a hidden marker ``; the workflow finds an existing marked comment via `gh api …/issues/{n}/comments` and PATCHes it (`jq -Rs '{body: .}' | gh api --input -`), else POSTs a new one. WebMS posts a fresh comment every push; this avoids that spam.
+- **Self-referential advisory is by design.** Check 7 flags any PR touching `.github/workflows/`, so the PR that *added* `pr-security.yml` got flagged by `pr-security.yml` — correct behaviour (reviewers should confirm workflow/brand changes are deliberate), advisory, no action.
+
+## Gotchas hit standing it up (2026-06-03)
+
+- **`actions/setup-python@e348410041c5b0ca4452c8e292ca3936bac9ba7f # v6` is NOT a resolvable SHA.** My first pr-security.yml copied this pin from `upstream-gamdl-watch.yml:60` (a repo grep "confirmed" it was already in use). The job died in 2 s at action-resolution: *"Unable to resolve action … unable to find version e348410…"*. Fix: the audit scripts are stdlib-only (`re`/`sys`/`pathlib`), so **`setup-python` was removed entirely** — ubuntu runners' preinstalled `python3` runs them. **`upstream-gamdl-watch.yml` still carries the same bad pin** and will fail at its Python-setup step whenever that cron fires — a latent bug worth a follow-up (flagged to the user 2026-06-03).
+- **Detecting CI *success* from a remote session is awkward.** Webhooks deliver CI *failures* and comments but never success/new-push/merge-conflict transitions. Unauthenticated `api.github.com` polling hits the shared-runner-IP rate limit fast (60/hr), there was no `GH_TOKEN` in the shell, and `send_later` wasn't available. The working pattern: rely on the failure-webhook for interim breakage, and arm a `Monitor` single-shot timer (`sleep N && echo`) to wake the session and re-query check-runs via the **authenticated GitHub MCP** (`pull_request_read get_check_runs`) once. Re-arm if still running.
diff --git a/.github/pull_request_template.md b/.github/pull_request_template.md
new file mode 100644
index 00000000..ed46642c
--- /dev/null
+++ b/.github/pull_request_template.md
@@ -0,0 +1,105 @@
+
+
+## Summary
+
+
+
+## Scope of changes
+
+- [ ] Rust backend (`src-tauri/src/`)
+- [ ] React / TypeScript frontend (`src/`)
+- [ ] IPC surface (new/changed `#[tauri::command]`)
+- [ ] Settings schema (`models/settings.rs` + TS types)
+- [ ] Engine / codec / tool config (`engines.toml`, `codecs.toml`, `tool-versions.toml`, `tags.toml`)
+- [ ] CI / workflows (`.github/`)
+- [ ] Documentation (README, CHANGELOG, CLAUDE.md, `Project_Plan.md`, `help/*.md`)
+- [ ] Other: _____
+
+## Test plan
+
+- [ ] `cargo test` (in `src-tauri/`) passes
+- [ ] `npm run type-check` and `npm run test` pass
+- [ ] `cargo clippy -- -D warnings` is clean
+- [ ] Tested the golden path manually (`cargo tauri dev`)
+- [ ] Tested at least one edge / failure case
+
+## Security review
+
+**Tick each row consciously.** These mirror MeedyaDL's documented invariants
+(see `.claude/CLAUDE.md`). The automated workflow flags some of these
+heuristically, but a clean bot comment is not a substitute for this pass.
+
+### Input handling
+
+- [ ] URLs are validated as `http(s)://` before reaching a subprocess (`gamdl_service.rs` guard), and download URLs are domain-allowlisted (Apple Music / Classical / iTunes)
+- [ ] Any new filesystem path goes through `validate_path_safe()` (rejects `..` traversal); no path is built from unsanitised user input
+- [ ] User strings written to GAMDL `config.ini` pass through `sanitize_ini_value()` (strips `\n` / `\r`)
+- [ ] Imported settings/manifests are length-capped and control-char-stripped (`sanitize_imported_settings()`)
+
+### Subprocess safety
+
+- [ ] All subprocess calls use parameterised `Command::new().arg()` — **no** `sh -c` / `bash -c` / `format!()`-into-shell patterns
+- [ ] No user input reaches `eval` / `new Function` / a shell
+
+### Output & UI
+
+- [ ] No `dangerouslySetInnerHTML` / `innerHTML =` of untrusted or remote-derived strings; Markdown is rendered through `rehype-sanitize`
+- [ ] No raw secret/credential values are rendered into the DOM or logged to the activity log
+
+### Secrets / credentials
+
+- [ ] No API keys, developer tokens, `.p8` keys, passwords, or wrapper auth tokens are committed (embedded keys come from `option_env!` build secrets only)
+- [ ] Sensitive values are stored in the OS keychain (`keyring`), not in `settings.json`
+- [ ] Wrapper URLs are passed through `redact_url_query()` before any logging
+- [ ] No new file was added under a server/secret-managed path without justification
+
+### Filesystem
+
+- [ ] No hardcoded absolute paths (`/Users/…`, `/home//…`, `C:\…`) — paths derive from `app_data_dir` / `std::env::temp_dir()`
+- [ ] New on-disk writes that must survive a crash use the atomic temp-then-rename pattern
+
+### IPC contract
+
+- [ ] Every new `#[tauri::command]` is registered in `tauri::generate_handler![]` in `lib.rs` **and** has a frontend wrapper (`src/lib/tauri-commands.ts`)
+- [ ] Rate-limited commands (downloads, update checks, cookie imports) keep their limiter
+- [ ] `python3 tools/audit-checks/check_ipc_commands.py` is clean
+
+### Settings / registry consistency
+
+- [ ] If `AppSettings` changed: `settings_version` bumped + `migrate_settings()` updated, and the TypeScript type mirror updated
+- [ ] If `codecs.toml` changed: `python3 tools/audit-checks/check_codec_registry.py` is clean
+
+### Dependencies / licensing
+
+- [ ] New dependencies are permissively licensed (cargo-deny allowlist) and named in `ACKNOWLEDGEMENTS.md` (`npm run check:legal` passes)
+- [ ] No GPL/copyleft code is *linked* into MeedyaDL's own MIT code (subprocess invocation is fine)
+
+### CI / supply chain
+
+- [ ] New GitHub Actions are pinned to an immutable 40-char commit SHA (not a `@vX` tag)
+- [ ] No `[skip ci]` in commit messages (unless explicitly requested)
+- [ ] `workflow_dispatch` inputs are consumed via `env:`, not interpolated directly into `run:` shell
+
+### Proprietary assets
+
+- [ ] `assets/brand/` files (if touched) keep their **proprietary** license headers — these are NOT MIT
+
+## Documentation
+
+- [ ] Updated all affected docs (README, CHANGELOG, CLAUDE.md, `Project_Plan.md`, `help/*.md`) — feature lists, settings, commands, file counts, structure trees
+
+## Related issues
+
+
+
+---
+
+Checklist enforced by repo convention, not by CI. The automated **PR Security Checks** workflow adds heuristic scans on top of this manual review; both are advisory — the merge gate is `ci.yml`.
diff --git a/.github/workflows/pr-security.yml b/.github/workflows/pr-security.yml
new file mode 100644
index 00000000..6e2aa3cd
--- /dev/null
+++ b/.github/workflows/pr-security.yml
@@ -0,0 +1,328 @@
+# Copyright (c) 2026 MeedyaSuite
+# Licensed under the MIT License. See LICENSE file in the project root.
+#
+# PR Security Checks
+# ==================
+#
+# Defence-in-depth advisory gate that runs on every PR targeting a
+# protected branch. Adapted from the WebMS-Intra pr-security workflow to
+# MeedyaDL's Rust + TypeScript + Tauri stack.
+#
+# IMPORTANT: every check here is NON-BLOCKING (continue-on-error). The
+# merge gate stays with ci.yml — which already enforces the hard gates this
+# repo cares about (`cargo clippy -D warnings`, `cargo test`, `cargo-deny`,
+# `tsc`/`npm run type-check`, `eslint`, CodeQL for JS/TS + Actions). This
+# workflow adds the *heuristic* layer those gates don't cover, and posts a
+# single advisory comment on the PR. These are HEURISTICS — false positives
+# are expected. Review the findings; don't blindly act on them.
+#
+# Checks performed:
+#   1. Secrets scan (gitleaks CLI) — high-confidence committed credentials
+#   2. Rust subprocess shell interpolation — Command::new("sh") / .arg("-c") (CLAUDE.md ban)
+#   3. Unsafe Rust — `unsafe { }` / `unsafe fn` in non-test code
+#   4. Dangerous frontend sinks — eval / new Function / dangerouslySetInnerHTML / innerHTML=
+#   5. Hardcoded absolute paths — /Users//home// C:\ literals
+#   6. Unpinned GitHub Actions — `uses: org/repo@` not pinned to a 40-hex SHA
+#   7. Sensitive / proprietary path touches — assets/brand, capabilities, signing, tauri.conf.json, workflows
+#   8. Cross-source consistency — tools/audit-checks/*.py (IPC contract, codec registry)
+#
+# Checks 2-7 only scan files CHANGED in the PR. Check 8 validates whole-repo
+# state (a drift can live in an unchanged file that this PR orphaned). The
+# audit scripts live in tools/audit-checks/ and are runnable locally.
+
+name: PR Security Checks
+
+on:
+  pull_request:
+    branches:
+      - main
+      - release-candidate
+      - beta
+      - alpha
+  workflow_dispatch:
+
+permissions:
+  contents: read
+  pull-requests: write
+
+concurrency:
+  group: pr-security-${{ github.event.pull_request.number || github.ref }}
+  cancel-in-progress: true
+
+jobs:
+  security:
+    name: Static security checks
+    runs-on: ubuntu-latest
+
+    steps:
+      - name: Checkout PR head
+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
+        with:
+          fetch-depth: 0
+
+      # No setup-python step: the audit-checks scripts are stdlib-only
+      # (re / sys / pathlib — the TOML is regex-parsed, not via tomllib), so
+      # the runner's preinstalled python3 is sufficient.
+
+      # =====================================================================
+      # 1) Secrets scan via gitleaks (CLI binary, not the action)
+      #
+      # We use the gitleaks CLI directly rather than gitleaks/gitleaks-action,
+      # which requires a paid GITLEAKS_LICENSE secret for org accounts. The
+      # CLI is MIT-licensed and free. Findings are surfaced (redacted) in the
+      # PR comment; we do not upload SARIF to keep the permission surface at
+      # contents:read + pull-requests:write.
+      # =====================================================================
+      - name: Install gitleaks CLI
+        id: install_gitleaks
+        continue-on-error: true
+        run: |
+          set -e
+          VERSION="8.21.2"
+          URL="https://github.com/gitleaks/gitleaks/releases/download/v${VERSION}/gitleaks_${VERSION}_linux_x64.tar.gz"
+          curl -fsSL "$URL" -o /tmp/gitleaks.tgz
+          tar -xzf /tmp/gitleaks.tgz -C /tmp gitleaks
+          sudo mv /tmp/gitleaks /usr/local/bin/gitleaks
+          gitleaks version
+
+      - name: Scan for committed secrets (gitleaks)
+        id: gitleaks
+        continue-on-error: true
+        if: steps.install_gitleaks.outcome == 'success'
+        env:
+          BASE_SHA: ${{ github.event.pull_request.base.sha }}
+          HEAD_SHA: ${{ github.sha }}
+        run: |
+          set +e
+          SECTION=/tmp/gitleaks-section.md
+          : > "$SECTION"
+          # `detect` scans the working tree + commits since the PR base.
+          # --redact stops actual secret values appearing in the log/report.
+          gitleaks detect \
+            --source . \
+            --log-opts "${BASE_SHA}..${HEAD_SHA}" \
+            --no-banner \
+            --redact \
+            --report-path /tmp/gitleaks.json \
+            --report-format json \
+            --exit-code 1
+          EXIT=$?
+          COUNT=0
+          if [ -f /tmp/gitleaks.json ]; then
+            COUNT=$(jq 'length' /tmp/gitleaks.json 2>/dev/null || echo 0)
+          fi
+          echo "count=$COUNT" >> "$GITHUB_OUTPUT"
+          if [ "$COUNT" -gt 0 ]; then
+            {
+              echo "### Committed secrets (gitleaks — values redacted)"
+              echo
+              echo '```'
+              jq -r '.[] | " • \(.File):\(.StartLine) — \(.RuleID): \(.Description)"' /tmp/gitleaks.json 2>/dev/null | sort -u
+              echo '```'
+              echo
+            } > "$SECTION"
+            echo "::warning::gitleaks reported $COUNT potential secret(s) — review the PR comment"
+          fi
+          exit 0
+
+      # =====================================================================
+      # 2-7) Custom heuristic scan over the files changed in this PR,
+      #      plus 8) the whole-repo cross-source consistency scripts.
+      # =====================================================================
+      - name: Heuristic + consistency scan
+        id: heuristics
+        continue-on-error: true
+        env:
+          BASE_SHA: ${{ github.event.pull_request.base.sha }}
+          HEAD_SHA: ${{ github.sha }}
+        run: |
+          set +e
+          REPORT=/tmp/security-report.md
+          : > "$REPORT"
+
+          HITS=0
+          add_section() {
+            local title="$1"; local body="$2"
+            if [[ -n "$body" ]]; then
+              {
+                echo "### $title"
+                echo
+                echo '```'
+                echo "$body"
+                echo '```'
+                echo
+              } >> "$REPORT"
+              HITS=$((HITS + 1))
+            fi
+          }
+
+          # gitleaks section produced by the previous step (already fenced).
+          if [[ -s /tmp/gitleaks-section.md ]]; then
+            cat /tmp/gitleaks-section.md >> "$REPORT"
+            HITS=$((HITS + 1))
+          fi
+
+          # ---- Changed-file sets (present in the tree at HEAD) ------------
+          DIFF="git diff --name-only --diff-filter=ACMRT ${BASE_SHA} ${HEAD_SHA} --"
+          CHANGED_ALL=$($DIFF 2>/dev/null || true)
+          CHANGED_RUST=$(echo "$CHANGED_ALL" | grep -E '^src-tauri/.*\.rs$' || true)
+          CHANGED_RUST_NONTEST=$(echo "$CHANGED_RUST" | grep -vE '(^|/)(tests?|integration_tests)\.rs$|/tests/|_tests?\.rs$|/test_' || true)
+          CHANGED_TS=$(echo "$CHANGED_ALL" | grep -E '^src/.*\.(ts|tsx)$' | grep -vE '\.test\.(ts|tsx)$|/test/|__tests__' || true)
+          CHANGED_WF=$(echo "$CHANGED_ALL" | grep -E '^\.github/workflows/.*\.ya?ml$' || true)
+
+          echo "Changed (all): $(echo "$CHANGED_ALL" | grep -c . || true) file(s)"
+          echo " Rust: $(echo "$CHANGED_RUST" | grep -c . || true) | TS: $(echo "$CHANGED_TS" | grep -c . || true) | workflows: $(echo "$CHANGED_WF" | grep -c . || true)"
+          echo
+
+          # ---- 2) Rust subprocess shell interpolation --------------------
+          # CLAUDE.md invariant: "No shell interpolation — all subprocess
+          # calls use parameterised Command::new().arg(); no sh -c format!()".
+          if [[ -n "$CHANGED_RUST_NONTEST" ]]; then
+            SHELL_HITS=$(echo "$CHANGED_RUST_NONTEST" | xargs -r grep -nE 'Command::new\(\s*"(sh|bash|zsh|cmd|powershell|pwsh)"|\.arg\(\s*"-c"|\b(sh|bash) -c\b' 2>/dev/null || true)
+            add_section "Rust subprocess shell interpolation (use parameterised Command::new().arg() — no sh -c)" "$SHELL_HITS"
+          fi
+
+          # ---- 3) Unsafe Rust --------------------------------------------
+          if [[ -n "$CHANGED_RUST_NONTEST" ]]; then
+            UNSAFE=$(echo "$CHANGED_RUST_NONTEST" | xargs -r grep -nE '\bunsafe\s*(\{|fn\b)' 2>/dev/null || true)
+            add_section "Unsafe Rust blocks/functions (justify the safety invariant in a comment)" "$UNSAFE"
+          fi
+
+          # ---- 4) Dangerous frontend sinks (XSS) -------------------------
+          # MeedyaDL renders Markdown help via rehype-sanitize; raw HTML
+          # injection of user/remote data is the risk. Advisory — confirm
+          # the source is sanitised/trusted.
+          if [[ -n "$CHANGED_TS" ]]; then
+            SINKS=$(echo "$CHANGED_TS" | xargs -r grep -nE 'dangerouslySetInnerHTML|\.innerHTML\s*=|\beval\s*\(|new[[:space:]]+Function\s*\(' 2>/dev/null || true)
+            add_section "Dangerous frontend sinks (eval / new Function / dangerouslySetInnerHTML / innerHTML=)" "$SINKS"
+          fi
+
+          # ---- 5) Hardcoded absolute filesystem paths --------------------
+          # CLAUDE.md #459: validate_path_safe rejects traversal; paths
+          # should derive from app_data_dir / env, never be hardcoded.
+          PATH_TARGETS=$(printf '%s\n%s\n' "$CHANGED_RUST_NONTEST" "$CHANGED_TS" | grep -E '.' || true)
+          if [[ -n "$PATH_TARGETS" ]]; then
+            PATHS=$(echo "$PATH_TARGETS" | xargs -r grep -nE "(['\"])(/Users/|/home/[^/\"' ]+/|C:\\\\)" 2>/dev/null || true)
+            add_section "Hardcoded absolute filesystem paths (derive from app_data_dir / std::env, not literals)" "$PATHS"
+          fi
+
+          # ---- 6) Unpinned GitHub Actions --------------------------------
+          # CLAUDE.md CI-hardening: release-critical actions are pinned to
+          # immutable commit SHAs, not mutable tags. Flag any `uses: x@ref`
+          # in changed workflows where ref is not a 40-hex SHA (local `./`
+          # and docker:// refs are exempt).
+          if [[ -n "$CHANGED_WF" ]]; then
+            UNPINNED=""
+            for f in $CHANGED_WF; do
+              [[ -f "$f" ]] || continue
+              while IFS= read -r hit; do
+                [[ -z "$hit" ]] && continue
+                ref="${hit##*@}"
+                ref="${ref%% *}"; ref="${ref%%#*}"; ref="${ref//[[:space:]]/}"
+                if ! echo "$ref" | grep -qiE '^[0-9a-f]{40}$'; then
+                  UNPINNED="${UNPINNED}${f}: ${hit#*:}"$'\n'
+                fi
+              done < <(grep -nE '^[[:space:]]*(-[[:space:]]+)?uses:[[:space:]]*[A-Za-z0-9._-]+/[A-Za-z0-9._/-]+@' "$f" 2>/dev/null)
+            done
+            add_section "Unpinned GitHub Actions (pin to a 40-char commit SHA, not a tag)" "$UNPINNED"
+          fi
+
+          # ---- 7) Sensitive / proprietary path touches -------------------
+          # assets/brand is PROPRIETARY (not MIT — never change its license
+          # headers). capabilities/signing/tauri.conf.json/workflows are
+          # security-relevant surfaces. Advisory: confirm intentional.
+          SENSITIVE=$(echo "$CHANGED_ALL" | grep -E '^(assets/brand/|src-tauri/capabilities/|src-tauri/tauri\.conf\.json$|\.github/workflows/|\.github/rulesets/)|\.entitlements$|(^|/)(signing|codesign)' || true)
+          add_section "PR touches a sensitive / proprietary path (assets/brand is PROPRIETARY; confirm intentional)" "$SENSITIVE"
+
+          # ---- 8) Cross-source consistency scripts -----------------------
+          # Run whole-repo (not changed-file-scoped) — a drift can live in an
+          # unchanged file this PR orphaned. Each script exits 0; we surface
+          # any `•` bullets.
+          IPC=$(python3 tools/audit-checks/check_ipc_commands.py 2>&1 || true)
+          if echo "$IPC" | grep -q '•'; then
+            add_section "Tauri IPC contract drift (defined vs registered vs invoked)" "$(echo "$IPC" | grep -A100 '###')"
+          fi
+          CODECS=$(python3 tools/audit-checks/check_codec_registry.py 2>&1 || true)
+          if echo "$CODECS" | grep -q '•'; then
+            add_section "Codec registry drift (codecs.toml vs SongCodec / meta resolution)" "$(echo "$CODECS" | grep -A100 '###')"
+          fi
+
+          echo "hits=$HITS" >> "$GITHUB_OUTPUT"
+          echo "Total finding sections: $HITS"
+          if [[ -s "$REPORT" ]]; then
+            echo "----- report -----"
+            cat "$REPORT"
+          fi
+
+      # =====================================================================
+      # Post (or update) a single advisory comment on the PR.
+      # Upsert by hidden marker so repeated pushes edit one comment instead
+      # of spamming a new one each time.
+      # =====================================================================
+      - name: Upsert PR comment
+        if: always() && github.event_name == 'pull_request'
+        continue-on-error: true
+        env:
+          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          PR_NUMBER: ${{ github.event.pull_request.number }}
+          REPO: ${{ github.repository }}
+          HITS: ${{ steps.heuristics.outputs.hits }}
+          GITLEAKS_COUNT: ${{ steps.gitleaks.outputs.count }}
+          GITLEAKS_OUTCOME: ${{ steps.install_gitleaks.outcome }}
+        run: |
+          MARKER=''
+          REPORT=/tmp/security-report.md
+          COMMENT=/tmp/comment.md
+
+          {
+            echo "$MARKER"
+            echo "## PR Security Checks"
+            echo
+            if [[ "${HITS:-0}" == "0" ]]; then
+              echo "✅ No heuristic or consistency findings on this PR."
+            else
+              echo "⚠️ **${HITS} category(ies) of findings** — these are heuristics, please review each:"
+              echo
+              cat "$REPORT"
+            fi
+            if [[ "${GITLEAKS_OUTCOME}" != "success" ]]; then
+              echo "> ℹ️ gitleaks CLI could not be installed this run — the secrets scan was skipped."
+              echo
+            fi
+            echo "Generated by \`.github/workflows/pr-security.yml\`. Non-blocking — the merge gate is \`ci.yml\`. Cross-source checks live in \`tools/audit-checks/\` and run locally. False positives are expected."
+          } > "$COMMENT"
+
+          # Find an existing comment carrying our marker and edit it; else create.
+          EXISTING=$(gh api "repos/${REPO}/issues/${PR_NUMBER}/comments" --paginate \
+            --jq ".[] | select(.body | contains(\"${MARKER}\")) | .id" 2>/dev/null | head -n1 || true)
+
+          if [[ -n "$EXISTING" ]]; then
+            jq -Rs '{body: .}' "$COMMENT" | gh api --method PATCH \
+              "repos/${REPO}/issues/comments/${EXISTING}" --input - >/dev/null \
+              && echo "Updated existing comment ${EXISTING}" \
+              || gh pr comment "$PR_NUMBER" --repo "$REPO" --body-file "$COMMENT"
+          else
+            jq -Rs '{body: .}' "$COMMENT" | gh api --method POST \
+              "repos/${REPO}/issues/${PR_NUMBER}/comments" --input - >/dev/null \
+              && echo "Created new comment" \
+              || gh pr comment "$PR_NUMBER" --repo "$REPO" --body-file "$COMMENT"
+          fi
+
+      # =====================================================================
+      # Summary in the Actions UI (always visible, even on manual dispatch).
+      # =====================================================================
+      - name: Job summary
+        if: always()
+        run: |
+          {
+            echo "### PR Security Checks"
+            echo
+            if [[ "${{ steps.install_gitleaks.outcome }}" == "success" ]]; then
+              echo "- gitleaks: ${{ steps.gitleaks.outputs.count || '0' }} potential secret(s)"
+            else
+              echo "- gitleaks: install failed — scan skipped"
+            fi
+            echo "- Heuristic + consistency: ${{ steps.heuristics.outputs.hits || '0' }} finding category(ies)"
+            echo
+            echo "Non-blocking advisory checks. Merge gate remains \`ci.yml\`."
+          } >> "$GITHUB_STEP_SUMMARY"
diff --git a/tools/audit-checks/README.md b/tools/audit-checks/README.md
new file mode 100644
index 00000000..e1a012ea
--- /dev/null
+++ b/tools/audit-checks/README.md
@@ -0,0 +1,55 @@
+
+
+# Audit checks
+
+Cross-source consistency checks for MeedyaDL. Each script validates that one
+part of the codebase still agrees with another part that the Rust/TypeScript
+compilers **cannot** check for you — the "code references something that
+doesn't exist in another source" bug class.
+
+They are invoked by the **PR Security Checks** workflow
+(`.github/workflows/pr-security.yml`) on every pull request, and are runnable
+locally with no dependencies beyond Python 3 (the TOML is parsed with
+targeted regex, so no `tomllib`/`tomli`/venv is needed).
+
+| Script | What it validates | Analogous bug class |
+| --- | --- | --- |
+| `check_ipc_commands.py` | Tauri IPC contract: every `#[tauri::command]` is registered in `lib.rs`'s `generate_handler![]`, and every frontend `invoke('x')` targets a registered command. | A button that calls a command the backend never registered → runtime "command not found". |
+| `check_codec_registry.py` | `codecs.toml` integrity: every meta-codec `resolves_to` target is a real codec section, and every audio `services.gamdl` flag is a real `SongCodec` variant. | A renamed/removed codec leaving the registry pointing at nothing → download fails. |
+
+## Running locally
+
+```bash
+# Advisory (always exits 0; prints any findings) — what a quick check looks like
+python3 tools/audit-checks/check_ipc_commands.py
+python3 tools/audit-checks/check_codec_registry.py
+
+# Strict (exits 1 on a high-severity finding) — handy in a pre-push hook
+python3 tools/audit-checks/check_ipc_commands.py --strict
+python3 tools/audit-checks/check_codec_registry.py --strict
+```
+
+## Conventions
+
+- **Findings are printed as ` • path:line — message` bullets.** The
+  workflow greps for the `•` bullet to decide whether to surface a section in
+  the PR comment, so keep that prefix if you add findings.
+- **Zero findings on a clean tree is mandatory.** These are precision tools,
+  not lint nags — a check that cries wolf on day one gets ignored.
Add a + negative test (inject the drift, confirm it's caught, revert) when you add + or change a check. +- **Default exit 0, `--strict` exit 1.** CI runs them advisory; local hooks + can opt into blocking. + +## Adding a check + +Good candidates are pairs of sources that must agree but have no compiler +link between them. Ideas not yet implemented: + +- `engines.toml` engine IDs ↔ the `EngineCommandBuilder` implementations + registered in `engine_runner.rs`. +- `tool-versions.toml` tool IDs ↔ the tools `dependency_manager.rs` installs. +- Rust `AppSettings` fields ↔ the TypeScript `AppSettings` type (watch for + serde renames — high false-positive risk; validate carefully before adding). +- i18n: keys referenced via `t('x')` ↔ keys present in + `public/locales/en/translation.json`. diff --git a/tools/audit-checks/check_codec_registry.py b/tools/audit-checks/check_codec_registry.py new file mode 100755 index 00000000..af9640fc --- /dev/null +++ b/tools/audit-checks/check_codec_registry.py @@ -0,0 +1,215 @@ +#!/usr/bin/env python3 +# Copyright (c) 2026 MeedyaSuite +# Licensed under the MIT License. See LICENSE file in the project root. +""" +Codec registry cross-source consistency check. + +`src-tauri/codecs.toml` is the universal codec registry — compiled into the +binary via `include_str!()` and parsed at runtime by +`models/codec_registry.rs`. It cross-references itself (meta codecs resolve +to concrete codec IDs) and the Rust `SongCodec` enum in +`models/gamdl_options.rs` (each concrete codec's `services.gamdl` value is a +GAMDL CLI string that must be a real `SongCodec` variant). Neither link is +checked by the compiler, so drift is silent until a download fails. + +Two checks (both pure cross-source reference validation — the MeedyaDL +analog of WebMS-Intra's `check_sql_columns.py` / `check_route_targets.py`): + + 1. META RESOLUTION — every `resolves_to = { <svc> = "<id>" }` target must + be a concrete codec section that exists in codecs.toml.
Catches a meta + codec left pointing at a renamed/removed concrete codec. + + 2. GAMDL CLI VALIDITY — every concrete `[audio.<id>.services]` `gamdl` + value must be a kebab-case `SongCodec` enum variant. Catches a typo'd + flag or a `SongCodec` rename that didn't propagate to the registry. + +Video/lyrics `gamdl` values are intentionally NOT validated here: GAMDL's +video-codec / lyrics CLI strings have no single canonical Rust enum to check +against, and guessing would produce false positives. + +The TOML is parsed with targeted regex (no `tomllib`/`tomli` dependency) so +the script runs on any Python 3 without a venv — matching the audit-checks +house style. + +Exit code: + 0 — no findings, OR findings without --strict + 1 — at least one finding AND --strict was passed + +Usage: + python3 tools/audit-checks/check_codec_registry.py [--strict] +""" + +from __future__ import annotations + +import re +import sys +from pathlib import Path + +REPO_ROOT = Path(__file__).resolve().parents[2] +CODECS_TOML = REPO_ROOT / "src-tauri" / "codecs.toml" +GAMDL_OPTIONS_RS = REPO_ROOT / "src-tauri" / "src" / "models" / "gamdl_options.rs" + +# A concrete codec section header, e.g. `[audio.eac3-atmos]` (exactly two +# dot-segments — NOT the `[audio.x.services]` sub-table). +CONCRETE_SECTION_RE = re.compile(r"^\[(audio|video)\.([a-z0-9-]+)\]\s*$") +# A `[audio.<id>.services]` sub-table header. +SERVICES_SECTION_RE = re.compile(r"^\[(audio|video)\.([a-z0-9-]+)\.services\]\s*$") +# Any section header (used to know when a sub-table block ends). +ANY_SECTION_RE = re.compile(r"^\[") +# A `gamdl = "value"` line. +GAMDL_KV_RE = re.compile(r"""^\s*gamdl\s*=\s*['"]([a-z0-9-]+)['"]""") +# A `resolves_to = { svc = "id", ... }` inline table.
+RESOLVES_RE = re.compile(r"resolves_to\s*=\s*\{([^}]*)\}") +INLINE_PAIR_RE = re.compile(r"""([a-z_][a-z0-9_]*)\s*=\s*['"]([a-z0-9-]+)['"]""") + + +def variant_to_kebab(variant: str) -> str: + """Convert a Rust enum variant name to its #[serde(rename_all = + "kebab-case")] string. `AacHeBinaural` -> `aac-he-binaural`.""" + out: list[str] = [] + for i, ch in enumerate(variant): + if ch.isupper() and i > 0: + out.append("-") + out.append(ch.lower()) + return "".join(out) + + +def collect_song_codec_cli_values() -> set[str]: + """Derive the set of valid GAMDL song-codec CLI strings from the + `SongCodec` enum. The enum carries `#[serde(rename_all = "kebab-case")]`; + any per-variant `#[serde(rename = "x")]` override takes precedence.""" + text = GAMDL_OPTIONS_RS.read_text(encoding="utf-8", errors="ignore") + m = re.search(r"pub\s+enum\s+SongCodec\s*\{(.*?)\n\}", text, re.DOTALL) + if not m: + print("WARNING: SongCodec enum not found in gamdl_options.rs", file=sys.stderr) + return set() + body = m.group(1) + values: set[str] = set() + pending_rename: str | None = None + for line in body.splitlines(): + stripped = line.strip() + if stripped.startswith("//"): + continue + rn = re.search(r"""#\[serde\(rename\s*=\s*['"]([^'"]+)['"]""", stripped) + if rn: + pending_rename = rn.group(1) + continue + vm = re.match(r"([A-Z][A-Za-z0-9]*)\s*,", stripped) + if vm: + if pending_rename is not None: + values.add(pending_rename) + pending_rename = None + else: + values.add(variant_to_kebab(vm.group(1))) + return values + + +def parse_codecs_toml() -> tuple[set[str], dict[str, str], list[tuple[int, str, str]]]: + """Parse codecs.toml with targeted regex. 
+ + Returns: + concrete_ids — set of concrete codec IDs (the two-segment sections) + gamdl_by_id — {codec_id: gamdl_cli_value} for concrete codecs that + declare a [<id>.services] gamdl mapping + resolves — list of (line_no, service, target_id) from every + resolves_to inline table + """ + lines = CODECS_TOML.read_text(encoding="utf-8", errors="ignore").splitlines() + concrete_ids: set[str] = set() + gamdl_by_id: dict[str, str] = {} + resolves: list[tuple[int, str, str]] = [] + + current_services_id: str | None = None + for i, line in enumerate(lines): + cm = CONCRETE_SECTION_RE.match(line) + if cm: + concrete_ids.add(cm.group(2)) + current_services_id = None + continue + sm = SERVICES_SECTION_RE.match(line) + if sm: + current_services_id = sm.group(2) + continue + if ANY_SECTION_RE.match(line): + # Some other section header — leaving any services block. + current_services_id = None + if current_services_id is not None: + gm = GAMDL_KV_RE.match(line) + if gm: + gamdl_by_id[current_services_id] = gm.group(1) + rm = RESOLVES_RE.search(line) + if rm: + for pair in INLINE_PAIR_RE.finditer(rm.group(1)): + resolves.append((i + 1, pair.group(1), pair.group(2))) + + return concrete_ids, gamdl_by_id, resolves + + +def check() -> int: + if not CODECS_TOML.exists(): + print(f"WARNING: {CODECS_TOML} not found — skipping", file=sys.stderr) + return 0 + + concrete_ids, gamdl_by_id, resolves = parse_codecs_toml() + song_cli = collect_song_codec_cli_values() + + print(f"Concrete codec sections : {len(concrete_ids)}") + print(f"Concrete codecs with gamdl flag : {len(gamdl_by_id)}") + print(f"Meta resolves_to references : {len(resolves)}") + print(f"SongCodec CLI values (from Rust) : {len(song_cli)}") + print() + + findings = 0 + + # 1) Meta resolution integrity.
+ dangling = [(ln, svc, tgt) for (ln, svc, tgt) in resolves if tgt not in concrete_ids] + if dangling: + print("### Meta codec resolves_to a codec ID that does not exist\n") + for ln, svc, tgt in dangling: + print(f" • codecs.toml:{ln} — resolves_to {svc} = \"{tgt}\" but no [audio.{tgt}]/[video.{tgt}] section exists") + print() + findings += len(dangling) + + # 2) GAMDL CLI value validity (audio only; see module docstring). + # We only validate against SongCodec when we successfully parsed it. + if song_cli: + bad_gamdl = [ + (cid, val) + for cid, val in sorted(gamdl_by_id.items()) + # Restrict to AUDIO codecs — video gamdl values (h264/h265) and + # lyrics values (lrc/srt/ttml) are not SongCodec variants. + if cid in concrete_ids and val not in song_cli and _is_audio(cid) + ] + if bad_gamdl: + print("### Audio codec gamdl flag is not a known SongCodec CLI value\n") + for cid, val in bad_gamdl: + print(f" • codecs.toml [audio.{cid}.services] — gamdl = \"{val}\" is not a SongCodec variant (kebab-case)") + print() + findings += len(bad_gamdl) + + if findings == 0: + print("OK — codecs.toml meta references resolve and audio gamdl flags match SongCodec.") + + if findings and "--strict" in sys.argv: + return 1 + return 0 + + +# codecs.toml does not tag concrete sections audio-vs-video in a way the +# regex parser retains per-id, so recover it by re-reading the header set. 
+_AUDIO_IDS: set[str] | None = None + + +def _is_audio(codec_id: str) -> bool: + global _AUDIO_IDS + if _AUDIO_IDS is None: + _AUDIO_IDS = set() + for line in CODECS_TOML.read_text(encoding="utf-8", errors="ignore").splitlines(): + m = re.match(r"^\[audio\.([a-z0-9-]+)\]\s*$", line) + if m: + _AUDIO_IDS.add(m.group(1)) + return codec_id in _AUDIO_IDS + + +if __name__ == "__main__": + sys.exit(check()) diff --git a/tools/audit-checks/check_ipc_commands.py b/tools/audit-checks/check_ipc_commands.py new file mode 100755 index 00000000..e82b0bbb --- /dev/null +++ b/tools/audit-checks/check_ipc_commands.py @@ -0,0 +1,220 @@ +#!/usr/bin/env python3 +# Copyright (c) 2026 MeedyaSuite +# Licensed under the MIT License. See LICENSE file in the project root. +""" +Tauri IPC contract consistency check. + +MeedyaDL's frontend talks to the Rust backend exclusively through Tauri's +`invoke()` bridge. Three sources have to agree for an IPC call to work at +runtime, and the Rust compiler only enforces ONE of the three links: + + (A) Backend definition — a `#[tauri::command]` function under + `src-tauri/src/`. + (B) Backend registration — the function listed in the + `tauri::generate_handler![ ... ]` block in + `src-tauri/src/lib.rs`. + (C) Frontend call site — an `invoke('command_name')` literal somewhere + under `src/`. + +The compiler rejects (B) referencing a non-existent (A) — that link is safe. +But it is blind to the two links that actually break shipping builds: + + * (A) WITHOUT (B): a command is defined but never registered. It compiles + cleanly, yet every frontend `invoke()` of it fails at runtime with + "command not found". Dead IPC. + + * (C) WITHOUT (B): the frontend invokes a name that no registered command + answers (typo, renamed-but-not-updated, or deleted backend command). + Also a runtime "command not found" — only discovered when a user clicks + the button. 
+ +This is the MeedyaDL analog of WebMS-Intra's `check_route_targets.py` +("code references a target that doesn't exist in another source"). + +Exit code: + 0 — no findings (or only informational), OR findings without --strict + 1 — at least one (C)-without-(B) finding AND --strict was passed + +Usage: + python3 tools/audit-checks/check_ipc_commands.py [--strict] +""" + +from __future__ import annotations + +import re +import sys +from pathlib import Path + +REPO_ROOT = Path(__file__).resolve().parents[2] +RUST_SRC = REPO_ROOT / "src-tauri" / "src" +LIB_RS = RUST_SRC / "lib.rs" +FRONTEND_SRC = REPO_ROOT / "src" + +# A `#[tauri::command]` / `#[command]` attribute, with or without args +# (e.g. `#[tauri::command(rename_all = "snake_case")]`). +COMMAND_ATTR_RE = re.compile(r"#\[\s*(?:tauri::)?command\b") +# The function name on (or shortly after) the attribute line. +FN_NAME_RE = re.compile(r"\bfn\s+([a-z_][A-Za-z0-9_]*)") +# A frontend invoke() call with a string-literal target, with or without the +# `invoke(...)` generic turbofish. +INVOKE_RE = re.compile(r"""invoke\s*(?:<[^>]*>)?\s*\(\s*['"]([a-z_][a-z0-9_]*)['"]""") + +# The JSDoc usage example in tauri-commands.ts uses this placeholder; it is +# never a real command. Excluded explicitly in addition to comment-stripping. +PLACEHOLDER_NAMES = {"command_name"} + + +def strip_line_comments(text: str) -> str: + """Strip // line comments and /* */ block comments while preserving line + numbers (block comments are replaced by an equal count of newlines so + reported line numbers still line up with the original file).""" + text = re.sub( + r"/\*.*?\*/", + lambda m: "\n" * m.group(0).count("\n"), + text, + flags=re.DOTALL, + ) + # Strip // to end-of-line. This is a heuristic (it will also blank a // + # that appears inside a string literal), which is acceptable here: we + # only ever match command-name *identifiers*, and a false strip can at + # most hide a call site, never invent one. 
+ text = re.sub(r"//[^\n]*", "", text) + return text + + +def collect_defined_commands() -> dict[str, tuple[str, int]]: + """Scan every .rs file under src-tauri/src for `#[tauri::command]` + functions. Returns {fn_name: (relative_path, line_no)}.""" + defined: dict[str, tuple[str, int]] = {} + for rs in sorted(RUST_SRC.rglob("*.rs")): + try: + lines = rs.read_text(encoding="utf-8", errors="ignore").splitlines() + except OSError: + continue + rel = str(rs.relative_to(REPO_ROOT)) + for i, line in enumerate(lines): + if not COMMAND_ATTR_RE.search(line): + continue + # The fn may be on this line or up to a few attribute/doc lines + # below the #[tauri::command] attribute. + for j in range(i, min(i + 8, len(lines))): + m = FN_NAME_RE.search(lines[j]) + if m: + defined.setdefault(m.group(1), (rel, j + 1)) + break + return defined + + +def collect_registered_commands() -> set[str]: + """Extract the generate_handler![ ... ] block from lib.rs and collect the + final path segment of every registered command (e.g. + `commands::system::get_platform_info` -> `get_platform_info`).""" + text = LIB_RS.read_text(encoding="utf-8", errors="ignore") + start = text.find("generate_handler![") + if start == -1: + print("WARNING: generate_handler![ not found in lib.rs", file=sys.stderr) + return set() + # Bracket-match from the opening [ to its partner so we capture exactly + # the macro's argument list, nothing after it. + open_idx = text.index("[", start) + depth = 0 + end_idx = open_idx + for idx in range(open_idx, len(text)): + c = text[idx] + if c == "[": + depth += 1 + elif c == "]": + depth -= 1 + if depth == 0: + end_idx = idx + break + block = strip_line_comments(text[open_idx + 1 : end_idx]) + registered: set[str] = set() + for raw in block.split(","): + token = raw.strip() + if not token: + continue + # Take the segment after the last `::` path separator. 
+ ident = token.split("::")[-1].strip() + if re.fullmatch(r"[a-z_][a-z0-9_]*", ident): + registered.add(ident) + return registered + + +def collect_frontend_invokes() -> dict[str, tuple[str, int]]: + """Scan TS/TSX under src/ (excluding test + mock files) for invoke() + string-literal targets. Returns {name: (relative_path, line_no)} keeping + the first occurrence.""" + invokes: dict[str, tuple[str, int]] = {} + for ext in ("*.ts", "*.tsx"): + for ts in sorted(FRONTEND_SRC.rglob(ext)): + rel = str(ts.relative_to(REPO_ROOT)) + # Test scaffolding mocks invoke() with arbitrary names — skip it. + if ".test." in ts.name or "/test/" in rel.replace("\\", "/") or "__tests__" in rel: + continue + try: + raw = ts.read_text(encoding="utf-8", errors="ignore") + except OSError: + continue + stripped = strip_line_comments(raw) + for m in INVOKE_RE.finditer(stripped): + name = m.group(1) + if name in PLACEHOLDER_NAMES: + continue + line_no = stripped[: m.start()].count("\n") + 1 + invokes.setdefault(name, (rel, line_no)) + return invokes + + +def check() -> int: + defined = collect_defined_commands() + registered = collect_registered_commands() + invokes = collect_frontend_invokes() + + print(f"#[tauri::command] functions defined : {len(defined)}") + print(f"Commands registered in lib.rs : {len(registered)}") + print(f"Distinct frontend invoke() targets : {len(invokes)}") + print() + + # (A) without (B): defined but never registered -> dead IPC. + defined_not_registered = sorted(set(defined) - registered) + # (C) without (B): frontend calls a command that is not registered. + frontend_unregistered = sorted(set(invokes) - registered) + # (B) without (A): registered name with no discoverable definition. + # Normally a compile error, so a non-empty set here usually means this + # parser missed an unusual definition site — surfaced as informational. 
+ registered_not_defined = sorted(registered - set(defined)) + + high_severity = 0 + + if frontend_unregistered: + print("### Frontend invoke() targets with no registered backend command\n") + for name in frontend_unregistered: + rel, ln = invokes[name] + print(f" • {rel}:{ln} — invoke('{name}') has no matching registered #[tauri::command]") + print() + high_severity += len(frontend_unregistered) + + if defined_not_registered: + print("### Commands defined but NOT registered in generate_handler![] (unreachable IPC)\n") + for name in defined_not_registered: + rel, ln = defined[name] + print(f" • {rel}:{ln} — `{name}` is #[tauri::command] but absent from lib.rs generate_handler![]") + print() + + if registered_not_defined: + print("### Registered in lib.rs but no #[tauri::command] definition found (informational)\n") + for name in registered_not_defined: + print(f" • lib.rs — `{name}` is registered but this checker found no defining function (parser gap?)") + print() + + if not (frontend_unregistered or defined_not_registered or registered_not_defined): + print("OK — frontend, backend definitions, and registration are consistent.") + + if high_severity and "--strict" in sys.argv: + return 1 + return 0 + + +if __name__ == "__main__": + sys.exit(check())
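
Reviewer note: the README in this diff asks for a negative test (inject the drift, confirm it is caught, revert) whenever a check is added or changed. One way to do that without touching the real `codecs.toml` is to run the meta-resolution logic against an in-memory fragment. The sketch below is only an illustration under assumptions: the three regexes are copied from `check_codec_registry.py` above, while `find_dangling`, the `CLEAN` fragment, and its section names (`aac`, `best`, `aac-lc`) are hypothetical and not part of the PR or the real registry.

```python
import re

# Regexes copied verbatim from check_codec_registry.py in this diff.
CONCRETE_SECTION_RE = re.compile(r"^\[(audio|video)\.([a-z0-9-]+)\]\s*$")
RESOLVES_RE = re.compile(r"resolves_to\s*=\s*\{([^}]*)\}")
INLINE_PAIR_RE = re.compile(r"""([a-z_][a-z0-9_]*)\s*=\s*['"]([a-z0-9-]+)['"]""")


def find_dangling(toml_text):
    """Hypothetical helper: return (line_no, service, target_id) for every
    resolves_to target that has no concrete [audio.x] / [video.x] section."""
    lines = toml_text.splitlines()
    concrete = set()
    for line in lines:
        cm = CONCRETE_SECTION_RE.match(line)
        if cm:
            concrete.add(cm.group(2))
    dangling = []
    for line_no, line in enumerate(lines, start=1):
        rm = RESOLVES_RE.search(line)
        if not rm:
            continue
        for pair in INLINE_PAIR_RE.finditer(rm.group(1)):
            if pair.group(2) not in concrete:
                dangling.append((line_no, pair.group(1), pair.group(2)))
    return dangling


# Made-up registry fragment for the demo; not the real codecs.toml layout.
CLEAN = """\
[audio.aac]
[audio.aac.services]
gamdl = "aac"

[audio.best]
resolves_to = { gamdl = "aac" }
"""

# Inject the drift: rename the concrete section but leave the reference alone.
DRIFTED = CLEAN.replace("[audio.aac]", "[audio.aac-lc]", 1)

clean_findings = find_dangling(CLEAN)
drift_findings = find_dangling(DRIFTED)
print("clean  :", clean_findings)
print("drifted:", drift_findings)
```

The clean fragment should yield no findings and the drifted one a single finding on the `resolves_to` line, which mirrors the "zero findings on a clean tree" convention. Keeping the fixture in memory means there is nothing to revert afterwards.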