diff --git a/.github/ISSUE_TEMPLATE/bug_report.yml b/.github/ISSUE_TEMPLATE/bug_report.yml
index 69d6b8e..5dca1cd 100644
--- a/.github/ISSUE_TEMPLATE/bug_report.yml
+++ b/.github/ISSUE_TEMPLATE/bug_report.yml
@@ -1,35 +1,136 @@
 name: Bug Report
-description: Report a bug in MCTS
-title: "[Bug]: "
-labels: ["bug", "triage"]
+description: Something in MCTS is broken or behaves unexpectedly
+title: "[BUG]: "
+labels: ["type:bug", "status:triage"]
 body:
   - type: markdown
     attributes:
       value: |
-        For MCTS tool bugs, see [Getting Started](https://github.com/MCP-Audit/MCTS/blob/main/docs/get-started/getting-started.md) and [CLI Reference](https://github.com/MCP-Audit/MCTS/blob/main/docs/platform/cli.md). For vulnerabilities in **MCTS itself**, see [SECURITY.md](https://github.com/MCP-Audit/MCTS/blob/main/SECURITY.md).
+        **Before you submit:** search [open issues](https://github.com/MCP-Audit/MCTS/issues) for duplicates and reproduce on latest `main` or `develop`.
+
+        - [Getting Started](https://github.com/MCP-Audit/MCTS/blob/main/docs/get-started/getting-started.md)
+        - [CLI Reference](https://github.com/MCP-Audit/MCTS/blob/main/docs/platform/cli.md)
+        - Vulnerabilities in **MCTS itself** → [SECURITY.md](https://github.com/MCP-Audit/MCTS/blob/main/SECURITY.md) (not this template)
+
+  - type: checkboxes
+    id: checklist
+    attributes:
+      label: Checklist
+      options:
+        - label: I searched existing issues and did not find a duplicate
+          required: true
+        - label: I reproduced this on the latest `main` or `develop` branch
+          required: true
+
+  - type: textarea
+    id: summary
+    attributes:
+      label: Summary
+      description: One or two sentences describing the bug.
+      placeholder: "`mcts scan` crashes when scanning with `--snapshot` and an empty tools array."
+    validations:
+      required: true
+
+  - type: textarea
+    id: expected
+    attributes:
+      label: Expected behavior
+      description: What should have happened instead?
+    validations:
+      required: true
+
   - type: textarea
-    id: description
+    id: actual
     attributes:
-      label: What happened?
-      description: Describe the bug and what you expected.
+      label: Actual behavior
+      description: What happened? Include error messages, exit codes, or unexpected output.
     validations:
       required: true
+
   - type: textarea
     id: reproduce
     attributes:
       label: Steps to reproduce
+      description: Exact commands and inputs so a maintainer can replay the issue.
       placeholder: |
-        1. Run `mcts scan ...`
-        2. See error
+        1. `uv sync --all-extras`
+        2. `uv run mcts scan examples/vulnerable-mcp-server/server.py --scoring both`
+        3. See error …
+      render: shell
     validations:
       required: true
+
+  - type: textarea
+    id: evidence
+    attributes:
+      label: Evidence
+      description: Logs, stack traces, config snippets, or file paths. Redact secrets.
+      render: shell
+
+  - type: dropdown
+    id: component
+    attributes:
+      label: Component (suggested)
+      description: Primary area affected. Maintainers may adjust after triage.
+      options:
+        - component:cli
+        - component:api
+        - component:reporting
+        - component:ui
+        - component:sast
+        - component:live-probe
+        - component:fuzz
+        - component:inventory
+        - component:github-action
+        - component:ci
+        - component:scripts
+        - component:release
+        - component:auth
+        - component:docs
+        - component:other (comment in body)
+    validations:
+      required: true
+
+  - type: dropdown
+    id: priority
+    attributes:
+      label: Priority (suggested)
+      description: Your best estimate — maintainers confirm during triage.
+      options:
+        - "priority:P0 — blocks production / data loss / security bypass"
+        - "priority:P1 — major broken workflow or incorrect security result"
+        - "priority:P2 — medium impact; workaround exists"
+        - "priority:P3 — minor / cosmetic / docs polish"
+    validations:
+      required: true
+
   - type: input
     id: version
     attributes:
       label: MCTS version
-      placeholder: 0.1.0
+      description: Output of `mcts --version` or PyPI/git tag.
+      placeholder: "0.1.2 or git commit abc1234"
+
   - type: input
     id: python
     attributes:
       label: Python version
-      placeholder: 3.12
+      placeholder: "3.12"
+
+  - type: input
+    id: platform
+    attributes:
+      label: OS / environment
+      placeholder: "macOS 15, Ubuntu 24.04, GitHub Actions, etc."
+
+  - type: textarea
+    id: impact
+    attributes:
+      label: Impact
+      description: Who is affected and how severely (CLI users, CI, API deployments, etc.)?
+
+  - type: textarea
+    id: references
+    attributes:
+      label: References
+      description: Related issues, PRs, or doc links (optional).
diff --git a/.github/ISSUE_TEMPLATE/config.yml b/.github/ISSUE_TEMPLATE/config.yml
new file mode 100644
index 0000000..51d5063
--- /dev/null
+++ b/.github/ISSUE_TEMPLATE/config.yml
@@ -0,0 +1,11 @@
+blank_issues_enabled: false
+contact_links:
+  - name: Security disclosure (MCTS vulnerabilities)
+    url: https://github.com/MCP-Audit/MCTS/blob/main/SECURITY.md
+    about: Do not file public issues for undisclosed security vulnerabilities in MCTS itself.
+  - name: Issue labeling guide
+    url: https://github.com/MCP-Audit/MCTS/blob/main/docs/contributing/issue-labeling.md
+    about: How maintainers label type, priority, component, and status.
+  - name: Contributing guide
+    url: https://github.com/MCP-Audit/MCTS/blob/main/CONTRIBUTING.md
+    about: Development setup, branch workflow, and PR expectations.
diff --git a/.github/ISSUE_TEMPLATE/documentation.yml b/.github/ISSUE_TEMPLATE/documentation.yml
new file mode 100644
index 0000000..f2dd642
--- /dev/null
+++ b/.github/ISSUE_TEMPLATE/documentation.yml
@@ -0,0 +1,79 @@
+name: Documentation
+description: Report incorrect, missing, or unclear documentation
+title: "[DOCS]: "
+labels: ["type:docs", "status:triage"]
+body:
+  - type: markdown
+    attributes:
+      value: |
+        Docs live under [`docs/`](https://github.com/MCP-Audit/MCTS/tree/main/docs). Entry points: [Getting Started](https://github.com/MCP-Audit/MCTS/blob/main/docs/get-started/getting-started.md), [Glossary](https://github.com/MCP-Audit/MCTS/blob/main/docs/glossary.md), [Documentation index](https://github.com/MCP-Audit/MCTS/blob/main/docs/index.md).
+
+  - type: checkboxes
+    id: checklist
+    attributes:
+      label: Checklist
+      options:
+        - label: I searched existing issues and did not find a duplicate
+          required: true
+
+  - type: textarea
+    id: summary
+    attributes:
+      label: Summary
+      description: What doc is wrong or missing?
+    validations:
+      required: true
+
+  - type: input
+    id: doc_path
+    attributes:
+      label: Doc path or URL
+      description: File path in the repo or section heading.
+      placeholder: "docs/platform/cli.md — mcts scan flags"
+    validations:
+      required: true
+
+  - type: dropdown
+    id: issue_kind
+    attributes:
+      label: Issue type
+      options:
+        - Incorrect — contradicts current behavior
+        - Missing — behavior exists but is undocumented
+        - Unclear — confusing wording or structure
+        - Outdated — references old commands, versions, or branding
+    validations:
+      required: true
+
+  - type: textarea
+    id: problem
+    attributes:
+      label: What's wrong today?
+    validations:
+      required: true
+
+  - type: textarea
+    id: expected
+    attributes:
+      label: What should it say?
+      description: Suggested wording, outline, or link target.
+    validations:
+      required: true
+
+  - type: dropdown
+    id: priority
+    attributes:
+      label: Priority (suggested)
+      options:
+        - "priority:P1 — misleads users on security-critical behavior"
+        - "priority:P2 — causes confusion but workaround is obvious"
+        - "priority:P3 — typo / polish / nice-to-have"
+    validations:
+      required: true
+
+  - type: checkboxes
+    id: contribute
+    attributes:
+      label: Contribution
+      options:
+        - label: I am willing to open a PR with a doc fix
diff --git a/.github/ISSUE_TEMPLATE/feature_request.yml b/.github/ISSUE_TEMPLATE/feature_request.yml
index 55c6e45..73b6a59 100644
--- a/.github/ISSUE_TEMPLATE/feature_request.yml
+++ b/.github/ISSUE_TEMPLATE/feature_request.yml
@@ -1,33 +1,105 @@
 name: Feature Request
-description: Suggest a new feature or analyzer
-title: "[Feature]: "
-labels: ["enhancement", "triage"]
+description: Propose new functionality, analyzer, or workflow improvement
+title: "[FEATURE]: "
+labels: ["type:feature", "status:triage"]
 body:
   - type: markdown
     attributes:
       value: |
-        Check the [Product Roadmap](https://github.com/MCP-Audit/MCTS/blob/main/docs/more/roadmap.md) and [Feature Expansion Plan](https://github.com/MCP-Audit/MCTS/blob/main/docs/more/feature-expansion-plan.md) before proposing large features.
+        **Before you submit:** check the [Product Roadmap](https://github.com/MCP-Audit/MCTS/blob/main/docs/more/roadmap.md) and [Feature Expansion Plan](https://github.com/MCP-Audit/MCTS/blob/main/docs/more/feature-expansion-plan.md) for overlapping work.
+
+        Large features should start as an issue before opening a PR. See [CONTRIBUTING.md](https://github.com/MCP-Audit/MCTS/blob/main/CONTRIBUTING.md).
+
+  - type: checkboxes
+    id: checklist
+    attributes:
+      label: Checklist
+      options:
+        - label: I searched existing issues and did not find a duplicate
+          required: true
+        - label: This is not a bug report (use the Bug Report template for broken behavior)
+          required: true
+
+  - type: textarea
+    id: summary
+    attributes:
+      label: Summary
+      description: One or two sentences on what you want and why.
+    validations:
+      required: true
+
   - type: textarea
     id: problem
     attributes:
       label: Problem
-      description: What security gap or workflow pain does this solve?
+      description: What security gap, false-negative class, or workflow pain does this solve today?
     validations:
       required: true
+
   - type: textarea
     id: solution
     attributes:
       label: Proposed solution
+      description: How should MCTS behave? CLI flags, analyzer logic, report output, etc.
     validations:
       required: true
+
+  - type: textarea
+    id: alternatives
+    attributes:
+      label: Alternatives considered
+      description: Other approaches you considered and why you prefer this one.
+
   - type: dropdown
-    id: area
+    id: component
+    attributes:
+      label: Component (suggested)
+      options:
+        - component:cli
+        - component:api
+        - component:reporting
+        - component:ui
+        - component:sast
+        - component:live-probe
+        - component:fuzz
+        - component:inventory
+        - component:github-action
+        - component:ci
+        - component:scripts
+        - component:release
+        - component:auth
+        - component:docs
+        - component:other (comment in body)
+    validations:
+      required: true
+
+  - type: dropdown
+    id: priority
+    attributes:
+      label: Priority (suggested)
+      options:
+        - "priority:P0 — blocks production readiness"
+        - "priority:P1 — high value; should land soon"
+        - "priority:P2 — medium value; planned backlog"
+        - "priority:P3 — nice-to-have / future consideration"
+    validations:
+      required: true
+
+  - type: textarea
+    id: acceptance
+    attributes:
+      label: Acceptance criteria
+      description: Checklist of done conditions for maintainers and contributors.
+      placeholder: |
+        - [ ] New analyzer emits MCTS-T-* finding with evidence
+        - [ ] Regression fixture added under tests/fixtures/regression/
+        - [ ] CLI flag documented in docs/platform/cli.md
+    validations:
+      required: true
+
+  - type: checkboxes
+    id: contribute
     attributes:
-      label: Area
+      label: Contribution
       options:
-        - Analyzer
-        - CLI
-        - Reporting
-        - CI/CD Action
-        - Documentation
-        - Other
+        - label: I am willing to open a PR for this (comment on the issue to claim it)
diff --git a/.github/ISSUE_TEMPLATE/security_finding.yml b/.github/ISSUE_TEMPLATE/security_finding.yml
index 74edf4d..4e8d466 100644
--- a/.github/ISSUE_TEMPLATE/security_finding.yml
+++ b/.github/ISSUE_TEMPLATE/security_finding.yml
@@ -1,23 +1,141 @@
 name: Security Finding
-description: Report a false positive or missed vulnerability pattern
-title: "[Security]: "
-labels: ["security", "triage"]
+description: Report a false positive, false negative, or scoring issue in MCTS results
+title: "[SECURITY]: "
+labels: ["type:security", "status:triage"]
 body:
   - type: markdown
     attributes:
       value: |
-        Technique IDs and taxonomy: [Threat Taxonomy](https://github.com/MCP-Audit/MCTS/blob/main/docs/reporting/taxonomy.md). Scoring behavior: [Scoring Specification](https://github.com/MCP-Audit/MCTS/blob/main/docs/reporting/scoring-spec.md).
+        Use this template when MCTS **scan results** look wrong — missed risks, noisy findings, or unexpected scores.
+
+        **Not for vulnerabilities in MCTS itself** → follow [SECURITY.md](https://github.com/MCP-Audit/MCTS/blob/main/SECURITY.md) for responsible disclosure.
+
+        - [Threat Taxonomy](https://github.com/MCP-Audit/MCTS/blob/main/docs/reporting/taxonomy.md) — `MCTS-T-*` technique IDs
+        - [Scoring spec (legacy)](https://github.com/MCP-Audit/MCTS/blob/main/docs/reporting/scoring-spec.md)
+        - [Scoring spec (v2)](https://github.com/MCP-Audit/MCTS/blob/main/docs/reporting/scoring-spec-v2.md)
+
+  - type: checkboxes
+    id: checklist
+    attributes:
+      label: Checklist
+      options:
+        - label: I searched existing issues and did not find a duplicate
+          required: true
+        - label: I am reporting scan-result accuracy, not a vulnerability in the MCTS tool itself
+          required: true
+
+  - type: dropdown
+    id: finding_kind
+    attributes:
+      label: Finding type
+      options:
+        - finding:false-positive — MCTS flagged risk that should not fire
+        - finding:false-negative — real risk that MCTS missed
+        - Scoring / severity mismatch — score, risk level, or category seems wrong
+        - Attack chain / graph issue
+        - Other (describe in body)
+    validations:
+      required: true
+
+  - type: textarea
+    id: summary
+    attributes:
+      label: Summary
+      description: One or two sentences on what MCTS got wrong.
+    validations:
+      required: true
+
   - type: textarea
     id: finding
     attributes:
       label: Finding details
-      description: Describe the false positive or missed pattern.
+      description: Title, severity, analyzer name, technique ID, and why the result is incorrect.
+      placeholder: |
+        - Finding title: …
+        - Severity: critical / high / …
+        - Analyzer: PathValidationAnalyzer
+        - Technique: MCTS-T-1029
+        - Why wrong: …
     validations:
       required: true
+
   - type: textarea
     id: mcp-server
     attributes:
       label: MCP server context
-      description: Minimal repro server or tool definition (redact secrets).
+      description: Minimal repro — tool definition, handler snippet, or example server path. Redact secrets.
+      render: shell
+    validations:
+      required: true
+
+  - type: textarea
+    id: reproduce
+    attributes:
+      label: Reproduction command
+      description: Exact `mcts scan` (or subcommand) invocation and flags.
+      placeholder: "uv run mcts scan examples/vulnerable-mcp-server/server.py --scoring both"
+      render: shell
+    validations:
+      required: true
+
+  - type: input
+    id: technique_id
+    attributes:
+      label: Technique ID (if known)
+      placeholder: "MCTS-T-1029"
+
+  - type: dropdown
+    id: scoring_mode
+    attributes:
+      label: Scoring mode
+      options:
+        - legacy (`score.overall`)
+        - v2 (`score_v2.absolute_risk`)
+        - both (default)
+        - unknown / not applicable
+    validations:
+      required: true
+
+  - type: dropdown
+    id: component
+    attributes:
+      label: Component (suggested)
+      options:
+        - component:sast
+        - component:reporting
+        - component:live-probe
+        - component:cli
+        - component:api
+        - component:other (comment in body)
+    validations:
+      required: true
+
+  - type: dropdown
+    id: priority
+    attributes:
+      label: Priority (suggested)
+      options:
+        - "priority:P0 — critical false negative or score bypass"
+        - "priority:P1 — high-severity misclassification affecting triage"
+        - "priority:P2 — medium noise or coverage gap"
+        - "priority:P3 — edge case / low-severity tuning"
     validations:
       required: true
+
+  - type: textarea
+    id: expected
+    attributes:
+      label: Expected behavior
+      description: What should MCTS report instead (finding, severity, score, or silence)?
+
+  - type: textarea
+    id: impact
+    attributes:
+      label: Impact
+      description: How would this misclassification affect a security review or CI gate?
+
+  - type: textarea
+    id: references
+    attributes:
+      label: References
+      description: OWASP MCP mapping, CWE, related issues, or benchmark servers.
diff --git a/.github/rulesets/README.md b/.github/rulesets/README.md
new file mode 100644
index 0000000..fb8d3a2
--- /dev/null
+++ b/.github/rulesets/README.md
@@ -0,0 +1,75 @@
+# Repository rulesets
+
+Version-controlled [repository ruleset](https://docs.github.com/en/repositories/configuring-branches-and-merges-in-your-repository/managing-rulesets/about-rulesets) definitions for MCP-Audit/MCTS.
+
+Apply or refresh rulesets (repo admin, `gh` CLI authenticated):
+
+```bash
+./scripts/enable-branch-protection.sh MCP-Audit/MCTS
+./scripts/enable-branch-protection.sh MCP-Audit/MCTS --dry-run  # preview only
+```
+
+If you previously applied a ruleset named `Protect main`, delete it under **Settings → Rules** after applying — this repo now uses `Protect release branches` (same file: `main.json`).
+
+## Branch access model
+
+| Branch | Who can update | How changes land |
+|--------|----------------|------------------|
+| `main` (current release) | **Maintainers** (`maintain` role) and **Admins** | PRs from `develop` or feature branches; maintainers merge when CI is green |
+| `main_*` (pinned releases, e.g. `main_0.1.2`) | **Maintainers** and **Admins** | Same policy as `main` — hotfix PRs merged by maintainers |
+| `develop` (integration) | **Admins only** (`admin` role) | Direct pushes by admins; contributors open PRs to `develop` from feature branches |
+
+The `update` rule blocks direct pushes unless the actor is in `bypass_actors`. Repository role IDs (GitHub API):
+
+| Role | `actor_id` | Typical members |
+|------|------------|-----------------|
+| `maintain` | `2` | Release maintainers — can merge PRs into `main` |
+| `write` | `4` | Contributors — feature branches and PRs only |
+| `admin` | `5` | Repo admins — full access including `develop` |
+
+Assign roles under **Settings → Collaborators and teams**. Contributors should have **Write**; release maintainers **Maintain**; integration owners **Admin**.
+
+`OrganizationAdmin` is included as an emergency bypass for org owners (not used on personal forks).
+
+## Rulesets
+
+| File | Branches | Rules summary |
+|------|----------|---------------|
+| `main.json` | `main` + `main_*` | Update restricted to bypass actors; PR + CI required; no force-push or deletion |
+| `develop.json` | `develop` | Update restricted to admins; CI required; no force-push or deletion |
+
+### `main` bypass actors
+
+| Actor | Mode | Effect |
+|-------|------|--------|
+| `RepositoryRole` maintain (`2`) | `pull_request` | Can merge PRs into `main` when checks pass |
+| `RepositoryRole` admin (`5`) | `always` | Full bypass for hotfixes / break-glass |
+| `OrganizationAdmin` | `always` | Org-owner bypass |
+
+### `develop` bypass actors
+
+| Actor | Mode | Effect |
+|-------|------|--------|
+| `RepositoryRole` admin (`5`) | `always` | Only admins can push to `develop` |
+| `OrganizationAdmin` | `always` | Org-owner bypass |
+
+> **Note:** Admins with `bypass_mode: always` can push even when a status check is pending or failed. Run CI before pushing to `develop`, or merge via PR from a branch so checks gate the commit.
+
+## Required status checks
+
+Both rulesets require these checks from [`.github/workflows/ci.yml`](../workflows/ci.yml):
+
+| Check | Workflow job | What it covers |
+|-------|--------------|----------------|
+| `test` | `test` → `test-gate.yml` | Ruff, pytest, regression harness, wheel smoke, SARIF |
+| `scoring-v2` | `scoring-v2` → `scoring-v2.yml` | v2 scoring tests + Spearman ρ ≥ 0.80 calibration gate |
+
+`main` uses `strict_required_status_checks_policy: true` so PR branches must be up to date before merge.
+
+## Changing access or checks
+
+1. Edit `bypass_actors` or `rules` in the JSON file.
+2. Re-run `enable-branch-protection.sh`.
+3. Update [CONTRIBUTING.md](../../CONTRIBUTING.md) and this README.
+
+If GitHub reports a missing check context, open a recent PR → **Checks** tab and copy the exact status names into `required_status_checks`.
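Reviewer note (not part of the patch): the bypass model described in the rulesets README can be sanity-checked with a small script. The following is a minimal Python sketch under stated assumptions: the role IDs are the ones listed in the README table (`2` maintain, `4` write, `5` admin), the inline JSON is a trimmed copy of the `bypass_actors` block from `develop.json`, and `ROLE_NAMES` and `summarize_bypass` are hypothetical names that do not exist in the repository.

```python
import json

# Role IDs as listed in the rulesets README table (assumption: they stay stable).
ROLE_NAMES = {2: "maintain", 4: "write", 5: "admin"}

# Trimmed copy of the bypass_actors block from develop.json, for illustration only.
RULESET = json.loads("""
{
  "name": "Protect develop",
  "bypass_actors": [
    {"actor_id": 5, "actor_type": "RepositoryRole", "bypass_mode": "always"},
    {"actor_id": null, "actor_type": "OrganizationAdmin", "bypass_mode": "always"}
  ]
}
""")


def summarize_bypass(ruleset: dict) -> list[str]:
    """Return one readable line per bypass actor (hypothetical helper)."""
    lines = []
    for actor in ruleset.get("bypass_actors", []):
        if actor["actor_type"] == "RepositoryRole":
            # Map the numeric role ID to its name; fall back to the raw ID.
            who = ROLE_NAMES.get(actor["actor_id"], f"role {actor['actor_id']}")
        else:
            # Non-role actors (e.g. OrganizationAdmin) carry no role ID.
            who = actor["actor_type"]
        lines.append(f"{who}: {actor['bypass_mode']}")
    return lines


if __name__ == "__main__":
    for line in summarize_bypass(RULESET):
        print(line)
```

Running the same helper against `main.json` should additionally list the `maintain` role with the `pull_request` mode, which is a quick way to confirm that the JSON and the README tables agree.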
diff --git a/.github/rulesets/develop.json b/.github/rulesets/develop.json
new file mode 100644
index 0000000..8ef9758
--- /dev/null
+++ b/.github/rulesets/develop.json
@@ -0,0 +1,52 @@
+{
+  "name": "Protect develop",
+  "target": "branch",
+  "enforcement": "active",
+  "conditions": {
+    "ref_name": {
+      "include": ["refs/heads/develop"],
+      "exclude": []
+    }
+  },
+  "rules": [
+    {
+      "type": "update",
+      "parameters": {
+        "update_allows_fetch_and_merge": true
+      }
+    },
+    {
+      "type": "deletion"
+    },
+    {
+      "type": "non_fast_forward"
+    },
+    {
+      "type": "required_status_checks",
+      "parameters": {
+        "required_status_checks": [
+          {
+            "context": "test"
+          },
+          {
+            "context": "scoring-v2"
+          }
+        ],
+        "strict_required_status_checks_policy": false,
+        "do_not_enforce_on_create": true
+      }
+    }
+  ],
+  "bypass_actors": [
+    {
+      "actor_id": 5,
+      "actor_type": "RepositoryRole",
+      "bypass_mode": "always"
+    },
+    {
+      "actor_id": null,
+      "actor_type": "OrganizationAdmin",
+      "bypass_mode": "always"
+    }
+  ]
+}
diff --git a/.github/rulesets/main.json b/.github/rulesets/main.json
index 1b68319..343fef5 100644
--- a/.github/rulesets/main.json
+++ b/.github/rulesets/main.json
@@ -1,25 +1,68 @@
 {
-  "name": "Protect main",
+  "name": "Protect release branches",
   "target": "branch",
   "enforcement": "active",
   "conditions": {
     "ref_name": {
-      "include": ["~DEFAULT_BRANCH"],
+      "include": ["~DEFAULT_BRANCH", "refs/heads/main_*"],
       "exclude": []
     }
   },
   "rules": [
+    {
+      "type": "update",
+      "parameters": {
+        "update_allows_fetch_and_merge": true
+      }
+    },
+    {
+      "type": "deletion"
+    },
+    {
+      "type": "non_fast_forward"
+    },
+    {
+      "type": "pull_request",
+      "parameters": {
+        "required_approving_review_count": 0,
+        "dismiss_stale_reviews_on_push": false,
+        "require_code_owner_review": false,
+        "require_last_push_approval": false,
+        "required_review_thread_resolution": false,
+        "allowed_merge_methods": ["merge", "squash", "rebase"]
+      }
+    },
     {
       "type": "required_status_checks",
       "parameters": {
         "required_status_checks": [
           {
             "context": "test"
+          },
+          {
+            "context": "scoring-v2"
           }
         ],
-        "strict_required_status_checks_policy": false
+        "strict_required_status_checks_policy": true,
+        "do_not_enforce_on_create": true
       }
     }
   ],
-  "bypass_actors": []
+  "bypass_actors": [
+    {
+      "actor_id": 2,
+      "actor_type": "RepositoryRole",
+      "bypass_mode": "pull_request"
+    },
+    {
+      "actor_id": 5,
+      "actor_type": "RepositoryRole",
+      "bypass_mode": "always"
+    },
+    {
+      "actor_id": null,
+      "actor_type": "OrganizationAdmin",
+      "bypass_mode": "always"
+    }
+  ]
 }
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index 6abfd37..a06a060 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -10,3 +10,6 @@ permissions:
 jobs:
   test:
     uses: ./.github/workflows/test-gate.yml
+
+  scoring-v2:
+    uses: ./.github/workflows/scoring-v2.yml
diff --git a/.github/workflows/scoring-v2.yml b/.github/workflows/scoring-v2.yml
new file mode 100644
index 0000000..2566439
--- /dev/null
+++ b/.github/workflows/scoring-v2.yml
@@ -0,0 +1,21 @@
+name: scoring-v2
+
+on:
+  push:
+  pull_request:
+  workflow_call:
+
+jobs:
+  scoring:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: astral-sh/setup-uv@v5
+      - run: uv sync --group dev
+      - run: uv run pytest tests/scoring/ tests/test_attack_graph.py tests/test_cli_gates_v2.py tests/test_cli_report.py tests/test_analysis_output.py tests/test_html_report.py tests/test_governance.py tests/test_mcp_server.py tests/test_api_gate_violations.py tests/test_inventory_scan_all.py -v
+      - run: uv run pytest tests/test_scoring.py -v
+      - run: uv run python scripts/calibrate_scoring_weights.py --min-rho 0.80
+      - run: uv build
+      - run: |
+          uv run python -c "from mcts.scoring.weights import load_weights; load_weights('manual_v1'); load_weights('weights_learned')"
+          uv run python -c "from mcts.scoring.corpus import load_corpus_stats; load_corpus_stats()"
diff --git a/CHANGELOG.md b/CHANGELOG.md
index e56d3fc..4d77bec 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -7,8 +7,27 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+## [0.1.3] - 2026-06-12
+
+### Added
+
+- **Scoring v2 (multi-factor risk)** — parallel `score_v2.absolute_risk` with factor classifiers, attack-chain multipliers, corpus-calibrated `security_score`, and explainable `top_contributors`; legacy `score.overall` unchanged (invariant I1)
+- **Default dual scoring** — `--scoring both` is the default in CLI, API, and GitHub Action; opt out with `--scoring legacy`
+- **v2 CI gates** — `--min-security-score`, `--max-absolute-risk`, `--max-risk-level`, `--min-category-score-v2`; API returns `gate_violations` and echoed `scoring_mode`
+- **Dashboard v2** — absolute risk header, factor-axis radar, OWASP `category_scores_v2` tiles, dual-score glossary when `both`
+- **Dashboard overview** — hero snapshot, issues/risk priority grid, quick-jump nav, plain-language zones (actions, risk breakdown, coverage, trends), and collapsible “How to read this report” guide for v2 and legacy scans
+- **Scan history trend table** — dynamic columns (date, absolute risk, risk level, security score, issues, critical, high, legacy score) from `history.json`; records severity counts per run
+- **SARIF `mcts/scoreV2`** — optional run properties; see [sarif-score-v2.md](docs/reporting/sarif-score-v2.md) for Code Scanning adoption
+- **Calibration** — 11-server corpus, Spearman gate (ρ ≥ 0.80), `scripts/calibrate_scoring_weights.py`, packaged `scoring_v2_corpus_stats.json`
+- **Docs** — [ADR-003](docs/analysis/adr-003-scoring-v2.md), [scoring-spec-v2](docs/reporting/scoring-spec-v2.md), [migration guide](docs/migration/scoring-v2.md)
+- **Pentest** — `verdict` follows `score_v2.risk_level` when v2 scoring is enabled
+- **CI** — `scoring-v2` workflow required on main CI (`ci.yml`) with Spearman ρ ≥ 0.80 gate
+
 ### Fixed
 
+- Pentest marks `attack_chains` as `skipped` (not `complete`) when zero MCP tools are discovered; `pentest_limits` on `PentestReport` records coverage (`static-only` vs `full`) ([#215](https://github.com/MCP-Audit/MCTS/issues/215), thanks [@sachinML](https://github.com/sachinML) — [PR #255](https://github.com/MCP-Audit/MCTS/pull/255))
+- Legacy security score card and gauge hidden when v2 scoring is active so the overview shows a single primary risk model
+- v2 dimension radar uses relative normalization so spoke scale reflects dominant factors on each scan (not absolute corpus scale)
 - Reject invalid `--snapshot` JSON such as scan-report artifacts, empty tool lists, or tool rows without names before scan analysis starts.
 - Validate governance `--policy` files before scan execution so missing or invalid policy files fail before reports are written.
 - Fail `--auto` with a clear error when multiple MCP config files or entrypoint candidates are found instead of silently scanning the repo root.
@@ -34,9 +53,15 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ### Changed
 
+- **HTML dashboard layout** — equal-height side-by-side panels across overview, risk breakdown, and trends; scrollable overflow (280px cap) for trend history, risk contributors, and category health; overview issue/pass lists capped at six rows
+- **Brand assets** — canonical `Logo 2.jpg` for terminal headers, HTML sidebar, and exports (replaces separate PNG/report variants)
+- **Trend sparkline** — chart width follows container size with resize handling
+- **Documentation** — added [Scoring developer guide](docs/reporting/scoring-guide.md) as single entry point; simplified glossary, getting started, and migration doc; synced architecture, CI, and [html-report](docs/reporting/html-report.md) docs for the reorganized dashboard
 - Print MCP Surface / Supply Chain / Dependency Hygiene breakdown when `--min-score` or `--ci` gate fails.
 - Validate resolvable live launch configuration before the consent gate on `mcts snapshot` and `mcts fuzz`.
 - **Doctor + MCP server startup hints** — `mcts doctor` now reports whether the optional `[mcp]` extra is installed, and `mcts-mcp` prints a direct install hint instead of a bare import failure when the extra is missing (#219).
+- **GitHub issue templates** — structured bug, feature, security, and documentation forms aligned with `type:*` / `priority:P*` label taxonomy
+- **Branch rulesets** — `main` + `main_*` release branches (maintainer merge) and admin-only `develop` integration branch
 
 ## [0.1.2] - 2026-06-10
 
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 0855605..dc68b3f 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -49,7 +49,9 @@ flowchart TB
     INFO["MCPServerInfo\nmcp/models.py"]
     ANA["Analyzers\nanalyzers/*.py"]
     COMP["Compliance\ncompliance/checks.py"]
-    SCORE["Scoring\nscoring/engine.py"]
+    GRAPH["Attack graph\nscoring/graph.py"]
+    V1["Legacy score\nengine.py"]
+    V2["v2 score\nengine_v2.py"]
     OUT["ScanReport\nreporting/models.py"]
     TERM["Terminal / JSON / SARIF / HTML"]
 
@@ -58,12 +60,14 @@ flowchart TB
     DISC --> INFO
     INFO --> ANA
     ANA --> COMP
-    COMP --> SCORE
-    SCORE --> OUT
+    COMP --> GRAPH
+    GRAPH --> V1
+    V1 --> V2
+    V2 --> OUT
     OUT --> TERM
 ```
 
-**Orchestrator:** `Scanner` in `src/mcts/core/scanner.py` wires discovery, the analyzer list, deduplication, compliance, and scoring.
+**Orchestrator:** `Scanner` in `src/mcts/core/scanner.py` wires discovery, analyzers, compliance, attack graph, legacy scoring, and optional v2 scoring (`scoring_mode` default `both`).
 
 | Layer | Directory | Typical contribution |
 |-------|-----------|----------------------|
@@ -71,7 +75,7 @@ flowchart TB
 | Discovery | `discovery/`, `mcp/client.py`, `probe/` | New languages, live/remote transport, inventory |
 | Analyzers | `analyzers/` | New security checks (subclass `BaseAnalyzer`) |
 | SAST / rules | `sast/`, `taxonomy/sigma/` | Tree-sitter taint, Semgrep rules, Sigma metadata |
-| Scoring & reports | `scoring/`, `reporting/`, `report/` | Score formula, SARIF, HTML dashboard |
+| Scoring & reports | `scoring/`, `governance/`, `reporting/`, `report/` | v1/v2 engines, corpus stats, gates, SARIF, HTML dashboard |
 | Tests | `tests/`, `tests/fixtures/regression/` | Unit tests, technique regression fixtures |
 
 **Adding an analyzer (common task):**
@@ -81,7 +85,7 @@ flowchart TB
 3. Add tests and, when applicable, a fixture under `tests/fixtures/regression/MCTS-T-*/`.
 4. Document in [Security Checks](docs/analysis/security-checks.md) and assign a `technique_id`.
 
-Full pipeline detail: [Architecture](docs/analysis/architecture.md) · [Extension points](docs/analysis/architecture.md#extension-points)
+Full pipeline detail: [Architecture](docs/analysis/architecture.md) · [Scoring guide](docs/reporting/scoring-guide.md) · [Extension points](docs/analysis/architecture.md#extension-points)
 
 ---
 
@@ -144,27 +148,34 @@ Use the repo templates when possible: [bug report](https://github.com/MCP-Audit/
 
 ## Branch Protection
 
-Pull requests to `main` require the **test** CI check to pass.
+| Branch | Who can update | Policy |
+|--------|----------------|--------|
+| `main` (current release) | **Maintainers** (`maintain`) and **Admins** | PRs required; **`test`** + **`scoring-v2`** must pass; branch up to date; no force-push or deletion |
+| `main_*` (pinned releases, e.g. `main_0.1.2`) | **Maintainers** and **Admins** | Same as `main` — for version-specific hotfix lines |
+| `develop` (integration) | **Admins only** | **`test`** + **`scoring-v2`** must pass; no force-push or deletion; contributors land work via PRs from feature branches |
+
+Contributors typically have **Write** (feature branches only). Assign **Maintain** to release maintainers who merge into `main`. Assign **Admin** to owners who push integration work to `develop`.
+
+Definitions live in [`.github/rulesets/`](.github/rulesets/) (`main.json`, `develop.json`). See [rulesets README](.github/rulesets/README.md) for bypass actors and role IDs.
 
 ### Enable on GitHub (one-time, repo admin)
 
-**Option A — Script**
+**Option A — Script (recommended)**
 
 ```bash
 ./scripts/enable-branch-protection.sh MCP-Audit/MCTS
 ```
 
-The script is **idempotent**: re-running it updates the existing `Protect main` ruleset instead of creating duplicates. Use `--dry-run` to preview without applying changes.
+The script is **idempotent**: re-running it updates existing rulesets (`Protect release branches`, `Protect develop`) instead of creating duplicates. Use `--dry-run` to preview without applying changes.
 
 **Option B — GitHub UI**
 
 1. Go to **Settings → Rules → Rulesets → New branch ruleset**
-2. Target: default branch (`main`)
-3. Add rule: **Require status checks to pass**
-4. Required check: `test`
-5. Save and enable enforcement
-
-The ruleset definition lives in `.github/rulesets/main.json`.
+2. Target: default branch (`main`) or `develop`
+3. Add rules: **Restrict updates** (role bypass), **Require pull request** (`main` only), **Require status checks**, **Block force pushes**, **Restrict deletions**
+4. Bypass actors: `main` → Maintain (PR merge) + Admin; `develop` → Admin only
+5. Required checks: `test`, `scoring-v2`
+6. Save and enable enforcement
 
 ---
 
diff --git a/README.md b/README.md
index c011f40..014ab0e 100644
--- a/README.md
+++ b/README.md
@@ -33,22 +33,21 @@ uv run mcts scan examples/vulnerable-mcp-server/server.py
 ```
 $ mcts scan examples/vulnerable-mcp-server/server.py
 
-[✓] Discovering tools...
-[✓] Mapping permissions...
-[✓] Detecting attack chains...
-[✓] Generating report...
-
 ==================== MCTS Security Report ====================
-Overall Score: 5/100 (CRITICAL)
+Overall Score: 1/100 (CRITICAL)  ← legacy (--min-score)
 Risk Index: 100/100
-Scoring basis: 3 Critical, 7 High, 2 Medium (12 scorable findings)
+Scoring basis: 5 Critical, 11 High, 1 Medium (17 scorable findings)
+Absolute Risk: 2260 (critical)  ← v2 (--max-absolute-risk)
+Security Score: 9/100  ← v2 benchmark
 
 Severity Summary      Top Findings
-● Critical 4          [1] CRITICAL Destructive tool: delete_all_users
-● High 7              [2] CRITICAL Read → exfiltration attack chain possible
-● Medium 2            ...
+● Critical 5          [1] CRITICAL Destructive tool: delete_all_users
+● High 11             [2] CRITICAL Read → exfiltration attack chain possible
+● Medium 1            ...
 ```
 
+Two scores on one scan is normal — see the [scoring developer guide](docs/reporting/scoring-guide.md).
+ ## Problem @@ -113,13 +112,12 @@ MCTS is **alpha** software with a local-first MCP security pipeline — no cloud | Capability | How | |------------|-----| -| Risk scoring | Exponential 0–100 score, risk index, category breakdown | +| Risk scoring | Legacy + v2 by default — [developer guide](docs/reporting/scoring-guide.md) | | Compliance mapping | OWASP LLM Top 10 + OWASP MCP Top 10 (non-scoring meta-findings) | | Terminal UI | Rich dashboard — themes, progress, `--terminal-format` views | -| Export formats | JSON, SARIF (`--format sarif`), raw envelope, HTML (`mcts report`) | -| CI gates | `--fail-on-critical`, `--min-score`, `--max-critical`, `--fail-on-category` | -| CI preset | `--ci` unified gate bundle | -| Governance policies | `--policy` YAML allowlist and min-score gates | +| Export formats | JSON, SARIF, HTML (`mcts report`) | +| CI gates | Legacy (`--min-score`) and/or v2 (`--max-absolute-risk`) — [guide](docs/reporting/scoring-guide.md#ci-gates--pick-one-strategy) | +| Governance policies | `--policy` YAML (legacy + optional v2 fields) | | GitHub Action | JSON + SARIF + HTML artifacts ([`@v1`](action/README.md)) | | Preflight | `mcts doctor` — deps, extras, and config hints | @@ -214,12 +212,16 @@ The HTML report includes a dark-themed overview (score gauge, letter grade, seve ### CI gate (fail on critical or score) ```bash +# Legacy (unchanged) mcts scan ./server.py --fail-on-critical --min-score 70 -mcts scan . --fail-on-critical --min-score 70 + +# v2 (default scoring includes score_v2) +mcts scan ./server.py --fail-on-critical --max-absolute-risk 500 --max-risk-level high + mcts scan . -o report.sarif --format sarif ``` -See [docs/platform/ci-integration.md](docs/platform/ci-integration.md) and [action/README.md](action/README.md). 
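The gate flags in the commands above can be read as simple threshold comparisons. The sketch below is illustrative only: `gate_violations` and `RISK_LEVELS` are hypothetical names, not the project's `governance/scan_gates.py` API. It assumes the semantics given in the action input table (`--min-score` fails below the legacy overall score threshold, `--max-absolute-risk` fails above the v2 absolute risk threshold, `--max-risk-level` fails above the named band).

```python
from typing import Optional

# Risk bands in ascending order, as listed for --max-risk-level.
RISK_LEVELS = ["low", "medium", "high", "critical"]


def gate_violations(
    overall: int,
    absolute_risk: int,
    risk_level: str,
    min_score: Optional[int] = None,
    max_absolute_risk: Optional[int] = None,
    max_risk_level: Optional[str] = None,
) -> list[str]:
    """Return one message per violated gate; an empty list means the scan passes."""
    violations: list[str] = []
    # Legacy gate: higher overall score is better, so fail when below the minimum.
    if min_score is not None and overall < min_score:
        violations.append(f"overall {overall} < min-score {min_score}")
    # v2 gate: higher absolute risk is worse, so fail when above the maximum.
    if max_absolute_risk is not None and absolute_risk > max_absolute_risk:
        violations.append(
            f"absolute_risk {absolute_risk} > max-absolute-risk {max_absolute_risk}"
        )
    # v2 gate: fail when the reported band ranks above the allowed band.
    if max_risk_level is not None and (
        RISK_LEVELS.index(risk_level) > RISK_LEVELS.index(max_risk_level)
    ):
        violations.append(f"risk_level {risk_level} > max-risk-level {max_risk_level}")
    return violations


# Numbers from the sample scan of the vulnerable demo server.
legacy = gate_violations(1, 2260, "critical", min_score=70)
v2 = gate_violations(1, 2260, "critical", max_absolute_risk=500, max_risk_level="high")
print(len(legacy), len(v2))
```

With the demo server's numbers, the legacy gate reports one violation and the v2 gate reports two, so either strategy fails the build.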
+Gate cheat sheet: [scoring guide](docs/reporting/scoring-guide.md#ci-gates--pick-one-strategy) · [CI integration](docs/platform/ci-integration.md) · [GitHub Action](action/README.md) ### Themes @@ -241,7 +243,7 @@ uv run mcts scan ./server.py --theme minimal --no-progress (core checks always on; 20+ per scan; opt-in via flags) │ ▼ - Risk scoring engine + Legacy score (overall) + v2 score (absolute_risk) │ ┌─────────┼─────────┐ ▼ ▼ ▼ @@ -255,6 +257,7 @@ uv run mcts scan ./server.py --theme minimal --no-progress | I want to… | Guide | |------------|-------| +| Understand scores | **[Scoring developer guide](docs/reporting/scoring-guide.md)** | | Choose a scan mode | [Scanning overview](docs/scanning/README.md) | | Set up CI | [CI integration](docs/platform/ci-integration.md) | | Look up commands | [CLI reference](docs/platform/cli.md) | @@ -275,14 +278,14 @@ MCTS/ │ ├── vet/ # Pre-install package vetting (pypi/npm/oci) │ ├── pentest/ # Structured pentest runner │ ├── mcp_server/ # `mcts-mcp` stdio tools for IDE agents -│ ├── governance/ # YAML policy allowlist + min-score gates +│ ├── governance/ # YAML policy + scan_gates (legacy + v2) │ ├── readiness/ # Production readiness heuristics │ ├── api/ # FastAPI REST server │ ├── inventory/ # Client config + skills discovery │ ├── fuzz/ # Protocol fuzz runner │ ├── sast/ # Tree-sitter taint + Semgrep rule pack │ ├── taxonomy/ # MCTS-T techniques, Sigma rules -│ ├── scoring/ # Risk scoring engine +│ ├── scoring/ # Risk scoring v1 + v2 engines, corpus stats, attack-graph paths │ ├── compliance/ # OWASP & MCP compliance checks │ ├── reporting/ # ScanReport models, SARIF, HTML entry │ ├── report/ # HTML dashboard (templates, CSS, JS) @@ -323,7 +326,7 @@ MCTS is **MCP-boundary security** — tool metadata, schemas, handler source, cl | Trust registries | Cloud scan + reputation | MCTS is local-first; no account required for CI | | Runtime gateways | Runtime policy & governance | Different layer — MCTS scans before deploy; they 
enforce at runtime | -**Where MCTS leads today:** auditable exponential scoring, capability-graph attack chains, first-party MCTS-T taxonomy with bundled Sigma rules, executive HTML dashboard, readiness + OPA, YARA on metadata, line-jumping detection, Semgrep SAST adapter, LLM metadata triage, package vetting, MCP server mode (`mcts-mcp`), skills scanning, toxic-flow analysis, local-first default. +**Where MCTS leads today:** dual legacy + v2 multi-factor scoring (`absolute_risk`, factor radar, corpus-calibrated `security_score`), capability-graph attack chains, first-party MCTS-T taxonomy with bundled Sigma rules, executive HTML dashboard, readiness + OPA, YARA on metadata, line-jumping detection, Semgrep SAST adapter, LLM metadata triage, package vetting, MCP server mode (`mcts-mcp`), skills scanning, toxic-flow analysis, local-first default. **Highest-priority gaps:** deep multi-language CFG/taint, prompt firewall, CycloneDX AI-BOM export, runtime stdio proxy, remote protocol fuzz (`mcts fuzz --url`), scan history/trends, hallucinated package detection, full Agno multi-agent pentest. diff --git a/action/README.md b/action/README.md index 565a219..0fc0b50 100644 --- a/action/README.md +++ b/action/README.md @@ -46,10 +46,24 @@ jobs: 2. Runs `mcts scan` once on your target (JSON, SARIF, and HTML are derived from the same scan) 3. Writes `mcts-report.json`, `mcts-report.sarif`, and `mcts-report.html` to the workflow workspace 4. Uploads JSON, HTML, and SARIF as workflow artifacts -5. Fails the workflow if `fail-on-critical` or `min-score` thresholds are not met +5. Fails the workflow on gate violations — legacy (`fail-on-critical`, `min-score`) and/or v2 (`max-absolute-risk`, `max-risk-level`, `min-security-score`, `min-category-score-v2`) Upload SARIF to GitHub Code Scanning separately (see quick start) to show findings in the Security tab. 
+### v2 gate example + +```yaml +- uses: MCP-Audit/MCTS@v1 + with: + target: ./server.py + fail-on-critical: true + max-absolute-risk: "500" + max-risk-level: high + min-security-score: "40" +``` + +Scoring defaults to `both` — JSON and SARIF include `score_v2` without extra inputs. See [Scoring developer guide](../docs/reporting/scoring-guide.md#ci-gates--pick-one-strategy). + ### Installed capabilities (default extras) | Feature | Extra | Default action | @@ -82,7 +96,14 @@ If the action lives in your repo under `action/`: |-------|---------|-------------| | `target` | `./server.py` | Path to MCP server entrypoint or repo directory | | `fail-on-critical` | `true` | Fail workflow if any critical finding is detected | -| `min-score` | — | Fail if overall score is below this threshold (0–100) | +| `min-score` | — | Fail if legacy overall score is below this threshold (0–100) | +| `scoring` | `both` | `legacy`, `v2`, or `both` — enable multi-factor scoring | +| `min-security-score` | — | Fail if v2 benchmark security score is below threshold (requires `scoring: v2` or `both`) | +| `max-absolute-risk` | — | Fail if v2 absolute risk exceeds threshold | +| `max-risk-level` | — | Fail if v2 risk level exceeds band (`low` / `medium` / `high` / `critical`) | +| `min-category-score-v2` | — | Comma-separated v2 OWASP minimums (`injection:80,privilege:70`; 100=good) | +| `weights-profile` | `manual_v1` | v2 weights profile when `scoring` is `v2` or `both` | +| `assets-path` | — | Optional `.mcts/assets.yaml` for v2 asset-value overrides | | `extras` | `mcp,sast` | Comma-separated optional extras (`all` installs every extra) | --- @@ -101,5 +122,6 @@ If the action lives in your repo under `action/`: - [CI Integration](../docs/platform/ci-integration.md) — full CI patterns and gate examples - [CLI Reference](../docs/platform/cli.md) — all scan flags available locally -- [Scoring Specification](../docs/reporting/scoring-spec.md) — how scores are calculated +- [Scoring developer 
guide](../docs/reporting/scoring-guide.md) — start here (CI flags, two scores) +- [Scoring spec v2](../docs/reporting/scoring-spec-v2.md) — technical reference - [Documentation index](../docs/index.md) diff --git a/action/action.yml b/action/action.yml index 1737d70..2a2ab1a 100644 --- a/action/action.yml +++ b/action/action.yml @@ -14,7 +14,37 @@ inputs: required: false default: "true" min-score: - description: Fail if security score is below this value (0-100). Leave empty to skip. + description: Fail if legacy security score is below this value (0-100). Leave empty to skip. + required: false + default: "" + scoring: + description: Scoring mode — legacy, v2, or both (default both) + required: false + default: "both" + min-security-score: + description: Fail if v2 benchmark security score is below this value (requires scoring v2 or both) + required: false + default: "" + max-absolute-risk: + description: Fail if v2 absolute risk exceeds this value (requires scoring v2 or both) + required: false + default: "" + max-risk-level: + description: Fail if v2 risk level exceeds this band (low, medium, high, critical) + required: false + default: "" + min-category-score-v2: + description: > + Comma-separated v2 OWASP category minimums (category:min, 100=good). 
+ Example injection:80,privilege:70 + required: false + default: "" + weights-profile: + description: v2 weights profile (default manual_v1) + required: false + default: "manual_v1" + assets-path: + description: Optional .mcts/assets.yaml path for v2 asset-value overrides required: false default: "" extras: @@ -72,6 +102,33 @@ runs: if [ -n "${{ inputs.min-score }}" ]; then ARGS+=(--min-score "${{ inputs.min-score }}") fi + if [ -n "${{ inputs.scoring }}" ] && [ "${{ inputs.scoring }}" != "legacy" ]; then + ARGS+=(--scoring "${{ inputs.scoring }}") + fi + if [ -n "${{ inputs.min-security-score }}" ]; then + ARGS+=(--min-security-score "${{ inputs.min-security-score }}") + fi + if [ -n "${{ inputs.max-absolute-risk }}" ]; then + ARGS+=(--max-absolute-risk "${{ inputs.max-absolute-risk }}") + fi + if [ -n "${{ inputs.max-risk-level }}" ]; then + ARGS+=(--max-risk-level "${{ inputs.max-risk-level }}") + fi + if [ -n "${{ inputs.min-category-score-v2 }}" ]; then + IFS=',' read -ra V2_CAT_GATES <<< "${{ inputs.min-category-score-v2 }}" + for gate in "${V2_CAT_GATES[@]}"; do + trimmed="$(echo "$gate" | xargs)" + if [ -n "$trimmed" ]; then + ARGS+=(--min-category-score-v2 "$trimmed") + fi + done + fi + if [ -n "${{ inputs.weights-profile }}" ] && [ "${{ inputs.weights-profile }}" != "manual_v1" ]; then + ARGS+=(--weights "${{ inputs.weights-profile }}") + fi + if [ -n "${{ inputs.assets-path }}" ]; then + ARGS+=(--assets-path "${{ inputs.assets-path }}") + fi uv run mcts "${ARGS[@]}" cp "$REPO_ROOT/mcts_analysis/scan-report.sarif" "$SARIF_OUT" diff --git a/docs/README.md b/docs/README.md index f7c90c1..a3fee68 100644 --- a/docs/README.md +++ b/docs/README.md @@ -2,8 +2,14 @@ > **Start here:** [Documentation index](index.md) -If you are new to MCTS, open **[Install and first scan](get-started/getting-started.md)** (~15 min). Everything else is linked from the [index](index.md). 
+## New developer (15 min) -**Quick links:** [Which scan mode?](scanning/README.md#which-scan-mode-should-i-use) · [CLI reference](platform/cli.md) · [Glossary](glossary.md) +1. **[Install and first scan](get-started/getting-started.md)** — run one scan, read the report +2. **[Scoring developer guide](reporting/scoring-guide.md)** — if two scores or CI gates are confusing (most people need this once) +3. **[CI integration](platform/ci-integration.md)** — when you wire a pipeline + +Everything else is linked from the [index](index.md) by task. + +**Quick links:** [Which scan mode?](scanning/README.md#which-scan-mode-should-i-use) · [CLI reference](platform/cli.md) · [Glossary](glossary.md) · [Security checks](analysis/security-checks.md) Planning and gap docs live under [more/](more/README.md) — skip them unless you are contributing to MCTS. diff --git a/docs/analysis/README.md b/docs/analysis/README.md index a2d391a..bdefa72 100644 --- a/docs/analysis/README.md +++ b/docs/analysis/README.md @@ -13,7 +13,7 @@ How MCTS **examines** discovered MCP surfaces and **produces findings**. | What does this finding mean? | [Security checks reference](security-checks.md) | | How does the pipeline work? | [Architecture](architecture.md) | | How do I add an analyzer? | [Architecture — Extension points](architecture.md#extension-points) or [CONTRIBUTING.md](../../CONTRIBUTING.md) | -| Why did my scan score this way? | [Scoring spec](../reporting/scoring-spec.md) | +| Why did my scan score this way? 
| **[Scoring developer guide](../reporting/scoring-guide.md)** | --- diff --git a/docs/analysis/adr-003-scoring-v2.md b/docs/analysis/adr-003-scoring-v2.md new file mode 100644 index 0000000..477bde0 --- /dev/null +++ b/docs/analysis/adr-003-scoring-v2.md @@ -0,0 +1,36 @@ +# ADR-003: MCTS Risk Score v2 + +**Status:** Accepted +**Date:** 2026-06-11 +**Spec:** [scoring-spec-v2.md](../reporting/scoring-spec-v2.md) + +## Context + +Legacy scoring (`score.overall`) uses severity-only exponential decay. Clients need explainable, stable absolute risk with factor breakdowns and attack-chain amplification without double-counting chain meta-findings. + +## Decisions + +| Topic | Choice | +|-------|--------| +| Dual score in CI | `--min-score` stays on legacy `overall` until v2.2 | +| `scoring_mode="v2"` | Runs **both** engines: legacy `score` + `score_v2` | +| Chain meta-findings in v2 sum | **Exclude** — `attack_chains` in `NON_SCORING_V2` | +| Chain multiplier | `paths_v1` tool correlation on validated paths (`medium+` severity) | +| `hop_count` | `len(path_nodes) - 1` on edge-validated paths | +| Analyzer when v2 on | Always run `AttackChainAnalyzer`; bypass `--analyzers` / `--surfaces` | +| `chain_factor` gating | `enable_attack_chains` / `--no-attack-chains` sets `chain_factor_mode: disabled` | +| `weights_hash` | `ScoreV2Basis.weights_hash` only — not on `RiskScoreV2` | +| API score gates | CLI enforces exit codes; API returns `gate_violations` array without HTTP gate exit (v2.0) | +| Canonical graph | `scoring/graph.py` owns paths; `report/data.build_attack_graph()` delegates | +| Fake path rejection | BFS returns `None` when disconnected — never `[start, end]` | +| Model location | v2 types in `scoring/models.py`; `ScanReport` imports `RiskScoreV2` | +| `dimension_scores` | RFC factor axes only; OWASP in `category_scores_v2()` (PR-4d) | +| Bracket formula | `1 + Σ factor_increments` — no YAML bracket double-weight | +| Confidence | Affects `confidence_score` / 
`risk_range` only — never `absolute_risk` | + +## Consequences + +- `ScanReport.score` remains always populated (backward compatible). +- `ScanReport.score_v2` is additive when v2/both is enabled. +- Under v2/both, attack chains analyzer always runs; `--no-attack-chains` disables multiplier only. +- Legacy and v2 scores may diverge on the same scan — expected (different formulas and scorable sets). diff --git a/docs/analysis/architecture.md b/docs/analysis/architecture.md index 21ff37e..2e3b4f1 100644 --- a/docs/analysis/architecture.md +++ b/docs/analysis/architecture.md @@ -41,8 +41,8 @@ When you run `mcts scan ./server.py`: 1. **Discover** — Build an `MCPServerInfo` snapshot (tools, prompts, resources, handler source, repo markdown instructions, optional live schemas) 2. **Analyze** — Run security analyzers; each returns `Finding` objects 3. **Post-process** — Dedupe, enrich with MCTS-T IDs, append OWASP compliance meta-findings -4. **Score** — Compute 0–100 score (compliance findings excluded from score) -5. **Report** — Terminal UI, JSON, SARIF, or HTML via `mcts report` +4. **Score** — Legacy 0–100 `score.overall` (always) plus v2 `score_v2` when `scoring_mode` is `v2` or `both` (default); compliance excluded from both sums; `attack_chains` meta-rows excluded from v2 only +5. **Report** — Terminal UI, JSON, SARIF (incl. 
`mcts/scoreV2`), or HTML via `mcts report` **Orchestrator:** `Scanner` in `src/mcts/core/scanner.py` **Config:** `ScanConfig` in `src/mcts/core/config.py` @@ -72,21 +72,24 @@ flowchart LR ANA["Analyzers"] DEDUPE["Dedupe + enrich"] COMP["Compliance OWASP"] - ANA --> DEDUPE --> COMP + GRAPH["Attack graph + scan scope"] + ANA --> DEDUPE --> COMP --> GRAPH end subgraph output [Output] - SCORE["RiskScoringEngine"] + V1["RiskScoringEngine (legacy)"] + V2["RiskScoringEngineV2 (optional)"] REP["ScanReport"] OUT["Terminal · JSON · SARIF · HTML"] - SCORE --> REP --> OUT + GRAPH --> V1 + V1 --> V2 + V2 --> REP --> OUT end CLI --> CFG --> STATIC CFG --> LIVE CFG --> SNAP MERGE --> ANA - COMP --> SCORE ``` ASCII equivalent: @@ -101,7 +104,13 @@ ScanConfig ──► Discovery (static / live / snapshot) ──► MCPServerInf filters → dedupe → enrich (MCTS-T) → compliance │ ▼ - RiskScoringEngine → ScanReport → outputs + attack_graph + scan_scope (paths when v2/both) + │ + ▼ + RiskScoringEngine (always) → RiskScoringEngineV2 (v2/both) + │ + ▼ + ScanReport → terminal · JSON · SARIF · HTML ``` --- @@ -169,13 +178,18 @@ Optional: `probe_protocol_security()` when `--protocol-probe` + `--url`. | Enrich | `enrich_findings()` | Attach `technique_id`, `mitigation_ids`, crosswalk evidence | | Compliance | `ComplianceChecker.check()` | OWASP LLM + MCP meta-findings (**non-scoring**) | -### 5. Score and verify +### 5. Attack graph and scan scope + +Before scoring: `attack_graph` (with `paths` when chains ran) and `scan_scope` are set. Under v2/both, `AttackChainAnalyzer` always runs (whitelist/surface bypass). -`RiskScoringEngine.score()` → `ScoreBasis`; `verify()` asserts score matches findings (regression guard). +### 6. Score and verify -### 6. Build `ScanReport` +1. `RiskScoringEngine.score()` → legacy `ScoreBasis`; `verify()` regression guard (always). +2. 
When `scoring_mode` is `v2` or `both`: `build_scoring_context()` → `RiskScoringEngineV2.score()` → optional `score_v2`; `verify()` on deterministic core. -Includes `attack_graph` from `AttackChainAnalyzer`, partitioned `score_breakdown`, scan scope notes, and `analyzers_executed` audit list. +### 7. Build `ScanReport` + +Includes canonical `attack_graph`, optional `score_v2`, partitioned legacy `score_breakdown`, scan scope notes, and `analyzers_executed` audit list. Optional: `--save-baseline` writes tool metadata snapshot for rug-pull detection on future scans. @@ -290,7 +304,7 @@ See [Analyzers](#analyzers) below. ### Scoring (`scoring/`) -Exponential decay formula; compliance excluded. Details: [Scoring spec](../reporting/scoring-spec.md). +Legacy exponential decay (`engine.py`); v2 multi-factor engine (`engine_v2.py`, `graph.py`, `chains.py`, packaged corpus stats). Compliance excluded from both; `attack_chains` meta-rows excluded from v2 sum. Details: [Scoring spec](../reporting/scoring-spec.md) · [Scoring v2](../reporting/scoring-spec-v2.md). ### Reporting (`reporting/`, `report/`, `ui/`) @@ -379,24 +393,64 @@ Used by `behavioral_static`. Python AST taint + optional tree-sitter for TS/Go/R `capability/inferrer.py` assigns per-tool flags (`reads_untrusted_input`, `egresses_network`, `executes_commands`, …). BFS finds paths like read → exfiltrate. Graph stored on `ScanReport.attack_graph`. 
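A minimal sketch of the path search described above, assuming a plain directed edge list between tool names. `find_path` and `hop_count` are hypothetical helpers written for illustration, not the functions in `scoring/graph.py`. The sketch follows two decisions recorded in ADR-003: a disconnected pair yields `None` rather than a fabricated `[start, end]` path, and the hop count is `len(path_nodes) - 1`.

```python
from collections import deque
from typing import Optional


def find_path(
    edges: list[tuple[str, str]], start: str, end: str
) -> Optional[list[str]]:
    """Breadth-first search over directed tool edges; None when disconnected."""
    adjacency: dict[str, list[str]] = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)

    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == end:
            return path
        for nxt in adjacency.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    # No edge-validated route exists: never invent a direct [start, end] path.
    return None


def hop_count(path: list[str]) -> int:
    """Number of validated edges on a path."""
    return len(path) - 1


# Illustrative edges only; tool names borrowed from the docs' examples.
edges = [("read_file", "get_env"), ("get_env", "send_webhook")]
path = find_path(edges, "read_file", "send_webhook")
print(path, hop_count(path))
print(find_path(edges, "send_webhook", "read_file"))
```

Because edges are directed, the reverse query returns `None` even though both tools appear in the graph.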
+When `scoring_mode` is `v2` or `both`, paths are built at scan time via `scoring/graph.build_paths()` and stored on the canonical graph: + +```json +{ + "nodes": [{"id": "read_file", "label": "read_file", "type": "tool"}], + "edges": [{"from": "read_file", "to": "send_webhook", "label": "read→exfil"}], + "paths": [{ + "id": "path-chain-credential-theft-2", + "nodes": ["read_file", "get_env", "send_webhook"], + "tools_on_path": ["read_file", "get_env", "send_webhook"], + "hop_count": 2, + "finding_ids": ["chain-credential-theft"] + }] +} +``` + +`hop_count` is validated edge hops only (`len(nodes) - 1`). Scanner, v2 engine, and HTML dashboard all use `canonical_attack_graph(report)` (invariant I3/I11). + --- ## Scoring and reporting +### Legacy engine (`scoring/engine.py`) + +Always runs. Populates `ScanReport.score` (invariant I1). + | Metric | Formula | Notes | |--------|---------|-------| | Raw risk | C×25 + H×10 + M×3 + L×1 | Linear weighted sum | | Overall score | `round(100 × e^(-raw/50))` | Higher is better | | Risk index | `min(100, raw_risk)` | Higher is worse | -`compliance` analyzer findings are **informational only** — they do not affect score. +`compliance` analyzer findings are **informational only** — they do not affect legacy or v2 sums. + +### v2 engine (`scoring/engine_v2.py`) + +Runs when `scoring_mode` is `v2` or `both` (default). Populates `ScanReport.score_v2`. + +Pipeline order (PR-1e): analyzers → compliance → **attack graph + scan scope** → legacy score → `build_scoring_context()` → v2 score. Canonical graph stored on report (I11). 
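The legacy formula table earlier in this section (raw risk, overall score, risk index) can be checked by hand against the README sample output. The sketch below is a direct transcription of those three formulas; `legacy_score` is a hypothetical helper name, not the `RiskScoringEngine` API.

```python
import math


def legacy_score(critical: int, high: int, medium: int, low: int) -> dict:
    """Apply the documented legacy formulas to scorable severity counts."""
    # Raw risk: linear weighted sum C*25 + H*10 + M*3 + L*1.
    raw = critical * 25 + high * 10 + medium * 3 + low * 1
    return {
        "raw_risk": raw,
        # Overall score: exponential decay of raw risk, higher is better.
        "overall": round(100 * math.exp(-raw / 50)),
        # Risk index: raw risk capped at 100, higher is worse.
        "risk_index": min(100, raw),
    }


# Scoring basis from the updated README sample: 5 Critical, 11 High, 1 Medium.
print(legacy_score(5, 11, 1, 0))
# {'raw_risk': 238, 'overall': 1, 'risk_index': 100}
```

This reproduces the `Overall Score: 1/100` and `Risk Index: 100/100` lines of the sample. The previous basis (3 Critical, 7 High, 2 Medium) gives a raw risk of 151 and an overall score of 5, which matches the old `5/100` figure this diff replaces.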
+ +| Output | Notes | +|--------|-------| +| `absolute_risk` | Multi-factor bracket sum × `chain_factor` on tool-attributed findings | +| `security_score` | Corpus percentile (packaged `scoring_v2_corpus_stats.json`) | +| `dimension_scores` | Eight RFC factor axes for radar chart | +| `top_contributors` | Finding + attack-chain explainability rows | +| `category_scores_v2` | OWASP tiles (100=good), separate from legacy categories | + +`attack_chains` meta-findings appear in the report but are **excluded** from v2 sum (`NON_SCORING_V2`). Chain signal is `chain_factor` on tool rows via `scoring/chains.py` and `scoring/graph.py`. + +Gates: `governance/scan_gates.py` (CLI exit codes + API `gate_violations`). Docs: [Scoring developer guide](../reporting/scoring-guide.md) · [v2 spec](../reporting/scoring-spec-v2.md). -Outputs: +### Report outputs -- **Terminal** — Rich dashboard (`ui/`) -- **JSON** — full `ScanReport` dump -- **SARIF** — `--format sarif` for GitHub Code Scanning -- **HTML** — `mcts report` executive dashboard +- **Terminal** — Rich dashboard (`ui/`) — legacy + v2 lines when `both` +- **JSON** — full `ScanReport` with optional `score_v2` +- **SARIF** — `--format sarif`; run-level `mcts/scoreV2` when v2 present +- **HTML** — `mcts report` executive dashboard with v2 primary header --- @@ -409,7 +463,7 @@ These share discovery/models but use separate entry paths: | `mcts fuzz` | `fuzz/` | Protocol probes → `runtime_events` JSON | | `mcts inventory` | `inventory/` | Client config discovery; feeds cross-server / toxic-flow analyzers | | `mcts vet` | `vet/` | Pre-install PyPI/npm/OCI checks | -| `mcts pentest` | `pentest/` | Structured recon + attack chains | +| `mcts pentest` | `pentest/` | Structured recon + attack chains; `absolute_risk` + v2 `risk_level` verdict when v2/both | | `mcts readiness` | `readiness/` | HEUR-001–020 (separate from security score) | | `mcts serve` | `api/` | REST wrapper around `Scanner` | @@ -427,7 +481,7 @@ src/mcts/ ├── 
analyzers/ # Security checks (subclass BaseAnalyzer) ├── sast/ # Taint analysis + Semgrep rule pack ├── capability/ # Tool capability profiles -├── scoring/ # RiskScoringEngine, category partitions +├── scoring/ # engine.py (v1), engine_v2.py, graph.py, chains.py, corpus stats ├── compliance/ # OWASP mapping (non-scoring) ├── taxonomy/ # MCTS-T/M, Sigma, crosswalk, enrichment ├── reporting/ # Pydantic models, SARIF @@ -437,7 +491,7 @@ src/mcts/ ├── fuzz/ # Fuzz runner ├── vet/ # Package vetting ├── pentest/ # Pentest phases -├── governance/ # YAML policy gates +├── governance/ # policy.py, scan_gates.py (legacy + v2 YAML/CLI gates) ├── readiness/ # Production heuristics + OPA ├── api/ # FastAPI (mcts serve) ├── mcp_server/ # mcts-mcp stdio tools @@ -493,7 +547,7 @@ Contributor quick start: [CONTRIBUTING.md](../../CONTRIBUTING.md#quick-start-for | Symptom | Where to look | |---------|---------------| | No tools discovered | Discovery logs; try `--auto`; check `--languages`, exclude dirs | -| Score seems wrong | `score.basis` in JSON; compliance findings are non-scoring | +| Score seems wrong | Legacy: `score.basis` in JSON. v2: `score_v2.basis` + `top_contributors`. Compliance non-scoring; `attack_chains` meta-rows excluded from v2 only. Dual scores diverging is expected — see [Scoring developer guide](../reporting/scoring-guide.md). 
| | Analyzer missing from report | `analyzers_executed` on `ScanReport`; check `--analyzers` subset and opt-in flags | | Live scan incomplete | `discovery_warnings` → `live_discovery` findings; `--strict-live` | | False positive | Analyzer module + fixture in `tests/fixtures/regression/` | @@ -513,7 +567,8 @@ uv run pytest tests/fixtures/regression/ -q # if applicable ## Related - [Security checks reference](security-checks.md) — what each analyzer looks for -- [Scoring specification](../reporting/scoring-spec.md) +- [Scoring specification](../reporting/scoring-spec.md) (legacy) +- [Scoring v2](../reporting/scoring-spec-v2.md) · [Migration](../migration/scoring-v2.md) - [Threat taxonomy](../reporting/taxonomy.md) - [CLI reference](../platform/cli.md) - [CONTRIBUTING.md](../../CONTRIBUTING.md) diff --git a/docs/analysis/security-checks.md b/docs/analysis/security-checks.md index 5808736..601ad48 100644 --- a/docs/analysis/security-checks.md +++ b/docs/analysis/security-checks.md @@ -28,10 +28,12 @@ Some checks are separate from the main scan: ## How checks run ``` -Discovery → MCPServerInfo → analyzers → enrich (MCTS-T) → score → report - ↘ compliance (non-scoring) +Discovery → MCPServerInfo → analyzers → enrich (MCTS-T) → compliance (non-scoring) + → attack graph + scan scope → legacy score → score_v2 (when v2/both) → report ``` +Under `--scoring v2|both`, `attack_chains` meta-findings appear in the report and HTML but are **excluded** from the v2 sum; chain signal applies via `chain_factor` on tool-attributed findings. Legacy `score.overall` still includes chain meta-rows in its scorable set. 
+ | Layer | What is inspected | |-------|-------------------| | **Static** | Tool names, descriptions, JSON schemas, handler source, repo manifests | @@ -807,7 +809,7 @@ uv run mcts scan ./server.py -o report.json uv run mcts report report.json -o security-report.html ``` -**Demo server:** `examples/vulnerable-mcp-server/server.py` exercises permissions, injection, command execution, data leakage, and attack chains — expect score ~5/100 (CRITICAL). +**Demo server:** `examples/vulnerable-mcp-server/server.py` exercises permissions, injection, command execution, data leakage, and attack chains — expect legacy overall ~1/100 and v2 absolute risk ~2260 (see [scoring guide](../reporting/scoring-guide.md)). --- diff --git a/docs/contributing/issue-labeling.md b/docs/contributing/issue-labeling.md index 6eeee33..6c09064 100644 --- a/docs/contributing/issue-labeling.md +++ b/docs/contributing/issue-labeling.md @@ -17,6 +17,8 @@ This guide explains how to open, label, and track issues in [MCP-Audit/MCTS](htt - [Bug report](https://github.com/MCP-Audit/MCTS/issues/new?template=bug_report.yml) - [Feature request](https://github.com/MCP-Audit/MCTS/issues/new?template=feature_request.yml) +- [Security finding](https://github.com/MCP-Audit/MCTS/issues/new?template=security_finding.yml) — false positives/negatives and scoring accuracy in scan results +- [Documentation](https://github.com/MCP-Audit/MCTS/issues/new?template=documentation.yml) For security vulnerabilities in **MCTS itself**, follow [SECURITY.md](../../SECURITY.md) — do not file public issues for undisclosed vulns. diff --git a/docs/get-started/README.md b/docs/get-started/README.md index 7d4927a..dfd23db 100644 --- a/docs/get-started/README.md +++ b/docs/get-started/README.md @@ -19,6 +19,7 @@ That is all you need to begin. 
The [documentation index](../index.md) links to e | Next step | Guide | |-----------|-------| | Pick live vs remote vs snapshot | [Which scan mode?](../scanning/README.md#which-scan-mode-should-i-use) | +| Understand scores | [Scoring developer guide](../reporting/scoring-guide.md) | | Add MCTS to CI | [CI integration](../platform/ci-integration.md) | | Understand a finding | [Security checks](../analysis/security-checks.md) | | Share an HTML report | [HTML dashboard](../reporting/html-report.md) | diff --git a/docs/get-started/getting-started.md b/docs/get-started/getting-started.md index d8b7fc5..5591fe8 100644 --- a/docs/get-started/getting-started.md +++ b/docs/get-started/getting-started.md @@ -35,10 +35,12 @@ By the end of this guide you will: MCTS reads your server code (or connects to a running server), runs automated security checks, and produces: -- A **security score** from 0 to 100 (100 = no issues found) - A list of **findings** ranked by severity (Critical → Low) +- **Two scores by default** — legacy `score.overall` (0–100, higher = better) and v2 `score_v2.absolute_risk` (integer, higher = worse) - Exportable reports in JSON, SARIF, and HTML formats +**Scores confusing?** Read the **[Scoring developer guide](../reporting/scoring-guide.md)** (5 min) before diving into formulas. + For the full pipeline design, see [Architecture](../analysis/architecture.md). 
--- @@ -155,7 +157,7 @@ The repo includes demo servers you can scan immediately: | Path | What it demonstrates | Expected score | |------|---------------------|----------------| -| `examples/vulnerable-mcp-server/server.py` | Destructive tools, injection, attack chains | ~5/100 (CRITICAL) | +| `examples/vulnerable-mcp-server/server.py` | Destructive tools, injection, attack chains | Legacy ~1/100; v2 absolute risk ~2260 | | `examples/baseline-mcp-server/server.py` | Minimal, safe tool surface | ~100/100 | | `examples/medium-risk-mcp-server/server.py` | Moderate findings | ~67/100 | | `examples/live-mcp-server/server.py` | Live probe + fuzz tests | Varies | @@ -177,36 +179,32 @@ uv run mcts scan examples/vulnerable-mcp-server/server.py 1. **Discovery** — MCTS parses the Python file and finds all `@tool` handlers, their descriptions, input schemas, and handler source code 2. **Analysis** — 25+ security analyzers check for permissions, injection, secrets, command execution, attack chains, and more -3. **Scoring** — Findings are weighted by severity and converted to a 0–100 score +3. **Scoring** — Two engines run by default: legacy `score.overall` + v2 `score_v2` ([guide](../reporting/scoring-guide.md)) 4. **Report** — Results appear in your terminal ### Reading the output ```text -[✓] Discovering tools... -[✓] Mapping permissions... -[✓] Detecting attack chains... -[✓] Generating report... - ==================== MCTS Security Report ==================== -Overall Score: 5/100 (CRITICAL) +Overall Score: 1/100 (CRITICAL) Risk Index: 100/100 -Scoring basis: 3 Critical, 7 High, 2 Medium, 0 Low (12 scorable findings) +Scoring basis: 5 Critical, 11 High, 1 Medium, 0 Low (17 scorable findings) +Absolute Risk: 2260 (critical) +Security Score: 9/100 -● Critical 4 -● High 7 -● Medium 2 -● Low 0 +● Critical 5 +● High 11 +● Medium 1 ``` | Field | Meaning | |-------|---------| -| **Overall Score** | 0–100, higher is better. Below 50 is serious. 
| -| **Risk Index** | 0–100, higher is worse. Linear measure of total risk. | -| **Scoring basis** | How many findings at each severity level contributed to the score | -| **Severity counts** | Total findings including non-scoring compliance items | +| **Overall Score** | Legacy 0–100 (higher = better). Existing CI `--min-score` uses this. | +| **Absolute Risk** | v2 integer (higher = worse). Primary posture metric for new policies. | +| **Security Score** | v2 benchmark vs corpus (higher = better). **Not** the same as Overall Score. | +| **Severity counts** | Findings by level (compliance rows appear in reports but are excluded from score math) | -Scores are never hardcoded — the scanner verifies its math on every run. Details: [Scoring Specification](../reporting/scoring-spec.md). +**Two scores?** That is expected — legacy **1/100** and v2 **2260** measure different things. See the **[Scoring developer guide](../reporting/scoring-guide.md)** for which metric to use in CI. ### Scan a whole repository @@ -253,7 +251,7 @@ By default, every scan writes artifacts to **`mcts_analysis/`** in your project | `scan-report.json` | Full machine-readable report | | `scan-report.html` | Executive HTML dashboard (open directly) | | `scan-report.sarif` | GitHub Code Scanning upload | -| `history.json` | Score trend across runs | +| `history.json` | Score trend across runs (`scoring_version`, `absolute_risk` when v2) | Relative `-o` paths use the **basename only** under `mcts_analysis/` — e.g. `-o report.json` → `mcts_analysis/report.json`, not `./report.json`. @@ -356,7 +354,9 @@ Most users start with a **static scan** (`mcts scan ./server.py`). When you need ## CI gate -Fail your build when security thresholds aren't met: +Fail your build when security thresholds aren't met. 
+ +**Existing pipelines (legacy — no change required):** ```bash uv run mcts scan ./server.py \ @@ -365,7 +365,17 @@ uv run mcts scan ./server.py \ -o report.json ``` -GitHub Action: [CI Integration](../platform/ci-integration.md) · [action/README.md](../../action/README.md) +**New policies (v2 gates — scoring is `both` by default):** + +```bash +uv run mcts scan ./server.py \ + --fail-on-critical \ + --max-absolute-risk 500 \ + --max-risk-level high \ + -o report.json +``` + +Gate cheat sheet: [Scoring developer guide](../reporting/scoring-guide.md#ci-gates--pick-one-strategy) · GitHub Action: [CI Integration](../platform/ci-integration.md) · [action/README.md](../../action/README.md) --- @@ -392,7 +402,7 @@ mcts scan . --auto --auto-server my-server -o report.json --html report.html | Exit code 2, "Live probing requires consent" | Missing consent flag | Add `--i-understand-live-risk` or `MCTS_LIVE_OK=1` | | Exit code 2, "Unknown format" | Invalid `--format` | Use `json` or `sarif` | | No tools discovered | Wrong target or empty repo | Point at server entrypoint; check `--languages` | -| Score seems wrong | Compliance findings in report | Only scorable analyzers affect score; check `score.basis` | +| Score seems wrong / two different numbers | Dual engines on default scans | Expected — see [Scoring guide](../reporting/scoring-guide.md); check `score.basis` and `score_v2.basis` | | `mcp` import error | Missing extra | `uv sync --extra mcp` or `uvx --from 'mcp-mcts[mcp]' mcts …` | | Remote scan fails | Missing consent or auth | `--i-understand-live-risk` + `--bearer-token` | | TS tools missing | Language filter | Use `--languages typescript` | @@ -403,6 +413,7 @@ mcts scan . 
--auto --auto-server my-server -o report.json --html report.html | I want to… | Guide | |------------|-------| +| Understand scores & CI gates | **[Scoring developer guide](../reporting/scoring-guide.md)** | | Pick live vs remote vs snapshot | [Which scan mode?](../scanning/README.md#which-scan-mode-should-i-use) | | See every CLI flag | [CLI reference](../platform/cli.md) | | Understand a finding | [Security checks](../analysis/security-checks.md) | diff --git a/docs/glossary.md b/docs/glossary.md index 5cf5ef3..84029af 100644 --- a/docs/glossary.md +++ b/docs/glossary.md @@ -50,13 +50,19 @@ Plain-language definitions for terms used throughout MCTS documentation. If you ## Scores and reports +**Start here:** [Scoring developer guide](reporting/scoring-guide.md) — explains the two engines, which metric to use, and CI flags. + | Term | What it means | |------|---------------| -| **Security score** | A number from 0 to 100 where **100 is best** (no issues) and lower scores mean more risk. Calculated from finding severities using a transparent formula. | -| **Risk index** | A number from 0 to 100 where **higher is worse**. A linear measure of total risk burden, capped at 100. | -| **Severity** | How serious a finding is: **Critical** (immediate danger), **High** (serious), **Medium** (needs attention), **Low** (minor). | -| **SARIF** | **Static Analysis Results Interchange Format** — a standard JSON format that GitHub Code Scanning and other tools can ingest to show findings in pull requests. | -| **HTML dashboard** | A self-contained web page generated by `mcts report` — suitable for sharing with security teams or leadership. Includes charts, findings table, and remediation advice. | +| **Legacy overall score** | `score.overall` — 0–100, **higher is better**. `--min-score` gates this. | +| **Absolute risk** | `score_v2.absolute_risk` — integer, **higher is worse**. v2 headline metric. 
| +| **Benchmark security score** | `score_v2.security_score` — 0–100 vs corpus, **higher is better**. Not the same as legacy overall. | +| **Risk level** | `score_v2.risk_level` — `low` / `medium` / `high` / `critical`. | +| **Risk index** | `score.risk_index` — legacy linear 0–100, higher is worse. | +| **scoring_mode** | `both` (default), `v2`, or `legacy`. | +| **Severity** | Critical / High / Medium / Low on each finding. | +| **SARIF** | Standard format for Code Scanning; optional `mcts/scoreV2` run properties. | +| **HTML dashboard** | Shareable report from `mcts report`. | --- @@ -83,7 +89,7 @@ Plain-language definitions for terms used throughout MCTS documentation. If you | **stdio** | Standard input/output — how MCTS talks to a local MCP server by launching it as a subprocess and communicating over pipes. | | **JSON-RPC** | The message format MCP uses for requests and responses between client and server. | | **JSON Schema** | A standard for describing the shape of JSON data — used to define tool input parameters. | -| **CI gate** | A check in your continuous integration pipeline that fails the build if security thresholds are not met (e.g. score below 70 or any critical finding). | +| **CI gate** | A check that fails the build (exit code 1) when thresholds are not met — legacy (`--min-score`, `--fail-on-category`) or v2 (`--max-absolute-risk`, `--min-security-score`, `--max-risk-level`, `--min-category-score-v2`). | | **Readiness** | Operational checks separate from security — whether a server is production-ready (logging, error handling, etc.). Run with `mcts readiness`. | --- diff --git a/docs/index.md b/docs/index.md index 6cc0b6e..77ea6cf 100644 --- a/docs/index.md +++ b/docs/index.md @@ -10,11 +10,20 @@ **New to MCTS?** Read one guide, run one command, done. -1. **[Install and first scan](get-started/getting-started.md)** — install, scan the example server, read the score, export HTML -2. Stuck on a term? **[Glossary](glossary.md)** +1. 
**[Install and first scan](get-started/getting-started.md)** — install, scan the example server, read the output, export HTML +2. **Two scores on the same scan?** **[Scoring developer guide](reporting/scoring-guide.md)** — 5 min, answers 90% of score questions +3. Stuck on a term? **[Glossary](glossary.md)** You do **not** need to read the CLI reference, architecture doc, or planning docs to get value from MCTS. +### Typical developer path + +``` +Install → first scan → scoring guide (if confused) → CI integration → done +``` + +Contributors add: [Architecture](analysis/architecture.md) → [CONTRIBUTING.md](../CONTRIBUTING.md). + --- ## I want to… @@ -30,7 +39,8 @@ Pick the task that matches what you are doing right now: | Scan a **hosted** URL | [Remote scanning](scanning/remote-scanning.md) — `--url` + auth | | Scan with **no network** (exported JSON) | [Static snapshot](scanning/static-snapshot.md) — `--snapshot` | | **Choose a scan mode** (decision tree) | [Which scan mode should I use?](scanning/README.md#which-scan-mode-should-i-use) | -| Fail CI on bad scores | [CI integration](platform/ci-integration.md) — `--fail-on-critical --min-score 70` | +| Understand scan scores | **[Scoring developer guide](reporting/scoring-guide.md)** — start here | +| Fail CI on bad scores | [CI integration](platform/ci-integration.md) — see scoring guide for gate cheat sheet | | Share results with leadership | [HTML report](reporting/html-report.md) — `mcts report report.json -o report.html` | | See what's installed on my machine | [Config inventory](scanning/inventory.md) — `mcts inventory --scan` | | Scan all local MCP configs | `mcts scan --machine-wide` — [CLI reference](platform/cli.md) | @@ -73,6 +83,7 @@ Three tiers — read top to bottom only as needed. 
| Which scan mode to use | [Scanning overview](scanning/README.md) | | Live / remote / snapshot / fuzz / inventory | [Scanning guides](scanning/README.md#guides) | | CI and GitHub Action | [CI integration](platform/ci-integration.md) | +| Understand scores | **[Scoring developer guide](reporting/scoring-guide.md)** | | HTML and SARIF reports | [Reporting overview](reporting/README.md) | ### Tier 2 — Reference (when you need details) @@ -81,7 +92,7 @@ Three tiers — read top to bottom only as needed. |-------|-------| | Every command and flag | [CLI reference](platform/cli.md) | | Every security check | [Security checks](analysis/security-checks.md) | -| How the score is calculated | [Scoring spec](reporting/scoring-spec.md) | +| Scoring (legacy + v2) | **[Scoring developer guide](reporting/scoring-guide.md)** → [legacy spec](reporting/scoring-spec.md) · [v2 spec](reporting/scoring-spec-v2.md) | | Technique IDs (MCTS-T-*) | [Threat taxonomy](reporting/taxonomy.md) | | REST API | [REST API](platform/rest-api.md) | | Term definitions | [Glossary](glossary.md) | @@ -104,7 +115,7 @@ Three tiers — read top to bottom only as needed. 
|------|------| | Developer (first time) | [Getting started](get-started/getting-started.md) → [Scanning overview](scanning/README.md) | | MCP server author | [Getting started](get-started/getting-started.md) → [Security checks](analysis/security-checks.md) | -| DevOps / CI | [CI integration](platform/ci-integration.md) → [Scoring spec](reporting/scoring-spec.md) | +| DevOps / CI | [Scoring developer guide](reporting/scoring-guide.md) → [CI integration](platform/ci-integration.md) | | Security engineer | [Architecture](analysis/architecture.md) → [Security checks](analysis/security-checks.md) | | Agent / platform team | [Inventory](scanning/inventory.md) → [CLI reference](platform/cli.md) | | Contributor | [CONTRIBUTING.md](../CONTRIBUTING.md) → [Quick start](../CONTRIBUTING.md#quick-start-for-first-time-contributors) | [Architecture](analysis/architecture.md) | diff --git a/docs/migration/scoring-v2.md b/docs/migration/scoring-v2.md new file mode 100644 index 0000000..433f0e3 --- /dev/null +++ b/docs/migration/scoring-v2.md @@ -0,0 +1,99 @@ +# Scoring v2 — migration & configuration + +> **New to scoring?** Start with the [Scoring developer guide](../reporting/scoring-guide.md) — it explains the two scores, CI flags, and JSON fields in plain language. + +This page covers **configuration and migration** details not repeated in the main guide. + +--- + +## Modes + +| `--scoring` | `score.overall` | `score_v2` in JSON | +|-------------|-----------------|-------------------| +| `both` (**default**) | Yes | Yes | +| `v2` | Yes | Yes | +| `legacy` | Yes | No | + +```bash +mcts scan # both (default) +mcts scan --scoring legacy # legacy only +``` + +--- + +## Governance policy (`.mcts/policy.yaml`) + +```yaml +# Legacy +min_score: 70 +max_critical: 0 + +# v2 (optional) +min_security_score: 50 +max_absolute_risk: 500 +max_risk_level: medium +min_category_score_v2: + injection: 80 + privilege: 70 +``` + +Use with `mcts scan --policy .mcts/policy.yaml`. 
+ +--- + +## Asset overrides (`.mcts/assets.yaml`) + +Optional v2 `asset_value` tuning: + +```yaml +overrides: + customer_db: 0.9 + temp_cache: 0.2 +``` + +```bash +mcts scan --assets-path .mcts/assets.yaml +``` + +--- + +## History & trends + +`mcts_analysis/history.json` entries include: + +- `scoring_version` +- `absolute_risk`, `security_score`, `risk_level` (when v2 ran) + +Trend charts never mix legacy and v2 on the same Y-axis. + +--- + +## Machine-wide & inventory + +`mcts scan --machine-wide` and `mcts inventory --scan-all` add per-server v2 fields and `worst_absolute_risk` in summaries when v2 is enabled. + +--- + +## API notes + +- Request fields: `scoring_mode`, `weights_profile`, `corpus_stats_path`, `assets_path`, v2 gate fields +- Response: `gate_violations` array; HTTP 200 even when gates fail (use CLI for exit codes) + +See [REST API](../platform/rest-api.md). + +--- + +## Upgrading legacy-only CI + +1. **No rush** — `--min-score` still works on `score.overall`. +2. **Add v2 gate alongside** — e.g. `--max-absolute-risk` without removing `--min-score`. +3. **Tune thresholds** on your corpus servers (baseline vs vulnerable). +4. **Switch primary metric** when team is ready — v2.2+ may repoint default CI docs to `security_score`. 
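The "add a v2 gate alongside" step can be sketched against a saved report. This is a minimal illustration, assuming the report JSON has already been loaded into a dict with the documented `score.overall` and `score_v2.absolute_risk` fields. The helper name `dual_gate_violations` and its message strings are hypothetical and not part of MCTS; only the two comparisons mirror the documented `--min-score` and `--max-absolute-risk` semantics.

```python
from typing import Any, List, Mapping, Optional


def dual_gate_violations(
    report: Mapping[str, Any],
    min_score: Optional[int] = None,
    max_absolute_risk: Optional[int] = None,
) -> List[str]:
    """Return one message per failed gate (hypothetical helper, not MCTS code)."""
    violations: List[str] = []

    # Legacy gate: fails when score.overall is below the threshold.
    overall = (report.get("score") or {}).get("overall")
    if min_score is not None and overall is not None and overall < min_score:
        violations.append(f"score.overall {overall} < {min_score}")

    # v2 gate: fails when score_v2.absolute_risk is above the ceiling.
    # score_v2 is absent on legacy-only scans, so a missing block is skipped.
    risk = (report.get("score_v2") or {}).get("absolute_risk")
    if max_absolute_risk is not None and risk is not None and risk > max_absolute_risk:
        violations.append(f"score_v2.absolute_risk {risk} > {max_absolute_risk}")

    return violations
```

Running both checks during the transition shows which servers would newly fail under the v2 ceiling before the legacy gate is removed.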
+ +--- + +## Related + +- [Scoring developer guide](../reporting/scoring-guide.md) +- [Scoring spec v2](../reporting/scoring-spec-v2.md) +- [ADR-003](../analysis/adr-003-scoring-v2.md) diff --git a/docs/more/feature-expansion-plan.md b/docs/more/feature-expansion-plan.md index ee78a36..bd21fd9 100644 --- a/docs/more/feature-expansion-plan.md +++ b/docs/more/feature-expansion-plan.md @@ -32,7 +32,7 @@ This is the **detailed implementation guide** for evolving MCTS from an alpha sc | **Discovery** | `discovery/*`, `mcp/client.py` | Multi-file Python + TypeScript static discovery; live stdio + HTTP/SSE merge | | **Analyzers** | `analyzers/*.py` | Metadata, SAST, 20+ runtime sub-detectors, Sigma, OAuth, supply chain | | **Attack chains** | `attack_chains.py` | Capability-graph BFS on per-tool profiles | -| **Scoring** | `scoring/engine.py` | Exponential decay + auditable `ScoreBasis` + `--fail-on-category` | +| **Scoring** | `scoring/engine.py`, `engine_v2.py`, `graph.py`, `chains.py` | Legacy exponential + v2 multi-factor (`absolute_risk`), corpus calibration, dual default `both` | | **Compliance** | `compliance/checks.py` | OWASP LLM meta-findings | | **CLI** | `cli/main.py` | `scan`, `report`, `inventory`, `fuzz`, `readiness`, `serve`, `vet`, `pentest`, `doctor`, `snapshot`, `scan-mcp`; `mcts-mcp` server mode | | **Terminal UI** | `ui/*` | Rich themes, progress, report renderer, `--terminal-format` | diff --git a/docs/more/planned-cli.md b/docs/more/planned-cli.md index 551e1db..cc06d46 100644 --- a/docs/more/planned-cli.md +++ b/docs/more/planned-cli.md @@ -43,6 +43,7 @@ From the [Feature Expansion Plan — CLI appendix](feature-expansion-plan.md#sca | `--skills` / SKILL.md scanning | GAP-029 | Shipped | | `--full-toxic-flows` TF codes | GAP-032 | Shipped | | `--ci` gate bundle | GAP-024 | Shipped | +| `--scoring v2\|both` + v2 gates | — | Shipped (default `both`) | | `--policy` governance YAML | GAP-222 | Shipped | | `--scan-all-users` multi-home | GAP-021 | P1 | | 
`--diff-base` git-scoped scan | GAP-010 | P1 | diff --git a/docs/more/product-positioning.md b/docs/more/product-positioning.md index d812068..b4bf957 100644 --- a/docs/more/product-positioning.md +++ b/docs/more/product-positioning.md @@ -18,7 +18,7 @@ Key properties: - **Runs locally** — no cloud account required for standard scans - **Works in CI** — SARIF output, score gates, published GitHub Action - **MCP-specific** — checks tool permissions, description poisoning, attack chains, and protocol behavior that general SAST tools miss -- **Transparent scoring** — auditable 0–100 score with clear pass/fail gates +- **Transparent scoring** — legacy 0–100 index plus v2 multi-factor `absolute_risk`, factor breakdown, and corpus-calibrated benchmark score ```bash mcts scan ./repo/ @@ -48,8 +48,8 @@ MCTS focuses on the **MCP boundary** — tool metadata, JSON schemas, handler so | Area | What MCTS provides | |------|-------------------| -| **CI adoption** | SARIF 2.1.0, `--min-score`, `--max-critical`, `--fail-on-category`, published GitHub Action `@v1` | -| **Risk intelligence** | Exponential security score, risk index, auditable `ScoreBasis`, seven category dimensions | +| **CI adoption** | SARIF 2.1.0 (incl. 
`mcts/scoreV2`), legacy + v2 gates, published GitHub Action `@v1` | +| **Risk intelligence** | Dual legacy + v2 scoring, factor-axis radar, `top_contributors`, attack-chain multiplier, auditable `ScoreBasis` | | **Threat model** | Capability-graph attack chains (read→exfil, read→exec), not keyword-only heuristics | | **Reporting** | Rich terminal UI (3 themes), executive HTML dashboard, OWASP LLM + MCP mapping, MCTS-T technique grid, capability matrix, attack graph, scan history trend | | **Taxonomy** | First-party `MCTS-T-*` techniques and `MCTS-M-*` mitigations on every finding | @@ -68,10 +68,14 @@ MCTS focuses on the **MCP boundary** — tool metadata, JSON schemas, handler so Fail PRs when critical findings exist or score drops below team threshold: ```bash +# Legacy gates mcts scan ./server.py --fail-on-critical --min-score 70 --max-critical 0 + +# v2 gates (scoring both is default) +mcts scan ./server.py --max-absolute-risk 500 --max-risk-level high ``` -Integrate via [CI Integration](../platform/ci-integration.md) or GitHub Action. +Integrate via [CI Integration](../platform/ci-integration.md) or GitHub Action. See [Scoring v2 migration](../migration/scoring-v2.md). ### 2. MCP server author review @@ -151,7 +155,8 @@ Run MCTS **in addition to** existing AppSec tooling on MCP server repositories. 
| Capability | Status | |------------|--------| | Capability-graph attack chains (BFS) | Shipped | -| Auditable exponential score + category gates | Shipped | +| Dual legacy + v2 scoring (`absolute_risk`, factor radar, corpus calibration) | Shipped | +| Auditable exponential score + legacy/v2 CI gates | Shipped | | MCTS-T taxonomy + bundled Sigma metadata rules | Shipped | | Executive HTML dashboard (local, no server) | Shipped | | MCTS-T full technique grid + capability matrix in HTML | Shipped | diff --git a/docs/more/roadmap.md b/docs/more/roadmap.md index 9232fd6..43a9534 100644 --- a/docs/more/roadmap.md +++ b/docs/more/roadmap.md @@ -16,7 +16,7 @@ MCTS aims to become the **default security tool for MCP servers** — the same w Today, MCTS identifies security issues across permissions, injection, tool abuse, data leakage, and attack chains. The next evolution adds deeper SAST, skills scanning, AI-BOM export, and runtime proxy capabilities. -**Operational docs (shipped features):** [Architecture](../analysis/architecture.md) · [CLI](../platform/cli.md) · [Scoring](../reporting/scoring-spec.md) · [CI](../platform/ci-integration.md) +**Operational docs (shipped features):** [Architecture](../analysis/architecture.md) · [CLI](../platform/cli.md) · [Scoring v2](../reporting/scoring-spec-v2.md) · [Migration](../migration/scoring-v2.md) · [CI](../platform/ci-integration.md) Status labels used throughout this document: @@ -40,6 +40,8 @@ Status labels used throughout this document: | Multi-step attack chain detection | Shipped (capability-graph BFS) | | Compliance checks (OWASP LLM Top 10) | Shipped | | Exponential risk scoring (score + risk index) | Shipped | +| Multi-factor scoring v2 (`absolute_risk`, factor radar, corpus calibration) | Shipped (default `both`) | +| v2 CI gates + governance policy fields | Shipped | | Terminal UI (Rich, themes, progress animation) | Shipped | | JSON reports | Shipped | | HTML security dashboard (`mcts report`) | Shipped | diff --git 
a/docs/platform/README.md b/docs/platform/README.md index a6a2931..fbb7639 100644 --- a/docs/platform/README.md +++ b/docs/platform/README.md @@ -37,9 +37,12 @@ Everything else (`vet`, `pentest`, `fuzz`, `mcts-mcp`, `serve`) is optional — # Daily development mcts scan ./server.py -# CI gate +# CI gate (legacy) mcts scan ./server.py --fail-on-critical --min-score 70 -o report.sarif --format sarif +# CI gate (v2 — scoring is both by default) +mcts scan ./server.py --max-absolute-risk 500 --max-risk-level high -o report.sarif --format sarif + # Share with stakeholders mcts scan ./server.py -o report.json && mcts report report.json -o report.html ``` diff --git a/docs/platform/ci-integration.md b/docs/platform/ci-integration.md index 755e229..be6ccf4 100644 --- a/docs/platform/ci-integration.md +++ b/docs/platform/ci-integration.md @@ -4,8 +4,20 @@ This guide shows how to run MCTS in your CI/CD pipeline — fail builds on security thresholds, upload SARIF to GitHub Code Scanning, and share HTML reports with your team. -> **Just want a quick gate?** Run `mcts scan ./server.py --fail-on-critical --min-score 70` -> **Want the GitHub Action?** See [GitHub Actions](#github-actions-published-action) below. 
+> **Which CI flags should I use?** [Scoring developer guide](../reporting/scoring-guide.md#ci-gates--pick-one-strategy) — legacy vs v2 cheat sheet +> **Quick legacy gate:** `mcts scan ./server.py --fail-on-critical --min-score 70` +> **Quick v2 gate:** `mcts scan ./server.py --fail-on-critical --max-absolute-risk 500 --max-risk-level high` +> **GitHub Action:** [below](#github-actions-published-action) + +### Pick a CI strategy + +| Strategy | When | Example | +|----------|------|---------| +| **A — Legacy only** | Existing pipelines; no policy change | `--fail-on-critical --min-score 70` | +| **B — v2 only** | New risk policies | `--max-absolute-risk 500 --max-risk-level high` | +| **C — Dual gates** | Transition period | `--min-score 70 --max-absolute-risk 500` | + +Default `--scoring both` means v2 fields are always in JSON/SARIF/HTML even when you only gate on legacy metrics. --- @@ -71,7 +83,7 @@ jobs: 3. Writes `mcts-report.json` and `mcts-report.sarif` 4. Runs `mcts report` → `mcts-report.html` 5. Uploads JSON/HTML as workflow artifacts -6. Respects `fail-on-critical` and `min-score` inputs +6. 
Respects legacy gates (`fail-on-critical`, `min-score`) and optional v2 gates (`scoring`, `min-security-score`, `max-absolute-risk`, `max-risk-level`, `min-category-score-v2`) Monorepo: `uses: ./action` Full reference: [action/README.md](../../action/README.md) @@ -82,7 +94,12 @@ Full reference: [action/README.md](../../action/README.md) |-------|---------|-------------| | `target` | `./server.py` | Scan target path | | `fail-on-critical` | `true` | Fail workflow on critical findings | -| `min-score` | — | Fail if score below threshold | +| `min-score` | — | Fail if legacy overall score below threshold | +| `scoring` | `both` | `legacy`, `v2`, or `both` | +| `min-security-score` | — | v2 benchmark gate | +| `max-absolute-risk` | — | v2 absolute risk ceiling | +| `max-risk-level` | — | v2 band gate (`low` … `critical`) | +| `min-category-score-v2` | — | Comma-separated `category:min` for v2 OWASP tiles | | `extras` | `mcp,sast` | Optional extras to install (`all` for full set) | --- @@ -113,7 +130,44 @@ mcts scan ./repo/ \ --fail-on-category execution:10 ``` -Category semantics: [Scoring Specification](../reporting/scoring-spec.md). +Category semantics: [Scoring Specification](../reporting/scoring-spec.md). Category gates apply to **legacy** v1 tiles only. + +### Scoring v2 gates + +Scans include `score_v2` by default (`scoring: both`). 
**Gates** on v2 fields are opt-in: + +```bash +mcts scan ./server.py \ + --scoring v2 \ + --max-absolute-risk 500 \ + --max-risk-level high \ + --min-security-score 40 \ + -o report.json +``` + +| Flag | Metric | +|------|--------| +| `--scoring v2\|both` | Enables `score_v2` in report JSON | +| `--min-score` | Legacy `score.overall` only (unchanged) | +| `--min-security-score` | v2 benchmark percentile score | +| `--max-absolute-risk` | v2 stable integer risk sum | +| `--max-risk-level` | v2 band (`low` < `medium` < `high` < `critical`) | +| `--min-category-score-v2` | v2 OWASP tile minimum (100=good) | + +GitHub Action equivalents: `scoring`, `min-security-score`, `max-absolute-risk`, `max-risk-level`, `min-category-score-v2` inputs. + +**v2 Action example:** + +```yaml +- uses: MCP-Audit/MCTS@v1 + with: + target: ./server.py + fail-on-critical: true + max-absolute-risk: "500" + max-risk-level: high +``` + +See [Scoring developer guide](../reporting/scoring-guide.md), [migration](../migration/scoring-v2.md), and [SARIF scoreV2](../reporting/sarif-score-v2.md). ### SARIF for code scanning @@ -295,7 +349,8 @@ See [Planned CLI flags](../more/planned-cli.md) and [Roadmap Phase 2](../more/ro ## Related +- **[Scoring developer guide](../reporting/scoring-guide.md)** — gate cheat sheet (read first) - [CLI Reference](cli.md) -- [Scoring Spec](../reporting/scoring-spec.md) +- [GitHub Action](../../action/README.md) - [Live Scanning](../scanning/live-scanning.md) - [Roadmap — GitHub Action](../more/roadmap.md#2-github-action) diff --git a/docs/platform/cli.md b/docs/platform/cli.md index 6054d3e..407af2e 100644 --- a/docs/platform/cli.md +++ b/docs/platform/cli.md @@ -5,6 +5,7 @@ Complete reference for every MCTS command and flag. Use this when you need to look up a specific option or understand exit codes. > **New to MCTS?** Start with [Getting Started](../get-started/getting-started.md) — you don't need this full reference yet. 
+> **Confused by two scores or CI gates?** [Scoring developer guide](../reporting/scoring-guide.md) — read before memorizing flags. > **Choosing a scan mode?** See [Which scan mode should I use?](../scanning/README.md#which-scan-mode-should-i-use). > **Unfamiliar with a term?** See the [Glossary](../glossary.md). @@ -84,11 +85,19 @@ When `-o` is set, format determines serialization. SARIF uses `reporting/sarif.p | Flag | Default | Description | |------|---------|-------------| | `--fail-on-critical` | false | Exit **1** if any critical finding | -| `--min-score` | — | Exit **1** if `score.overall` < N (0–100) | +| `--min-score` | — | Exit **1** if legacy `score.overall` < N (0–100) | | `--max-critical` | — | Exit **1** if critical count > N | -| `--fail-on-category` | — | Repeatable. Format: `category:limit`. Exit **1** when category score ≥ limit | - -Valid category keys: `permissions`, `injection`, `execution`, `data_leakage`, `attack_chains`, `shadowing`, `jailbreak`. See [Scoring Specification](../reporting/scoring-spec.md). +| `--fail-on-category` | — | Repeatable. Format: `category:limit`. Exit **1** when **legacy** category score ≥ limit | +| `--scoring` | `both` | `legacy`, `v2`, or `both` — enable multi-factor scoring | +| `--min-security-score` | — | Exit **1** if v2 benchmark security score < N (requires `--scoring v2` or `both`) | +| `--max-absolute-risk` | — | Exit **1** if v2 `absolute_risk` > N (requires `--scoring v2` or `both`) | +| `--max-risk-level` | — | Exit **1** if v2 `risk_level` exceeds band (`low` < `medium` < `high` < `critical`) | +| `--min-category-score-v2` | — | Repeatable. Format: `category:min`. Exit **1** when v2 OWASP tile score < min (100=good) | +| `--weights` | `manual_v1` | v2 weights profile name | +| `--corpus-stats-path` | packaged default | Override corpus stats JSON for v2 percentile scoring | +| `--no-attack-chains` | false | Disable v2 **chain multiplier** only (`chain_factor_mode: disabled`). 
Under `--scoring v2\|both` the attack chains analyzer still runs for graph + meta-findings. Use `--scoring legacy` to omit chain meta-findings entirely. | + +Valid **legacy** category keys: `permissions`, `injection`, `execution`, `data_leakage`, `attack_chains`, `shadowing`, `jailbreak`. Category gates apply to v1 tiles only — not `category_scores_v2`. See [Scoring developer guide](../reporting/scoring-guide.md). ### Terminal UI flags @@ -176,12 +185,14 @@ OAuth client credentials: set via config JSON or env (`oauth_token_url`, `oauth_ ### Scoring output -Each scan prints: +Default (`--scoring both`) prints legacy and v2 lines: -- **Overall Score** — 0–100, higher is better (`100 × e^(-raw_risk/50)`) -- **Risk Index** — 0–100, higher is worse (`min(100, raw_risk)`) -- **Scoring basis** — severity counts; compliance excluded -- **Category breakdown** — per-dimension risk bars +- **Overall Score** — legacy 0–100, higher is better (`100 × e^(-raw_risk/50)`) +- **Absolute risk / risk level** — v2 multi-factor integer and band (when `score_v2` present) +- **Security score (v2)** — corpus benchmark percentile when packaged stats available +- **Risk Index** — legacy 0–100, higher is worse (`min(100, raw_risk)`) +- **Scoring basis** — legacy severity counts; compliance excluded +- **Category breakdown** — legacy per-dimension risk bars; v2 OWASP tiles in JSON/HTML when enabled ### Examples @@ -399,6 +410,10 @@ mcts pentest ./repo --json -o pentest-report.json Exit **0** on pass/medium verdict; **1** on critical/high; **2** on errors. +When `--scoring v2` or `both` and `score_v2` is present, **verdict** uses v2 `risk_level` instead of legacy `score.overall` bands. `absolute_risk` is always included on the pentest JSON when v2 ran. + +**Static-only coverage:** when static discovery finds **zero MCP tools** (e.g. prompt-only servers), the `attack_chains` phase is marked `skipped` in the JSON report. 
Check `pentest_limits.coverage` (`static-only` vs `full`) and `pentest_limits.attack_chains_available` to see what ran. + --- ## `mcts fuzz` @@ -440,7 +455,7 @@ See [Protocol Fuzzing](../scanning/fuzzing.md). | **1** | Gate failure; or critical/high fuzz/inventory findings | | **2** | Usage error, missing consent, probe/fuzz failure, invalid theme/format | -Gate failures (`scan` only): `--fail-on-critical`, `--min-score`, `--max-critical`, `--fail-on-category`. +Gate failures (`scan` only): `--fail-on-critical`, `--min-score`, `--max-critical`, `--fail-on-category` (legacy); `--min-security-score`, `--max-absolute-risk`, `--max-risk-level`, `--min-category-score-v2` (v2, require `--scoring v2` or `both`). --- @@ -487,5 +502,6 @@ GitHub Action: [CI Integration](ci-integration.md) · [`action/action.yml`](../. - [Remote Scanning](../scanning/remote-scanning.md) - [Static Snapshot](../scanning/static-snapshot.md) - [REST API](rest-api.md) -- [Scoring Specification](../reporting/scoring-spec.md) +- **[Scoring developer guide](../reporting/scoring-guide.md)** +- [Scoring Specification (legacy)](../reporting/scoring-spec.md) - [Getting Started](../get-started/getting-started.md) diff --git a/docs/platform/rest-api.md b/docs/platform/rest-api.md index 4c6012e..7816b55 100644 --- a/docs/platform/rest-api.md +++ b/docs/platform/rest-api.md @@ -4,7 +4,8 @@ MCTS can run as a **REST API server** for programmatic scans — useful when you want other tools or services to trigger scans without using the CLI directly. -> **Most users should use the CLI.** The REST API is for automation and integration scenarios. +> **Most users should use the CLI.** The REST API is for automation and integration scenarios. +> **Scores & gates:** [Scoring developer guide](../reporting/scoring-guide.md) — `scoring_mode`, `gate_violations`, v2 fields. 
--- @@ -123,6 +124,14 @@ All scan endpoints accept these fields (plus endpoint-specific fields where note | `analyzer_filter` | string[] | `[]` | Limit output to named analyzers | | `fanout_offset` | int | `0` | Pagination offset for batch scan endpoints | | `fanout_limit` | int | env max (50) | Page size for batch scan endpoints | +| `scoring_mode` | string | `"both"` | `legacy`, `v2`, or `both` | +| `weights_profile` | string | `"manual_v1"` | v2 weights profile when scoring is enabled | +| `corpus_stats_path` | string | — | Optional path to corpus stats JSON for v2 percentiles | +| `min_security_score` | int | — | Gate: fail when v2 security score below threshold (not enforced server-side by default) | +| `max_absolute_risk` | int | — | Gate: fail when v2 absolute risk above threshold | +| `max_risk_level` | string | — | Gate: fail when v2 risk level exceeds band | +| `min_category_score_v2` | object | — | Map of OWASP category key → minimum tile score (100=good) | +| `assets_path` | string | — | Optional `.mcts/assets.yaml` path for v2 asset-value overrides | Batch endpoints (`/scan-all-tools`, `/scan-all-prompts`, `/scan-all-resources`) run one full analyzer pass per item. Use `fanout_offset` and `fanout_limit` to paginate; responses include `truncated` and `truncation_warning` when more items remain. @@ -163,7 +172,9 @@ Batch endpoints (`/scan-all-tools`, `/scan-all-prompts`, `/scan-all-resources`) } ``` -Response: full `ScanReport` JSON (`model_dump()`). +Response: `ScanResponse` shape — full `ScanReport` fields plus echoed `scoring_mode` and `gate_violations` (string array). When `scoring_mode` is `v2` or `both`, the payload includes `score_v2` (absolute risk, dimension scores, top contributors) and `scoring_version`. Legacy `score.overall` is always populated (invariant I1). The REST API does not fail HTTP status on gate violations — consumers inspect `gate_violations` or use the CLI for exit-code enforcement. 
+ +Optional request field `min_category_score_v2`: map of OWASP category key → minimum health score (100=good). ### Planned API extensions diff --git a/docs/reporting/README.md b/docs/reporting/README.md index 4a15cb5..f4377f8 100644 --- a/docs/reporting/README.md +++ b/docs/reporting/README.md @@ -2,9 +2,9 @@ > [Documentation](../index.md) → **Reporting** -How MCTS **presents** results — scores, exports, and shareable reports. +How MCTS **presents** scan results — scores, exports, and shareable reports. -> **Just ran your first scan?** The terminal already showed a summary. To share with others, generate [HTML](html-report.md). +> **Confused by two scores?** Read **[Scoring — developer guide](scoring-guide.md)** first (5 min). Everything else links from there. --- @@ -13,31 +13,29 @@ How MCTS **presents** results — scores, exports, and shareable reports. | Format | Command | Best for | |--------|---------|----------| | **Terminal** | `mcts scan ./server.py` | Quick feedback while coding | -| **JSON** | `mcts scan … -o report.json` | Automation, input for HTML report | +| **JSON** | `mcts scan … -o report.json` | Automation, HTML input, CI | | **SARIF** | `mcts scan … -f sarif -o report.sarif` | GitHub / GitLab Code Scanning | | **HTML** | `mcts report report.json -o report.html` | Leadership and security reviews | --- -## Score at a glance +## Scoring docs (read in this order) -| Score | Grade | Meaning | -|-------|-------|---------| -| 76–100 | A–B | Good posture | -| 51–75 | C | Review before production | -| 26–50 | D | Significant issues | -| 0–25 | F | Do not deploy | - -Details: [Scoring specification](scoring-spec.md) +| Order | Doc | Who it's for | +|-------|-----|--------------| +| **1** | **[Scoring developer guide](scoring-guide.md)** | Everyone — mental model, CI cheat sheet, JSON fields | +| 2 | [Scoring spec (legacy)](scoring-spec.md) | Legacy formula and `--min-score` gates | +| 3 | [Scoring spec v2](scoring-spec-v2.md) | v2 factors, chains, 
calibration | +| 4 | [Migration & policy](../migration/scoring-v2.md) | YAML policy, assets, history | +| 5 | [SARIF scoreV2](sarif-score-v2.md) | Code Scanning integration | --- -## Guides +## Other guides | Page | When to read | |------|--------------| -| [Scoring specification](scoring-spec.md) | CI gates and score formula | -| [HTML dashboard](html-report.md) | Executive report layout | +| [HTML dashboard](html-report.md) | Layout of the executive report | | [Threat taxonomy](taxonomy.md) | MCTS-T technique IDs on findings | --- @@ -46,4 +44,4 @@ Details: [Scoring specification](scoring-spec.md) - [Getting started](../get-started/getting-started.md) - [CI integration](../platform/ci-integration.md) -- [Documentation index](../index.md) +- [Glossary](../glossary.md) diff --git a/docs/reporting/html-report.md b/docs/reporting/html-report.md index d57e087..fe52e45 100644 --- a/docs/reporting/html-report.md +++ b/docs/reporting/html-report.md @@ -4,7 +4,8 @@ The HTML dashboard turns a JSON scan report into a **shareable, self-contained web page** — suitable for security reviews, leadership briefings, or audit documentation. -> **Haven't generated a report yet?** Run `mcts scan ./server.py -o report.json` first, then `mcts report report.json -o report.html`. +> **Haven't generated a report yet?** Run `mcts scan ./server.py -o report.json` first, then `mcts report report.json -o report.html`. +> **Scores on the page?** See [Scoring developer guide](scoring-guide.md) — v2 block is primary when `score_v2` is present; legacy gauge appears on legacy-only scans. --- @@ -12,9 +13,10 @@ The HTML dashboard turns a JSON scan report into a **shareable, self-contained w After scanning, you get a JSON file with all findings and scores. 
The HTML dashboard converts that JSON into a polished web page with: -- A visual score gauge and letter grade (A–F) -- Partitioned area scores (MCP Surface, Supply Chain, Dependency Hygiene) when present -- Severity breakdown, category radar chart, and scan history trend +- **v2 multi-factor scoring** (default scans): absolute risk header, risk level pill, factor-axis radar, top contributors, OWASP `category_scores_v2` tiles +- Legacy visual score gauge and letter grade (A–F) — **legacy-only** scans (`--scoring legacy`) +- Partitioned area scores (MCP Surface, Supply Chain, Dependency Hygiene) when present — legacy formula only +- Severity breakdown, category radar chart, and scan history trend (axis switches to `absolute_risk` when all history entries are v2) - A searchable findings table with **location**, **MCTS-T technique links**, and remediation advice - Attack chain visualization - **OWASP LLM Top 10** and **OWASP MCP Top 10** mapping (including coverage gaps) @@ -51,9 +53,10 @@ The output is one HTML file with **inlined CSS and JavaScript**. 
Chart.js and In |---------|---------| | **Header** | MCTS logo, target path, scan timestamp, export menu | | **Report guide** | How to read scores vs counts, quick-jump links | -| **Score gauge** | Doughnut chart showing `score.overall` (0–100 security points) | -| **Grade card** | Letter grade A–F derived from score | -| **Posture badge** | Critical / High / Medium / Low risk label | +| **v2 score section** | Primary when `score_v2` present: `absolute_risk`, `risk_level` pill, `security_score`, confidence, factor radar, top contributors | +| **Score gauge** | Legacy doughnut chart showing `score.overall` (0–100); hidden when `score_v2` is present | +| **Grade card** | Letter grade A–F derived from legacy `score.overall`; hidden when `score_v2` is present | +| **Posture badge** | v2: `risk_level` from `score_v2`; legacy-only scans use overall-score bands | | **Issues summary** | Severity table with counts and meanings | | **Area sub-scores** | MCP Surface, Supply Chain, Dependency Hygiene, Composite (when `score_breakdown` present) | | **Checks summary** | Analyzers run, passed, with findings, categories clear | @@ -109,19 +112,34 @@ Search matches title, category, tool, location, technique ID, CWE, and evidence ## Scoring display -The dashboard mirrors CLI scoring exactly: +### Dual scoring (default: `--scoring both`) + +When `score_v2` is present, the dashboard shows v2 metrics only (legacy gauge and letter grade are hidden): | Element | Source field | Notes | |---------|--------------|-------| -| Security score | `score.overall` | Higher is better (0–100 points, not a %) | -| Risk index | `score.risk_index` | Shown in tooltip/detail | -| Letter grade | Computed in `report/data.py` | A=90+, F<60 | -| Severity counts | `summary.*` | Scorable findings | -| Area sub-scores | `score_breakdown` | MCP Surface, Supply Chain, Dependency Hygiene, Composite | +| Primary header | `score_v2.absolute_risk` + `risk_level` | Unbounded integer; higher = worse | +| Risk range | 
`score_v2.risk_range` | Confidence interval — not driven by finding confidence | +| Benchmark score | `score_v2.security_score` | 0–100 percentile vs corpus (omitted if no stats) | +| Factor radar | `score_v2.dimension_scores` | Eight RFC factor axes (exploitability, reachability, …) | +| Top contributors | `score_v2.top_contributors` | Max 10 in JSON; expandable factor breakdown in HTML | +| v2 OWASP tiles | `category_scores_v2` | 100 = good polarity; separate from legacy category bars | +| Score glossary | `score_help` | Factor and severity inputs for v2 | + +### Legacy-only elements (unchanged formula) + +| Element | Source field | Notes | +|---------|--------------|-------| +| Risk index | `score.risk_index` | Shown in legacy detail | +| Area sub-scores | `score_breakdown` | MCP Surface, Supply Chain, Dependency Hygiene — **v1 partitions only** | | Category bars | `CATEGORY_DEFS` weighting | Higher bar = more risk in dimension | -| Formula tooltip | `score.basis` | Shows weighted calculation from severity counts | +| Formula tooltip | `score.basis` | Severity-weighted legacy calculation | + +**Important:** Legacy security scores are **points**, not pass rates. v2 `absolute_risk` uses a different scorable set (excludes `attack_chains` meta-rows) and factor formula — divergent numbers on the same scan are expected. See [Scoring v2 migration](../migration/scoring-v2.md). + +### Trend chart -**Important:** Security scores are **points**, not pass rates. A low overall score with elevated category bars is expected when severe findings are present. +`score_trend()` picks the Y-axis from history `scoring_version`: all-v2 runs plot `absolute_risk`; mixed history keeps legacy `score` with a warning in `trend_meta`. 
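The axis rule above can be sketched as a small pure function. This is a hedged illustration, not the real `score_trend()`: `pick_trend_axis`, the returned keys, the warning text, and the assumption that both `"v2"` and `"both"` entries count as v2 history are all hypothetical.

```python
from typing import Any

# Assumption: which `scoring_version` values count as v2 history entries.
V2_VERSIONS = {"v2", "both"}


def pick_trend_axis(history: list[dict[str, Any]]) -> dict[str, Any]:
    """Choose one Y-axis metric for the trend chart; never mix both metrics."""
    versions = [entry.get("scoring_version") for entry in history]
    has_v2 = any(v in V2_VERSIONS for v in versions)
    all_v2 = bool(versions) and all(v in V2_VERSIONS for v in versions)
    if all_v2:
        return {"axis": "absolute_risk", "warning": None}
    # Mixed history falls back to the legacy score and records a warning.
    warning = "mixed scoring versions; plotting legacy score" if has_v2 else None
    return {"axis": "score", "warning": warning}
```

Keeping the choice in one function makes the "one metric per axis" invariant easy to unit-test separately from chart rendering.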
--- @@ -184,7 +202,7 @@ report/data.py → build_dashboard_payload() report/generators/html_report.py ├── Jinja2: templates/dashboard.html ├── Inline: assets/styles.css, assets/dashboard.js - └── Embed: brand/logo-report.png (base64) + └── Embed: brand/Logo 2.jpg (base64) │ ▼ security-report.html (single file) @@ -201,7 +219,7 @@ security-report.html (single file) | `report/data.py` | ScanReport → dashboard JSON | | `report/generators/html_report.py` | Assembly and inlining | | `compliance/checks.py` | MCP Top 10 analyzer map (shared with compliance) | -| `brand/logo-report.png` | Hex icon embed (no wordmark — legible at 44×44) | +| `brand/Logo 2.jpg` | Logo embed in sidebar and exports | Entry: `mcts.reporting.html.write_html_report()` delegates to generator. diff --git a/docs/reporting/sarif-score-v2.md b/docs/reporting/sarif-score-v2.md new file mode 100644 index 0000000..9ab4eb5 --- /dev/null +++ b/docs/reporting/sarif-score-v2.md @@ -0,0 +1,27 @@ +# SARIF `mcts/scoreV2` extension + +MCTS SARIF output (`--format sarif`) includes optional run properties when `score_v2` is present: + +```json +{ + "runs": [{ + "properties": { + "mcts/scoreV2": { + "absoluteRisk": 2260, + "securityScore": 12, + "riskLevel": "critical" + } + } + }] +} +``` + +## Code Scanning adoption + +GitHub Code Scanning ingests SARIF by default but **does not surface custom run properties** in the Security tab. Consumers must: + +1. Parse SARIF JSON in CI or dashboards. +2. Read `runs[].properties["mcts/scoreV2"]` explicitly. +3. Gate on `absoluteRisk` / `securityScore` with `--min-security-score` or `--max-absolute-risk` in the MCTS CLI/Action instead of relying on Code Scanning UI alone. + +Legacy `score.overall` is not written to SARIF run properties in v2.0 — use CLI gates or custom SARIF post-processing for dual-score policies. 
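Steps 1 and 2 above can be sketched in a few lines of Python. The property key and field names come from the example in this file; `read_score_v2` and `exceeds_absolute_risk` are hypothetical helper names, and the threshold is only an illustration:

```python
import json
from typing import Any, Optional


def read_score_v2(sarif: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return the first run-level `mcts/scoreV2` block, or None when absent."""
    for run in sarif.get("runs", []):
        block = (run.get("properties") or {}).get("mcts/scoreV2")
        if block is not None:
            return block
    return None


def exceeds_absolute_risk(sarif: dict[str, Any], max_absolute_risk: int) -> bool:
    """True when the SARIF carries a v2 score above the given ceiling."""
    block = read_score_v2(sarif)
    return block is not None and block["absoluteRisk"] > max_absolute_risk


# Same shape as the example at the top of this page.
sample = json.loads("""
{"runs": [{"properties": {"mcts/scoreV2": {
    "absoluteRisk": 2260, "securityScore": 12, "riskLevel": "critical"}}}]}
""")
```

A dashboard job could call `read_score_v2(json.load(open("report.sarif")))` and treat `None` as a legacy-only scan; for exit-code enforcement the CLI gates remain the simpler route.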
diff --git a/docs/reporting/scoring-guide.md b/docs/reporting/scoring-guide.md new file mode 100644 index 0000000..62ae001 --- /dev/null +++ b/docs/reporting/scoring-guide.md @@ -0,0 +1,257 @@ +# Scoring — developer guide + +> **Read this first** if you are confused by two scores, different CI flags, or mismatched numbers on the same scan. + +One scan produces **findings** plus **scores**. MCTS runs **two score engines** in parallel by default (`--scoring both`). They answer different questions — both are intentional. + +**Not a scoring question?** Use the [documentation index](../index.md) task picker for install, scan modes, or CI wiring. + +--- + +## Which doc should I read? + +| Your situation | Start here | Then (if needed) | +|----------------|------------|------------------| +| First scan — what do the numbers mean? | This page → [60-second mental model](#60-second-mental-model) | [Getting started](../get-started/getting-started.md#reading-the-output) | +| Wiring CI / GitHub Action | [CI gates](#ci-gates--pick-one-strategy) | [CI integration](../platform/ci-integration.md) | +| JSON field reference | [JSON report fields](#json-report-fields) | [REST API](../platform/rest-api.md) | +| HTML dashboard blocks | [HTML dashboard](#html-dashboard) | [HTML report](html-report.md) | +| Change legacy formula | [Implementing](#implementing-or-debugging-scoring) | [Scoring spec (legacy)](scoring-spec.md) | +| Change v2 factors / chains | [Implementing](#implementing-or-debugging-scoring) | [Scoring spec v2](scoring-spec-v2.md) | +| Policy YAML / assets / history | [Migration notes](../migration/scoring-v2.md) | — | +| SARIF + Code Scanning | [API](#api) | [SARIF scoreV2](sarif-score-v2.md) | + +--- + +## 60-second mental model + +``` +Findings → Legacy engine → score.overall (0–100, higher = better) + → v2 engine → score_v2 (absolute_risk, higher = worse) +``` + +| You want to… | Use this field | CI flag (examples) | 
+|--------------|----------------|-------------------| +| Keep existing pipelines working | `score.overall` | `--min-score 70` | +| Stable risk number for policies | `score_v2.absolute_risk` | `--max-absolute-risk 500` | +| Compare to other MCP servers | `score_v2.security_score` | `--min-security-score 40` | +| Simple pass/fail band | `score_v2.risk_level` | `--max-risk-level high` | +| Block on critical findings | `summary.critical` | `--fail-on-critical` | + +**Default:** `--scoring both` — legacy **and** v2 both appear in JSON and terminal output; HTML leads with the v2 block and hides the legacy gauge and letter grade, and SARIF run properties carry only `mcts/scoreV2`. +**Legacy only:** `--scoring legacy` — no `score_v2` field. + +--- + +## Why two scores on one scan? + +| | Legacy `score.overall` | v2 `score_v2.absolute_risk` | +|--|------------------------|----------------------------| +| **Formula** | Severity weights + exponential decay | Eight security factors + chain multiplier | +| **Scale** | 0–100 (higher = better) | Integer ≥ 0 (higher = worse) | +| **Findings counted** | All except `compliance` | Also excludes `attack_chains` meta-rows | +| **Attack chains** | Critical chain rows in the sum | Chain signal via `chain_factor` on tool findings | +| **Typical use** | Existing CI, letter grade | New policies, explainability, benchmarks | + +**Different numbers on the same scan are normal** — not a bug. 
+ +Example (`examples/vulnerable-mcp-server/server.py`): + +- Legacy overall: **1/100** (includes chain meta-findings) +- v2 absolute risk: **2260** (multi-factor, tool findings only) +- v2 security score: **9/100** (benchmark vs corpus — not the same as legacy overall) + +--- + +## Reading terminal output + +When `--scoring both` (default): + +```text +Overall Score: 1/100 (CRITICAL) ← legacy; gates: --min-score +Risk Index: 100/100 ← legacy linear burden (higher = worse) +Scoring basis: 5 Critical, 11 High, 1 Medium (17 scorable findings) +Absolute Risk: 2260 (critical) ← v2 headline; gates: --max-absolute-risk +Security Score: 9/100 ← v2 benchmark vs corpus; gates: --min-security-score +MCP Surface: 1/100 ← legacy partition only +``` + +| Line | Engine | Use in CI? | +|------|--------|------------| +| Overall Score | Legacy | `--min-score` | +| Risk Index | Legacy | Display only | +| Absolute Risk | v2 | `--max-absolute-risk`, `--max-risk-level` | +| Security Score | v2 | `--min-security-score` | +| MCP Surface / Supply Chain | Legacy partitions | `--fail-on-category` (legacy keys) | + +**Risk Index** is legacy only (linear 0–100, higher = worse). +**MCP Surface / Supply Chain / Composite** are legacy partitions — not v2. + +--- + +## JSON report fields + +Every `ScanReport` includes `score` (legacy). With v2/both, `score_v2` is added. + +```json +{ + "scoring_version": "both", + "score": { + "overall": 1, + "risk_index": 100, + "basis": { "critical": 5, "high": 11, "scorable_total": 17 } + }, + "score_v2": { + "absolute_risk": 2260, + "risk_level": "critical", + "security_score": 9, + "dimension_scores": { "blast_radius": 100, "reachability": 90, "threat_maturity": 25 }, + "top_contributors": [ "..." 
], + "basis": { "scorable_count": 12, "excluded_non_scorable": 7 } + } +} +``` + +| Field | Engine | Notes | +|-------|--------|-------| +| `score.overall` | Legacy | Always present (invariant I1) | +| `score_v2` | v2 | `null` when `--scoring legacy` | +| `score_breakdown` | Legacy | MCP Surface / Supply Chain partitions — **not** v2 | +| `category_scores_v2` | v2 | In dashboard JSON only; OWASP tiles, 100 = good | + +--- + +## CI gates — pick one strategy + +### Strategy A: Keep legacy CI (no change) + +```bash +mcts scan ./server.py --fail-on-critical --min-score 70 +``` + +Works exactly as before. v2 fields are still in the report for visibility. + +### Strategy B: Add v2 gates (recommended for new policies) + +```bash +mcts scan ./server.py \ + --fail-on-critical \ + --max-absolute-risk 500 \ + --max-risk-level high +``` + +Scoring is already `both` by default — no extra `--scoring` flag needed. + +### Strategy C: Dual gates (transition period) + +```bash +mcts scan ./server.py --min-score 70 --max-absolute-risk 500 +``` + +Both must pass. Tune thresholds independently. 
+ +### Gate cheat sheet + +| Flag | Metric | Needs `--scoring` | +|------|--------|-------------------| +| `--min-score` | Legacy `overall` | No | +| `--fail-on-category` | Legacy category bars | No | +| `--min-security-score` | v2 benchmark | v2 or both (default) | +| `--max-absolute-risk` | v2 `absolute_risk` | v2 or both | +| `--max-risk-level` | v2 `risk_level` | v2 or both | +| `--min-category-score-v2` | v2 OWASP tiles | v2 or both | + +Full CI patterns: [CI integration](../platform/ci-integration.md) + +--- + +## HTML dashboard + +| UI block | Source | When shown | +|----------|--------|------------| +| **Absolute risk + risk pill** | `score_v2` | Primary when v2 present | +| **Factor radar + contributors** | `score_v2` | v2/both | +| **Legacy gauge + letter grade** | `score.overall` | Legacy-only scans; hidden when `score_v2` present | +| **Category bars (7 dimensions)** | Legacy | Always | +| **v2 OWASP tiles** | `category_scores_v2` | v2/both | + +Details: [HTML report](html-report.md) + +--- + +## API + +- Request: `scoring_mode` (`legacy` | `v2` | `both`, default `both`) +- Response: full report + `gate_violations[]` when gates fail +- HTTP status stays **200** on gate failure — check `gate_violations` or use CLI for exit code 1 + +Details: [REST API](../platform/rest-api.md) + +--- + +## Common pitfalls + +### `--no-attack-chains` under v2/both + +Does **not** turn off the chains analyzer. It only disables the v2 **multiplier** (`chain_factor = 1.0`). Graph and chain findings still appear. + +Use `--scoring legacy` if you want the old behavior (no chain meta-findings). + +### Mixing metrics in trends + +History stores `scoring_version`. The HTML trend chart uses **either** legacy score **or** `absolute_risk` — never both on one axis. Mixed history shows legacy with a warning. + +### Readiness / vet scores + +`mcts readiness` and `mcts vet` use **separate** scoring pipelines. They do not affect scan `score` or `score_v2`. 
+ +### Fuzz / live findings + +Fuzz and runtime events are **not** merged into the default static scan v2 sum today. Run separate fuzz/pentest flows for live signal. + +### Letter grade (A–F) in HTML + +The letter grade and doughnut gauge use **legacy** `score.overall` and appear only on **legacy-only** scans. When `score_v2` is present, the HTML report shows the v2 block (absolute risk + risk pill) instead. + +### `--ci` preset + +The `--ci` preset applies **legacy** gates only (`--fail-on-critical`, `--min-score 70`). For v2 gates in CI, set flags explicitly or use the [GitHub Action](../../action/README.md) v2 inputs. + +--- + +## FAQ + +**Why is legacy overall 1/100 but absolute risk 2260?** +Different formulas and finding sets. Legacy uses exponential decay on all scorable severities including chain meta-rows; v2 sums per-finding factor brackets on tool rows only. See [Why two scores](#why-two-scores-on-one-scan). + +**Which score should my CI use?** +Keep `--min-score` if you have existing pipelines. Add `--max-absolute-risk` or `--max-risk-level` for new policies. See [CI strategies](#ci-gates--pick-one-strategy). + +**Does `--no-attack-chains` remove chain findings?** +No — it only disables the v2 **multiplier**. Use `--scoring legacy` to drop chain meta-findings from the legacy sum. + +**Where is `score_v2` in SARIF?** +Run-level property `mcts/scoreV2` on the SARIF run object. Per-finding v2 metadata is not emitted yet. + +**Do readiness or vet scores affect scan scores?** +No — separate commands and pipelines. 
+ +--- + +## Implementing or debugging scoring + +| Task | Doc | +|------|-----| +| Change legacy formula | [Scoring spec (legacy)](scoring-spec.md) · `src/mcts/scoring/engine.py` | +| Change v2 factors / chains | [Scoring spec v2](scoring-spec-v2.md) · `src/mcts/scoring/engine_v2.py` | +| Pipeline order | [Architecture](../analysis/architecture.md#scoring-and-reporting) | +| All formulas (internal) | `local/score-calculations-reference.md` (contributors) | +| ADR decisions | [ADR-003](../analysis/adr-003-scoring-v2.md) | + +--- + +## Related + +- [Reporting overview](README.md) +- [Glossary — score terms](../glossary.md#scores-and-reports) +- [Migration notes](../migration/scoring-v2.md) — policy YAML, assets, history diff --git a/docs/reporting/scoring-spec-v2.md b/docs/reporting/scoring-spec-v2.md new file mode 100644 index 0000000..247ac25 --- /dev/null +++ b/docs/reporting/scoring-spec-v2.md @@ -0,0 +1,124 @@ +# MCTS Risk Score v2 — Specification + +> **Read first:** [Scoring developer guide](scoring-guide.md) — mental model, CI flags, JSON fields. +> This page is the **technical v2 reference** (formulas and implementation map). + +**Status:** GA (default `--scoring both`) +**ADR:** [adr-003-scoring-v2.md](../analysis/adr-003-scoring-v2.md) +**Legacy spec:** [scoring-spec.md](scoring-spec.md) +**SARIF:** [sarif-score-v2.md](sarif-score-v2.md) + +## Overview + +v2 adds `score_v2` with **absolute risk** (integer, higher = worse) next to legacy `score.overall` (0–100, higher = better). + +## Scorable set + +Excluded from v2 sum: `compliance`, `attack_chains` meta-findings. Tool-attributed findings from other analyzers are scored. + +## Per-finding formula (RFC §4.1) + +``` +bracket = 1 + Σ factor_increments +base_risk = severity_w × bracket +finding_risk = round(base_risk × chain_factor) +absolute_risk = Σ finding_risk +``` + +Factor increments come from classifiers in `weights_v1.yaml` under `classifiers:`. 
Evidence tags on findings refine classifiers when emitters populate `reachability_tag`, `exploitability_class`, etc. + +## Chain multiplier + +`chain_factor` applies to tool findings on validated graph paths (`hop_count` ≥ 1). Severity floor: medium+. Meta chain rows are display-only. + +| hop_count | chain_factor | +|-----------|--------------| +| 0–1 | 1.0 | +| 2 | 1.15 | +| 3 | 1.35 | +| 4+ | 1.50 | + +## Output (`score_v2`) + +| Field | Description | +|-------|-------------| +| `absolute_risk` | Stable integer sum | +| `security_score` | `100 - percentile(absolute_risk, corpus)` when stats available | +| `risk_level` | Band from corpus or literals: low/medium/high/critical | +| `risk_range` | Confidence interval on absolute risk (not driven by finding confidence) | +| `dimension_scores` | Eight factor axes 0–100 (higher = worse) | +| `top_contributors` | Top 10 findings/paths by contribution | +| `category_scores_v2` | Separate OWASP tiles, 100 = good (dashboard JSON) | +| `basis` | Scorable counts, excluded meta-rows, `weights_hash` | + +## Aggregation formulas (§8.8–8.10) + +### §8.8 `confidence_score` (RFC §4.3) + +Confidence affects `confidence_score` and `risk_range` only — **never** `absolute_risk`. Inputs are v2-scorable findings with aligned per-finding risks: + +``` +pairs = [(risk, finding) for finding, risk in zip(scorable, risks) if risk > 0] +if no pairs → confidence_score = 100 +else confidence_score = round(100 × Σ(effective_confidence(f) × risk) / Σ risk) +``` + +`effective_confidence` applies per-analyzer caps from `uncertainty.py` when `finding.confidence >= 0.99`. 
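The per-finding formula, the chain table, and the §8.8 aggregation above can be sketched together. This is a hedged illustration, not the code in `scoring/engine_v2.py`: the function names are hypothetical, `severity_w` and the factor increments are passed in rather than read from `weights_v1.yaml`, and `effective_confidence` is assumed to be precomputed per finding.

```python
def chain_factor(hop_count: int, medium_or_higher: bool = True) -> float:
    """Chain multiplier from the table above; the severity floor is medium+."""
    if not medium_or_higher or hop_count <= 1:
        return 1.0
    return {2: 1.15, 3: 1.35}.get(hop_count, 1.50)  # 4+ hops share 1.50


def finding_risk(severity_w: float, factor_increments: list[float], hop_count: int) -> int:
    """bracket = 1 + sum(increments); risk = round(severity_w * bracket * chain_factor)."""
    bracket = 1 + sum(factor_increments)
    base_risk = severity_w * bracket
    # Uses Python's built-in round(); the engine's tie-breaking is an assumption.
    return round(base_risk * chain_factor(hop_count))


def confidence_score(risks: list[int], effective_confidences: list[float]) -> int:
    """Risk-weighted mean confidence on a 0-100 scale; zero-risk rows are skipped."""
    pairs = [(r, c) for r, c in zip(risks, effective_confidences) if r > 0]
    if not pairs:
        return 100
    total_risk = sum(r for r, _ in pairs)
    return round(100 * sum(c * r for r, c in pairs) / total_risk)


# Illustrative inputs only; real weights live in weights_v1.yaml.
risks = [finding_risk(10, [0.5, 0.25], hop_count=3), finding_risk(10, [0.5], hop_count=1)]
absolute_risk = sum(risks)
```

Note that confidence enters only `confidence_score`: `absolute_risk` is the plain sum of per-finding risks, which matches the rule that confidence never changes it.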
+ +### §8.9 `risk_range` spread (RFC §4.12) + +``` +if absolute_risk == 0 → risk_range = (0, 0), label = "high" +mean_conf = weighted mean of effective_confidence by finding_risk +base_spread = absolute_risk × (1 - mean_conf) × 0.35 +spread = base_spread × evidence_quality_factor × analyzer_disagreement_factor +low = max(0, round(absolute_risk - spread)) +high = round(absolute_risk + spread) +label = high if mean_conf >= 0.85 else medium if mean_conf >= 0.65 else low +``` + +- `evidence_quality_factor`: 0.8 when live_probe + handler_traced tags present; else 1.2 +- `analyzer_disagreement_factor`: 1.4 when conflicting severities share a tool; else 1.0 + +### §8.10 `top_contributors` selection (RFC §4.14) + +1. Rank scorable findings by `finding_risk` descending; take up to **9** rows (`type=finding`). +2. Append one explainability row (`type=attack_chain`) for the highest `hop_count` path when paths exist and total rows < 10. +3. JSON export caps at **10** rows and omits verbose `evidence_tags`. + +Per-finding contributor fields: `risk_contribution`, `confidence` (effective × 100), `chain_factor`, `factors` breakdown. + +### `dimension_scores` normalization (§7.5) + +Per-axis raw sum = Σ factor increment for that axis across scorable findings. Normalized **relative to this scan** (0–100; highest-loaded axis = 100): + +``` +if raw <= 0 → 0 +else → min(100, round(100 × raw / max(raw across all axes on this scan))) +``` + +This shapes the factor radar (which axes dominate on the current server). Corpus-wide benchmarking uses `absolute_risk` and `security_score`, not per-axis tiles. + +Packaged `dimension_p95` in corpus stats is retained for calibration scripts but is not used for `dimension_scores` display. 
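The §8.9 spread and the §7.5 normalization above are plain arithmetic, so a short sketch may help when checking numbers by hand. The function names and boolean parameters are hypothetical stand-ins for what the engine derives from evidence tags and analyzer output; the constants are the ones quoted above.

```python
def risk_range(absolute_risk: int, mean_conf: float,
               live_and_traced: bool, severity_conflict: bool) -> tuple[int, int, str]:
    """Return (low, high, label) following the §8.9 spread formula."""
    if absolute_risk == 0:
        return (0, 0, "high")
    base_spread = absolute_risk * (1 - mean_conf) * 0.35
    evidence_quality_factor = 0.8 if live_and_traced else 1.2
    analyzer_disagreement_factor = 1.4 if severity_conflict else 1.0
    spread = base_spread * evidence_quality_factor * analyzer_disagreement_factor
    low = max(0, round(absolute_risk - spread))
    high = round(absolute_risk + spread)
    label = "high" if mean_conf >= 0.85 else "medium" if mean_conf >= 0.65 else "low"
    return (low, high, label)


def normalize_dimensions(raw: dict[str, float]) -> dict[str, int]:
    """Scale per-axis raw sums so the highest-loaded axis on this scan is 100."""
    peak = max(raw.values(), default=0)
    return {
        axis: 0 if value <= 0 else min(100, round(100 * value / peak))
        for axis, value in raw.items()
    }


# Illustrative raw sums chosen to reproduce the 100 / 90 / 25 shape from the guide.
example_axes = normalize_dimensions(
    {"blast_radius": 4.0, "reachability": 3.6, "threat_maturity": 1.0}
)
```

Because normalization is relative to the current scan, two servers can both show an axis at 100 while having very different `absolute_risk` values.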
+ +## CI gates + +| Flag | Applies to | +|------|------------| +| `--min-score` | Legacy only | +| `--min-security-score` | v2 benchmark score | +| `--max-absolute-risk` | v2 absolute risk | +| `--max-risk-level` | v2 band | +| `--min-category-score-v2` | v2 OWASP tiles (100=good; fail when below minimum) | +| `--fail-on-category` | Legacy category tiles only | + +## Implementation map + +| Module | Role | +|--------|------| +| `scoring/engine_v2.py` | Sum, verify, contributors | +| `scoring/context.py` | `build_scoring_context`, chain factors | +| `scoring/graph.py` | `canonical_attack_graph`, `build_paths` | +| `scoring/evidence_tags.py` | PR-4b analyzer evidence tag helpers | +| `scoring/evidence_emit.py` | Graph/scope-dependent evidence enrichment | +| `scoring/weights_v1.yaml` | Classifier lookup tables | diff --git a/docs/reporting/scoring-spec.md b/docs/reporting/scoring-spec.md index ac9d3a8..e7c3560 100644 --- a/docs/reporting/scoring-spec.md +++ b/docs/reporting/scoring-spec.md @@ -2,23 +2,23 @@ > [Documentation](../index.md) → [Reporting](README.md) -This document explains how MCTS calculates the **security score** (0–100) and **risk index** from findings. Use it to set CI gate thresholds, explain scores to stakeholders, or verify that scoring is working correctly. +This document is the **legacy** scoring reference (`score.overall`, 0–100). For the full picture (legacy + v2), read the **[Scoring developer guide](scoring-guide.md)** first. -> **Just want to set a CI gate?** Use `--min-score 70 --fail-on-critical`. See [CI Integration](../platform/ci-integration.md). -> **Unfamiliar with terms?** See the [Glossary](../glossary.md). 
+> **CI gate on legacy score:** `--min-score 70 --fail-on-critical` · [CI Integration](../platform/ci-integration.md) +> **v2 scoring:** [Scoring spec v2](scoring-spec-v2.md) --- ## In plain English -After MCTS finds security issues, it converts them into a single number: +The **legacy overall score** is 0–100 where **higher is better**. It uses severity weights and exponential decay — see formulas below. -- **Security score (0–100):** Higher is better. 100 means no issues. Below 50 is serious. -- **Risk index (0–100):** Higher is worse. A linear measure of total risk burden. +- **Risk index (0–100):** Higher is worse. Linear measure of total risk burden. +- **Default scans** also compute v2 (`score_v2`) — this doc does **not** cover v2. See [scoring guide](scoring-guide.md). -The score is calculated from finding severities using a transparent formula — nothing is hardcoded per target. Every report includes a `score.basis` field showing exactly which findings contributed, so you can verify the math. +Every report includes `score.basis` showing which severities contributed. The scanner verifies the math on every run. -**Example:** A server with 3 Critical + 7 High + 2 Medium findings scores approximately **5/100**. +**Example:** `examples/vulnerable-mcp-server/server.py` scores approximately **1/100** legacy overall (v2 is separate — see [scoring guide](scoring-guide.md)). Compliance findings (OWASP mapping) appear in reports but do **not** affect the score. @@ -47,7 +47,7 @@ Reports may include `score_breakdown` with decomposed scores: 1. **Deterministic** — same findings always produce the same score 2. **Auditable** — `score.basis` documents exact severity counts used -3. **CI-friendly** — gates on overall score, critical count, and category thresholds +3. **CI-friendly** — legacy gates on overall score, critical count, and category thresholds; v2 gates documented in [scoring-spec-v2](scoring-spec-v2.md) 4. 
**Separated compliance** — OWASP meta-findings do not inflate risk score The scanner calls `RiskScoringEngine.verify()` after scoring; mismatch raises `RuntimeError` (regression guard). @@ -201,7 +201,7 @@ Benchmarks are illustrative overlays — not pass/fail thresholds. --- -## CI gate semantics +## CI gate semantics (legacy) Exit code **1** when a gate fails; **2** for usage/consent errors. @@ -210,10 +210,23 @@ Exit code **1** when a gate fails; **2** for usage/consent errors. | `--fail-on-critical` | `summary.critical > 0` (scorable findings) | | `--min-score N` | `score.overall < N` | | `--max-critical N` | `summary.critical > N` | -| `--fail-on-category KEY:LIMIT` | Category score ≥ LIMIT | +| `--fail-on-category KEY:LIMIT` | Legacy category score ≥ LIMIT | Category gates are **inclusive** at the limit: `--fail-on-category permissions:10` fails when permissions category score is **10 or higher**. +### v2 gates (shipped) + +Requires `--scoring v2` or `both` (default). Canonical reference: [Scoring spec v2](scoring-spec-v2.md) · [Migration guide](../migration/scoring-v2.md). + +| Flag | Fails when | +|------|------------| +| `--min-security-score N` | `score_v2.security_score < N` (needs corpus stats) | +| `--max-absolute-risk N` | `score_v2.absolute_risk > N` | +| `--max-risk-level LEVEL` | `score_v2.risk_level` exceeds band | +| `--min-category-score-v2 KEY:MIN` | v2 OWASP tile < MIN (100=good) | + +REST API returns `gate_violations` but does not change HTTP status — use CLI for CI exit codes. + ### Recommended starter policy ```bash @@ -241,10 +254,12 @@ Tune limits per team risk appetite. Start strict on `max-critical` and relax `mi Grades are derived from `score.overall` in `report/data.py`. 
-### Planned scoring modes (gap audit) +### Scoring modes -| Mode | Status | GAP | -|------|--------|-----| +| Mode | Status | Notes | +|------|--------|-------| +| Legacy exponential (`--scoring legacy`) | Shipped | This document — `score.overall` | +| Multi-factor v2 (`--scoring v2\|both`) | Shipped (default `both`) | [Scoring spec v2](scoring-spec-v2.md) | | AIVSS v2 (`--scoring aivss`) | Missing | GAP-060 | | CVSS v4 vector per finding | Missing | GAP-061 | | Runtime trust score (live/proxy) | Planned | L10-01 | diff --git a/docs/reporting/taxonomy.md b/docs/reporting/taxonomy.md index 157000b..1a4a75e 100644 --- a/docs/reporting/taxonomy.md +++ b/docs/reporting/taxonomy.md @@ -61,7 +61,7 @@ Extend `src/mcts/taxonomy/crosswalk.json` when mapping new techniques to externa | MCTS-T-1002 | Path Traversal / Missing Validation | `path_validation`, `tool_abuse` | High | | MCTS-T-1003 | Command Execution via Tool Handler | `command_execution` | Critical | | MCTS-T-1004 | Sensitive Data Exposure | `data_leakage` | High | -| MCTS-T-1005 | Multi-Step Attack Chain | `attack_chains` | Critical | +| MCTS-T-1005 | Multi-Step Attack Chain | `attack_chains` | Critical (meta-rows displayed; excluded from v2 `absolute_risk` sum — chain signal is `chain_factor` on tool findings) | | MCTS-T-1006 | Excessive Tool Permissions | `permission_analyzer` | Critical | | MCTS-T-1007 | Tool Output Prompt Injection | `jailbreak`, `runtime_events` | High | | MCTS-T-1008 | Cross-Server Tool Shadowing | `cross_server` | High | diff --git a/docs/scanning/README.md b/docs/scanning/README.md index 7d53147..3d26ef8 100644 --- a/docs/scanning/README.md +++ b/docs/scanning/README.md @@ -14,6 +14,8 @@ Answer these questions: - **Yes** → `mcts scan ./server.py` or `mcts scan ./repo/` ([getting started](../get-started/getting-started.md)) - **Not sure which file** → `mcts scan . 
--auto` +**Confused by Overall Score vs Absolute Risk?** → [Scoring developer guide](../reporting/scoring-guide.md) + **Do you need what the server advertises at runtime?** - Add `--live --i-understand-live-risk` → [Live scanning](live-scanning.md) @@ -42,7 +44,7 @@ Answer these questions: | **Inventory** | Config only | No | No | `mcts inventory --scan` | | **Fuzz** | No | Yes | No | `mcts fuzz …` | -After discovery, all modes feed the same analyzers and produce the same report format. +After discovery, all modes feed the same analyzers and produce the same report format (legacy `score` + v2 `score_v2` when `--scoring v2|both`, default `both`). --- diff --git a/docs/scanning/live-scanning.md b/docs/scanning/live-scanning.md index be9ec36..ee059c4 100644 --- a/docs/scanning/live-scanning.md +++ b/docs/scanning/live-scanning.md @@ -198,7 +198,7 @@ If source **is** present, static TS discovery still runs in parallel. See [TypeS | `--runtime-events` | Merged with live-generated events | | `--sigma-rules-path` | Applies to merged `MCPServerInfo` | | `--semantic-secrets` | Static source analysis; independent of live | -| `--fail-on-*` gates | Apply to final report regardless of discovery mode | +| `--fail-on-*` / v2 gates | Apply to final report regardless of discovery mode (`score_v2` included when `--scoring v2\|both`, default `both`) | --- @@ -261,6 +261,11 @@ mcts scan examples/live-mcp-server/server.py \ --live --no-progress \ -o report.json \ --min-score 70 + +# Or v2 gates (scoring both is default) +mcts scan examples/live-mcp-server/server.py \ + --live --no-progress \ + --max-absolute-risk 500 -o report.json ``` --- diff --git a/docs/scanning/static-snapshot.md b/docs/scanning/static-snapshot.md index ff08d4b..7803902 100644 --- a/docs/scanning/static-snapshot.md +++ b/docs/scanning/static-snapshot.md @@ -98,10 +98,14 @@ Use individual paths via CLI when exporting prompts/resources separately (future # Export tools from a trusted environment, then scan offline 
mcts scan . --snapshot ./artifacts/tools-list.json -o report.json -# With CI gates +# With CI gates (legacy or v2 — default scoring is both) mcts scan . --snapshot tools.json \ --fail-on-critical --min-score 70 \ -o report.json + +mcts scan . --snapshot tools.json \ + --max-absolute-risk 500 --max-risk-level high \ + -o report.json ``` `discovery_mode` on the resulting `MCPServerInfo` is `static-json`. diff --git a/pyproject.toml b/pyproject.toml index ea5903b..a5e80d2 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -117,6 +117,7 @@ path = "src/mcts/__init__.py" [tool.hatch.build.targets.wheel] packages = ["src/mcts"] +force-include = { "src/mcts/scoring/weights_v1.yaml" = "mcts/scoring/weights_v1.yaml", "src/mcts/scoring/weights_learned.yaml" = "mcts/scoring/weights_learned.yaml", "src/mcts/scoring/data/scoring_v2_corpus_stats.json" = "mcts/scoring/data/scoring_v2_corpus_stats.json" } [tool.hatch.build.targets.sdist] only-include = [ diff --git a/scripts/calibrate_scoring_weights.py b/scripts/calibrate_scoring_weights.py new file mode 100644 index 0000000..7be3388 --- /dev/null +++ b/scripts/calibrate_scoring_weights.py @@ -0,0 +1,70 @@ +#!/usr/bin/env python3 +"""Refresh packaged corpus stats and print Spearman correlation vs expert rankings.""" + +from __future__ import annotations + +import argparse +import json + +from mcts.scoring.corpus_runner import ( + EXPERT_RANKINGS_PATH, + PACKAGE_STATS_PATH, + build_package_stats_from_metrics, + scan_corpus_metrics, + spearman_rho, +) +from mcts.scoring.weights import PACKAGE_DIR + + +def main() -> int: + parser = argparse.ArgumentParser(description="Calibrate v2 scoring corpus stats") + parser.add_argument("--scoring", default="v2", choices=["v2", "both"]) + parser.add_argument("--write-package-stats", action="store_true") + parser.add_argument("--min-rho", type=float, default=0.0, help="Exit 1 if Spearman rho below threshold") + parser.add_argument( + "--stats-version", + default="corpus-2026-06", + help="Version 
label written into packaged corpus stats JSON", + ) + parser.add_argument( + "--write-learned-weights", + action="store_true", + help="Copy manual_v1 weights to weights_learned.yaml (offline calibration placeholder)", + ) + args = parser.parse_args() + + metrics = scan_corpus_metrics(scoring_mode=args.scoring) + risks = metrics.risks + for server_id, absolute_risk in risks.items(): + print(f"{server_id}: absolute_risk={absolute_risk}") + + if args.write_package_stats and risks: + stats = build_package_stats_from_metrics(metrics, version=args.stats_version) + PACKAGE_STATS_PATH.write_text(json.dumps(stats, indent=2) + "\n", encoding="utf-8") + print(f"Wrote {PACKAGE_STATS_PATH}") + + if args.write_learned_weights: + manual = PACKAGE_DIR / "weights_v1.yaml" + learned = PACKAGE_DIR / "weights_learned.yaml" + text = manual.read_text(encoding="utf-8").replace("version: manual_v1", "version: learned_v1", 1) + learned.write_text(text, encoding="utf-8") + print(f"Wrote {learned}") + + if EXPERT_RANKINGS_PATH.exists(): + expert = json.loads(EXPERT_RANKINGS_PATH.read_text(encoding="utf-8")) + ids = [row["server_id"] for row in expert["rankings"] if row["server_id"] in risks] + model_vals = [float(risks[sid]) for sid in ids] + expert_vals = [ + float(row.get("expert_score") or max(0, 100 - (int(row["rank"]) - 1) * 15)) + for row in expert["rankings"] + if row["server_id"] in risks + ] + rho = spearman_rho(model_vals, expert_vals) + print(f"Spearman rho={rho:.3f} (n={len(ids)})") + if rho < args.min_rho: + raise SystemExit(1) + return 0 + + +if __name__ == "__main__": + raise SystemExit(main()) diff --git a/scripts/enable-branch-protection.sh b/scripts/enable-branch-protection.sh index 1c5f7ae..76124e4 100755 --- a/scripts/enable-branch-protection.sh +++ b/scripts/enable-branch-protection.sh @@ -1,8 +1,8 @@ #!/usr/bin/env bash set -euo pipefail -# Apply branch protection requiring the CI "test" job to pass. 
-# Idempotent: updates an existing "Protect main" ruleset instead of creating duplicates. +# Apply repository rulesets from .github/rulesets/*.json. +# Idempotent: updates an existing ruleset with the same name instead of creating duplicates. # Requires: gh CLI authenticated with admin access to the repository. # # Usage: @@ -10,6 +10,17 @@ set -euo pipefail # ./scripts/enable-branch-protection.sh MCP-Audit/MCTS # ./scripts/enable-branch-protection.sh MCP-Audit/MCTS --dry-run +GH_CONFIG="${GH_CONFIG_DIR:-${XDG_CONFIG_HOME:-$HOME/.config}/gh}/config.yml" +if [[ ! -r "${GH_CONFIG}" ]]; then + echo "error: cannot read gh config at ${GH_CONFIG}" >&2 + echo "Fix ownership (run in your terminal, enter your password):" >&2 + echo " sudo chown -R \"\$(whoami)\":staff \"\${XDG_CONFIG_HOME:-\$HOME/.config}/gh\"" >&2 + echo " chmod 700 \"\${XDG_CONFIG_HOME:-\$HOME/.config}/gh\"" >&2 + echo " chmod 600 \"\${XDG_CONFIG_HOME:-\$HOME/.config}/gh\"/*.yml" >&2 + echo "Or apply rulesets manually: Settings → Rules → see .github/rulesets/README.md" >&2 + exit 1 +fi + REPO="${1:-$(gh repo view --json nameWithOwner -q .nameWithOwner)}" DRY_RUN=false if [[ "${2:-}" == "--dry-run" ]]; then @@ -19,40 +30,69 @@ elif [[ "${1:-}" == "--dry-run" ]]; then REPO="$(gh repo view --json nameWithOwner -q .nameWithOwner)" fi -RULESET_FILE="$(cd "$(dirname "$0")/.." && pwd)/.github/rulesets/main.json" -RULESET_NAME="$(python3 -c "import json,sys; print(json.load(open(sys.argv[1]))['name'])" "${RULESET_FILE}")" +ROOT="$(cd "$(dirname "$0")/.." && pwd)" +RULESETS_DIR="${ROOT}/.github/rulesets" + +if [[ ! 
-d "${RULESETS_DIR}" ]]; then + echo "Rulesets directory not found: ${RULESETS_DIR}" >&2 + exit 1 +fi + +RULESET_FILES=() +while IFS= read -r ruleset_file; do + RULESET_FILES+=("${ruleset_file}") +done < <(find "${RULESETS_DIR}" -maxdepth 1 -name '*.json' -print | sort) +if [[ "${#RULESET_FILES[@]}" -eq 0 ]]; then + echo "No ruleset JSON files found in ${RULESETS_DIR}" >&2 + exit 1 +fi echo "Checking existing rulesets on ${REPO}..." -EXISTING_ID="$( - gh api "repos/${REPO}/rulesets" --paginate \ - | python3 -c " +EXISTING_JSON="$( + gh api "repos/${REPO}/rulesets" --paginate 2>/dev/null || echo '[]' +)" + +apply_ruleset() { + local ruleset_file="$1" + local ruleset_name + ruleset_name="$(python3 -c "import json,sys; print(json.load(open(sys.argv[1]))['name'])" "${ruleset_file}")" + + local existing_id + existing_id="$( + python3 -c " import json, sys name = sys.argv[1] -for row in json.load(sys.stdin): +rows = json.loads(sys.argv[2]) +for row in rows: if row.get('name') == name: print(row.get('id', '')) break -" "${RULESET_NAME}" -)" +" "${ruleset_name}" "${EXISTING_JSON}" + )" -if [[ -n "${EXISTING_ID}" ]]; then - echo "Found existing ruleset \"${RULESET_NAME}\" (id=${EXISTING_ID}). Updating in place..." - if [[ "${DRY_RUN}" == true ]]; then - echo "[dry-run] Would PUT repos/${REPO}/rulesets/${EXISTING_ID}" - exit 0 + if [[ -n "${existing_id}" ]]; then + echo "Updating ruleset \"${ruleset_name}\" (id=${existing_id}) from ${ruleset_file##*/}..." + if [[ "${DRY_RUN}" == true ]]; then + echo "[dry-run] Would PUT repos/${REPO}/rulesets/${existing_id}" + return 0 + fi + gh api "repos/${REPO}/rulesets/${existing_id}" \ + --method PUT \ + --input "${ruleset_file}" + else + echo "Creating ruleset \"${ruleset_name}\" from ${ruleset_file##*/}..." 
+ if [[ "${DRY_RUN}" == true ]]; then + echo "[dry-run] Would POST repos/${REPO}/rulesets" + return 0 + fi + gh api "repos/${REPO}/rulesets" \ + --method POST \ + --input "${ruleset_file}" fi - gh api "repos/${REPO}/rulesets/${EXISTING_ID}" \ - --method PUT \ - --input "${RULESET_FILE}" -else - echo "No existing ruleset named \"${RULESET_NAME}\". Creating..." - if [[ "${DRY_RUN}" == true ]]; then - echo "[dry-run] Would POST repos/${REPO}/rulesets" - exit 0 - fi - gh api "repos/${REPO}/rulesets" \ - --method POST \ - --input "${RULESET_FILE}" -fi +} + +for ruleset_file in "${RULESET_FILES[@]}"; do + apply_ruleset "${ruleset_file}" +done echo "Done. Verify at: https://github.com/${REPO}/settings/rules" diff --git a/scripts/run_scoring_corpus.py b/scripts/run_scoring_corpus.py new file mode 100644 index 0000000..aca23ea --- /dev/null +++ b/scripts/run_scoring_corpus.py @@ -0,0 +1,44 @@ +#!/usr/bin/env python3 +"""Batch-scan scoring corpus servers and optionally refresh packaged stats.""" + +from __future__ import annotations + +import argparse +import json + +from mcts.scoring.corpus_runner import ( + PACKAGE_STATS_PATH, + build_package_stats_from_metrics, + scan_corpus_metrics, +) + + +def main() -> int: + parser = argparse.ArgumentParser(description="Run v2 scoring across corpus servers") + parser.add_argument("--scoring", default="v2", choices=["v2", "both"]) + parser.add_argument( + "--write-package-stats", + action="store_true", + help="Write distribution snapshot to packaged corpus stats JSON", + ) + parser.add_argument( + "--stats-version", + default="corpus-2026-06", + help="Version label written into packaged corpus stats JSON", + ) + args = parser.parse_args() + + metrics = scan_corpus_metrics(scoring_mode=args.scoring) + risks = metrics.risks + for server_id, absolute_risk in risks.items(): + print(f"{server_id}: absolute_risk={absolute_risk}") + + if args.write_package_stats and risks: + stats = build_package_stats_from_metrics(metrics, 
version=args.stats_version) + PACKAGE_STATS_PATH.write_text(json.dumps(stats, indent=2) + "\n", encoding="utf-8") + print(f"Wrote {PACKAGE_STATS_PATH}") + return 0 + + +if __name__ == "__main__": + raise SystemExit(main()) diff --git a/src/mcts/__init__.py b/src/mcts/__init__.py index bbea07d..b172a8c 100644 --- a/src/mcts/__init__.py +++ b/src/mcts/__init__.py @@ -1,6 +1,6 @@ """MCTS (Model Context Threat Scanner) — security analysis for MCP servers.""" -__version__ = "0.1.2" +__version__ = "0.1.3" from mcts.core.config import ScanConfig from mcts.core.scanner import Scanner diff --git a/src/mcts/analyzers/attack_chains.py b/src/mcts/analyzers/attack_chains.py index 13b6330..7c3bd5b 100644 --- a/src/mcts/analyzers/attack_chains.py +++ b/src/mcts/analyzers/attack_chains.py @@ -2,12 +2,13 @@ from __future__ import annotations -from collections import deque from typing import Any from mcts.analyzers.base import BaseAnalyzer from mcts.mcp.models import MCPServerInfo, MCPTool from mcts.reporting.models import Finding, Severity +from mcts.scoring.evidence_tags import tag_attack_chain_finding +from mcts.scoring.graph import bfs_path, build_paths class AttackChainAnalyzer(BaseAnalyzer): @@ -28,7 +29,7 @@ def analyze(self, server: MCPServerInfo) -> list[Finding]: exec_tools = [t for t in server.tools if _cap(t, "executes_commands")] if read_tools and exfil_tools: - path = _find_path(self.last_graph, read_tools[0].name, exfil_tools[0].name) + path = bfs_path(self.last_graph, read_tools[0].name, exfil_tools[0].name) findings.append( Finding( id="chain-read-exfil", @@ -81,7 +82,9 @@ def analyze(self, server: MCPServerInfo) -> list[Finding]: ) ) - return findings + paths = build_paths(self.last_graph, findings) + self.last_graph = {**self.last_graph, "paths": paths} + return [tag_attack_chain_finding(f) for f in findings] def _build_graph(self, server: MCPServerInfo) -> dict[str, Any]: nodes: dict[str, dict[str, str]] = {} @@ -135,9 +138,9 @@ def _can_chain(src: MCPTool, dst: 
MCPTool) -> bool: if not src.capability or not dst.capability: return False s, d = src.capability, dst.capability - return (s.reads_untrusted_input and (d.egresses_network or d.executes_commands)) or ( - s.accesses_sensitive_data and d.egresses_network - ) + return ( + s.reads_untrusted_input and (d.egresses_network or d.executes_commands or d.accesses_sensitive_data) + ) or (s.accesses_sensitive_data and d.egresses_network) def _edge_label(src: MCPTool, dst: MCPTool) -> str: @@ -148,23 +151,3 @@ def _edge_label(src: MCPTool, dst: MCPTool) -> str: if dst.capability and dst.capability.accesses_sensitive_data: return "→ cred" return "→ chain" - - -def _find_path(graph: dict[str, Any], start: str, end: str) -> list[str]: - adjacency: dict[str, list[str]] = {} - for edge in graph.get("edges", []): - adjacency.setdefault(edge["from"], []).append(edge["to"]) - - queue: deque[list[str]] = deque([[start]]) - visited = {start} - while queue: - path = queue.popleft() - node = path[-1] - if node == end: - return path - for neighbor in adjacency.get(node, []): - if neighbor in visited: - continue - visited.add(neighbor) - queue.append([*path, neighbor]) - return [start, end] diff --git a/src/mcts/analyzers/behavioral_static.py b/src/mcts/analyzers/behavioral_static.py index a067c9c..cd6b04a 100644 --- a/src/mcts/analyzers/behavioral_static.py +++ b/src/mcts/analyzers/behavioral_static.py @@ -18,6 +18,7 @@ from mcts.sast.rust.taint import analyze_rust_taint from mcts.sast.typescript.sinks import detect_typescript_sinks from mcts.sast.typescript.taint import analyze_typescript_taint +from mcts.scoring.evidence_tags import tag_behavioral_static_finding _BENIGN_CLAIMS = ( ( @@ -83,7 +84,7 @@ def analyze(self, server: MCPServerInfo) -> list[Finding]: if not snippet: continue findings.extend(self._analyze_tool(tool, snippet, server)) - return findings + return [tag_behavioral_static_finding(f) for f in findings] def _analyze_tool( self, diff --git 
a/src/mcts/analyzers/command_execution.py b/src/mcts/analyzers/command_execution.py index 79149e1..684f899 100644 --- a/src/mcts/analyzers/command_execution.py +++ b/src/mcts/analyzers/command_execution.py @@ -7,6 +7,7 @@ from mcts.analyzers.base import BaseAnalyzer from mcts.mcp.models import MCPServerInfo, MCPTool from mcts.reporting.models import Finding, Severity, SourceLocation +from mcts.scoring.evidence_tags import tag_command_execution_finding DANGEROUS_CALLS: dict[str, tuple[str, Severity]] = { "subprocess": ("subprocess invocation", Severity.CRITICAL), @@ -25,7 +26,7 @@ def analyze(self, server: MCPServerInfo) -> list[Finding]: findings: list[Finding] = [] for tool in server.tools: findings.extend(self._analyze_tool(tool, server.source_files)) - return findings + return [tag_command_execution_finding(f) for f in findings] def _analyze_tool(self, tool: MCPTool, source_files: dict[str, str]) -> list[Finding]: if not tool.source_file or tool.source_file not in source_files: diff --git a/src/mcts/analyzers/cross_server.py b/src/mcts/analyzers/cross_server.py index 0cfc51c..a0ede1c 100644 --- a/src/mcts/analyzers/cross_server.py +++ b/src/mcts/analyzers/cross_server.py @@ -8,6 +8,7 @@ from mcts.inventory.models import InventoryEntry from mcts.mcp.models import MCPServerInfo from mcts.reporting.models import Finding, Severity +from mcts.scoring.evidence_tags import tag_cross_server_finding def _similarity(a: str, b: str) -> float: @@ -87,4 +88,4 @@ def analyze_inventory(self, inventory: list[InventoryEntry]) -> list[Finding]: ) ) - return findings + return [tag_cross_server_finding(f) for f in findings] diff --git a/src/mcts/analyzers/data_leakage.py b/src/mcts/analyzers/data_leakage.py index 73bc44a..488d205 100644 --- a/src/mcts/analyzers/data_leakage.py +++ b/src/mcts/analyzers/data_leakage.py @@ -7,6 +7,7 @@ from mcts.analyzers.base import BaseAnalyzer from mcts.mcp.models import MCPServerInfo from mcts.reporting.models import Finding, Severity, 
SourceLocation +from mcts.scoring.evidence_tags import tag_data_leakage_finding SECRET_PATTERNS: list[tuple[str, re.Pattern[str], Severity]] = [ ("OpenAI API Key", re.compile(r"sk-[A-Za-z0-9]{20,}"), Severity.CRITICAL), @@ -68,7 +69,7 @@ def analyze(self, server: MCPServerInfo) -> list[Finding]: findings: list[Finding] = [] findings.extend(self._scan_metadata(server)) findings.extend(self._scan_source_files(server)) - return findings + return [tag_data_leakage_finding(f) for f in findings] def _scan_metadata(self, server: MCPServerInfo) -> list[Finding]: findings: list[Finding] = [] diff --git a/src/mcts/analyzers/jailbreak.py b/src/mcts/analyzers/jailbreak.py index 0fcdf7f..c7eaae7 100644 --- a/src/mcts/analyzers/jailbreak.py +++ b/src/mcts/analyzers/jailbreak.py @@ -8,6 +8,7 @@ from mcts.mcp.models import MCPServerInfo, MCPTool from mcts.probe.jailbreak import summarize_jailbreak_events from mcts.reporting.models import Finding, Severity +from mcts.scoring.evidence_tags import tag_jailbreak_finding class JailbreakAnalyzer(BaseAnalyzer): @@ -26,7 +27,7 @@ def analyze(self, server: MCPServerInfo) -> list[Finding]: elif score >= 5: severity = Severity.MEDIUM else: - return findings + return [tag_jailbreak_finding(f) for f in findings] findings.append( Finding( @@ -51,7 +52,7 @@ def analyze(self, server: MCPServerInfo) -> list[Finding]: }, ) ) - return findings + return [tag_jailbreak_finding(f) for f in findings] def _live_finding(self, summary: dict[str, Any]) -> Finding: accepted = int(summary["accepted_count"]) diff --git a/src/mcts/analyzers/path_validation.py b/src/mcts/analyzers/path_validation.py index 8bd2519..f5f5470 100644 --- a/src/mcts/analyzers/path_validation.py +++ b/src/mcts/analyzers/path_validation.py @@ -8,6 +8,7 @@ from mcts.analyzers.tool_classification import is_file_access_tool from mcts.mcp.models import MCPServerInfo from mcts.reporting.models import Finding, Severity, SourceLocation +from mcts.scoring.evidence_tags import 
tag_path_validation_finding CANONICALIZATION_HINTS = re.compile( r"\b(resolve|realpath|abspath|canonicalize|normpath|is_relative_to|startswith)\b", @@ -44,4 +45,4 @@ def analyze(self, server: MCPServerInfo) -> list[Finding]: evidence={"missing": "path_canonicalization"}, ) ) - return findings + return [tag_path_validation_finding(f) for f in findings] diff --git a/src/mcts/analyzers/permissions.py b/src/mcts/analyzers/permissions.py index cea151f..64c4ba7 100644 --- a/src/mcts/analyzers/permissions.py +++ b/src/mcts/analyzers/permissions.py @@ -7,6 +7,7 @@ from mcts.analyzers.base import BaseAnalyzer from mcts.mcp.models import MCPServerInfo, MCPTool from mcts.reporting.models import Finding, Severity +from mcts.scoring.evidence_tags import tag_permission_finding DESTRUCTIVE_PATTERNS = re.compile( r"\b(delete|drop|remove|destroy|wipe|purge|truncate|kill|shutdown)\b", @@ -27,7 +28,7 @@ def analyze(self, server: MCPServerInfo) -> list[Finding]: findings: list[Finding] = [] for tool in server.tools: findings.extend(self._analyze_tool(tool)) - return findings + return [tag_permission_finding(f) for f in findings] def _analyze_tool(self, tool: MCPTool) -> list[Finding]: findings: list[Finding] = [] diff --git a/src/mcts/analyzers/prompt_injection.py b/src/mcts/analyzers/prompt_injection.py index c83aea9..35925dc 100644 --- a/src/mcts/analyzers/prompt_injection.py +++ b/src/mcts/analyzers/prompt_injection.py @@ -21,6 +21,7 @@ ) from mcts.mcp.models import MCPServerInfo, MCPTool from mcts.reporting.models import Finding, Severity +from mcts.scoring.evidence_tags import tag_prompt_injection_finding INSTRUCTION_LIKE = re.compile( r"(?i)\b(ignore|disregard|forget|override|system prompt|you must|always|never reveal)\b" @@ -36,7 +37,7 @@ def analyze(self, server: MCPServerInfo) -> list[Finding]: findings: list[Finding] = [] for surface in scan_surfaces(server): findings.extend(self._analyze_surface(server, surface)) - return findings + return [tag_prompt_injection_finding(f) 
for f in findings] def _analyze_surface(self, server: MCPServerInfo, surface: ScanSurface) -> list[Finding]: findings: list[Finding] = [] diff --git a/src/mcts/analyzers/schema_surface.py b/src/mcts/analyzers/schema_surface.py index 0fafd68..f99dab3 100644 --- a/src/mcts/analyzers/schema_surface.py +++ b/src/mcts/analyzers/schema_surface.py @@ -12,6 +12,7 @@ ) from mcts.mcp.models import MCPServerInfo, MCPTool from mcts.reporting.models import Finding, Severity, SourceLocation +from mcts.scoring.evidence_tags import tag_schema_surface_finding CREDENTIAL_PARAM_NAMES = re.compile( r"(?i)^(password|secret|token|api_key|apikey|credential|auth|private_key)$" @@ -27,7 +28,7 @@ def analyze(self, server: MCPServerInfo) -> list[Finding]: findings: list[Finding] = [] for tool in server.tools: findings.extend(self._analyze_tool(tool)) - return findings + return [tag_schema_surface_finding(f) for f in findings] def _analyze_tool(self, tool: MCPTool) -> list[Finding]: findings: list[Finding] = [] diff --git a/src/mcts/analyzers/tool_abuse.py b/src/mcts/analyzers/tool_abuse.py index 642b7f7..1dc3342 100644 --- a/src/mcts/analyzers/tool_abuse.py +++ b/src/mcts/analyzers/tool_abuse.py @@ -7,6 +7,7 @@ from mcts.analyzers.tool_classification import is_file_access_tool from mcts.mcp.models import MCPServerInfo from mcts.reporting.models import Finding, Severity +from mcts.scoring.evidence_tags import tag_tool_abuse_finding class ToolAbuseAnalyzer(BaseAnalyzer): @@ -37,4 +38,4 @@ def analyze(self, server: MCPServerInfo) -> list[Finding]: }, ) ) - return findings + return [tag_tool_abuse_finding(f) for f in findings] diff --git a/src/mcts/api/app.py b/src/mcts/api/app.py index fab08d1..1061beb 100644 --- a/src/mcts/api/app.py +++ b/src/mcts/api/app.py @@ -3,7 +3,7 @@ from __future__ import annotations from pathlib import Path -from typing import Any +from typing import Any, Literal from fastapi import Depends, FastAPI, HTTPException, Request from pydantic import BaseModel, Field, 
field_validator @@ -52,6 +52,14 @@ class ScanRequest(BaseModel): runtime_events: list[dict[str, Any]] = Field(default_factory=list) fail_on_critical: bool = False min_score: int | None = Field(default=None, ge=0, le=100) + scoring_mode: Literal["legacy", "v2", "both"] = "both" + weights_profile: str = "manual_v1" + corpus_stats_path: str | None = None + min_security_score: int | None = Field(default=None, ge=0, le=100) + max_absolute_risk: int | None = Field(default=None, ge=0) + max_risk_level: Literal["low", "medium", "high", "critical"] | None = None + min_category_score_v2: dict[str, int] = Field(default_factory=dict) + assets_path: str | None = None understand_live_risk: bool = False fanout_offset: int = Field(default=0, ge=0) fanout_limit: int | None = Field(default=None, ge=1) @@ -65,6 +73,13 @@ def _limit_runtime_events(cls, value: list[dict[str, Any]]) -> list[dict[str, An return value +class ScanResponse(ScanReport): + """REST scan payload with echoed scoring mode and gate violations.""" + + scoring_mode: str = "both" + gate_violations: list[str] = Field(default_factory=list) + + class ToolScanRequest(ScanRequest): tool_name: str @@ -123,6 +138,14 @@ def _build_config(req: ScanRequest, *, request: Request | None = None) -> ScanCo runtime_events=req.runtime_events, fail_on_critical=req.fail_on_critical, min_score=req.min_score, + scoring_mode=req.scoring_mode, + weights_profile=req.weights_profile, + corpus_stats_path=Path(req.corpus_stats_path) if req.corpus_stats_path else None, + min_security_score=req.min_security_score, + max_absolute_risk=req.max_absolute_risk, + max_risk_level=req.max_risk_level, + min_category_score_v2=req.min_category_score_v2, + assets_path=Path(req.assets_path) if req.assets_path else None, oauth_client_id=req.oauth_client_id, oauth_client_secret=req.oauth_client_secret, oauth_token_url=req.oauth_token_url, @@ -149,7 +172,13 @@ def _scan_server( report: ScanReport = Scanner(config).analyze_server(server) except Exception as exc: 
         raise HTTPException(status_code=400, detail=str(exc)) from exc
-    return report.model_dump()
+    from mcts.governance.scan_gates import evaluate_scan_gate_violations
+
+    return ScanResponse(
+        **report.model_dump(),
+        scoring_mode=config.scoring_mode,
+        gate_violations=evaluate_scan_gate_violations(report, config),
+    ).model_dump()
 
 
 async def _discover_async(req: ScanRequest, *, request: Request) -> MCPServerInfo:
diff --git a/src/mcts/brand/Logo 2.jpg b/src/mcts/brand/Logo 2.jpg
new file mode 100644
index 0000000..3a32e15
Binary files /dev/null and b/src/mcts/brand/Logo 2.jpg differ
diff --git a/src/mcts/brand/Logo 2.png b/src/mcts/brand/Logo 2.png
new file mode 100644
index 0000000..35463bb
Binary files /dev/null and b/src/mcts/brand/Logo 2.png differ
diff --git a/src/mcts/brand/README.md b/src/mcts/brand/README.md
index 3c6c5c6..497dcc7 100644
--- a/src/mcts/brand/README.md
+++ b/src/mcts/brand/README.md
@@ -4,8 +4,6 @@ Canonical logos for the Model Context Threat Scanner.
 
 | File | Use |
 |------|-----|
-| `logo.png` | Full wordmark (1024×1024) — terminal header, large displays |
-| `logo-report.png` | Hex icon only (256×256, transparent) — HTML dashboard sidebar |
-| `logo.jpg` | Compressed full wordmark (716×716) — docs, README, presentations |
+| `Logo 2.jpg` | Canonical logo — terminal header, HTML dashboard, docs |
 
-Loaded in code via `mcts.brand.logo_data_uri()`.
+Loaded in code via `mcts.brand.logo_data_uri()` and `mcts.brand.LOGO_PATH`.
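Reviewer note: the README change above says the logo is loaded through `mcts.brand.logo_data_uri()`. As a rough illustration of what embedding an image as a data URI involves, here is a minimal sketch. It is not the package's implementation: `data_uri_for` is a hypothetical helper, and deriving the MIME type from the file extension is an assumption made here so the declared type follows the file rather than being hard-coded.

```python
import base64
import mimetypes
from pathlib import Path


def data_uri_for(path: Path) -> str:
    """Hypothetical helper: encode a file as a base64 data URI.

    The MIME type is guessed from the file name; unknown extensions
    fall back to a generic binary type.
    """
    mime, _encoding = mimetypes.guess_type(path.name)
    if mime is None:
        mime = "application/octet-stream"
    # Base64 output is pure ASCII, so decoding to str is safe.
    payload = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:{mime};base64,{payload}"
```

With a helper along these lines, renaming the logo from a `.png` to a `.jpg` would change the declared media type automatically. Whether that is preferable to a fixed constant is a judgment call for the maintainers.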
diff --git a/src/mcts/brand/__init__.py b/src/mcts/brand/__init__.py index b97a16b..9822ba2 100644 --- a/src/mcts/brand/__init__.py +++ b/src/mcts/brand/__init__.py @@ -6,17 +6,15 @@ from pathlib import Path BRAND_DIR = Path(__file__).resolve().parent -LOGO_PATH = BRAND_DIR / "logo.png" -LOGO_JPG_PATH = BRAND_DIR / "logo.jpg" -LOGO_REPORT_PATH = BRAND_DIR / "logo-report.png" # hex icon only for small HTML embeds +LOGO_PATH = BRAND_DIR / "Logo 2.jpg" +LOGO_JPG_PATH = LOGO_PATH def logo_data_uri(*, for_report: bool = True) -> str: """Return a data URI for embedding the logo in HTML. - Reports use ``logo-report.png`` (hex mark only) so the sidebar stays legible - at 44×44px. Terminals and large displays use the full ``logo.png``. + Uses ``Logo 2.jpg`` for terminal headers, HTML dashboard sidebar, and exports. """ - path = LOGO_REPORT_PATH if for_report and LOGO_REPORT_PATH.is_file() else LOGO_PATH - payload = base64.b64encode(path.read_bytes()).decode("ascii") - return f"data:image/png;base64,{payload}" + del for_report + payload = base64.b64encode(LOGO_PATH.read_bytes()).decode("ascii") + return f"data:image/jpeg;base64,{payload}" diff --git a/src/mcts/brand/logo-report 2.png b/src/mcts/brand/logo-report 2.png new file mode 100644 index 0000000..61c5cfa Binary files /dev/null and b/src/mcts/brand/logo-report 2.png differ diff --git a/src/mcts/cli/machine_wide.py b/src/mcts/cli/machine_wide.py index abe04f8..07645cd 100644 --- a/src/mcts/cli/machine_wide.py +++ b/src/mcts/cli/machine_wide.py @@ -31,9 +31,13 @@ def run_machine_wide_cli( for row in summary.results: label = f"[{row.entry.client}] {row.entry.server_name}" if row.report is not None: - console.print( - f" {label} — score {row.report.score.overall}/100, {len(row.report.findings)} finding(s)" - ) + line = f" {label} — score {row.report.score.overall}/100" + if row.report.score_v2 is not None: + line += f", absolute_risk {row.report.score_v2.absolute_risk}" + if row.report.score_v2.security_score is not None: 
+ line += f", security_score {row.report.score_v2.security_score}/100" + line += f", {len(row.report.findings)} finding(s)" + console.print(line) elif row.error: console.print(f" {label} — [dim]skipped: {row.error}[/dim]") diff --git a/src/mcts/cli/main.py b/src/mcts/cli/main.py index 9275702..8c92d7c 100644 --- a/src/mcts/cli/main.py +++ b/src/mcts/cli/main.py @@ -20,7 +20,10 @@ resolve_report_input_path, ) from mcts.output.artifacts import persist_scan_artifacts -from mcts.report.data import category_gate_failures, parse_category_gates +from mcts.report.data import ( + parse_category_gates, + parse_min_category_score_v2, +) from mcts.reporting.sarif import write_sarif_report from mcts.ui.progress import print_scan_command, run_with_progress from mcts.ui.report_renderer import ReportRenderer @@ -186,25 +189,49 @@ def _print_min_score_gate_failure(report, min_score: int) -> None: f"[dim]Lowest bucket ({lowest_label}) is below the overall minimum; " "review findings in that area before changing MCP tool code.[/dim]" ) + if report.score_v2 is not None: + console.print( + f"[dim]v2 absolute_risk={report.score_v2.absolute_risk}, " + f"risk_level={report.score_v2.risk_level}[/dim]" + ) + + +_LEVEL_ORDER = {"low": 0, "medium": 1, "high": 2, "critical": 3} + + +def _any_v2_gate(config: ScanConfig) -> bool: + from mcts.governance.scan_gates import _any_v2_gate as gate_any_v2 + + return gate_any_v2(config) + + +def _level_exceeds(actual: str, maximum: str) -> bool: + return _LEVEL_ORDER.get(actual, 0) > _LEVEL_ORDER.get(maximum, 0) def _check_gates(report, config: ScanConfig) -> None: - if config.fail_on_critical and report.summary.critical > 0: - raise typer.Exit(code=1) + from mcts.governance.scan_gates import evaluate_scan_gate_violations + if config.min_score is not None and report.score.overall < config.min_score: _print_min_score_gate_failure(report, config.min_score) - raise typer.Exit(code=1) - if config.max_critical is not None and report.summary.critical > 
config.max_critical: - console.print( - f"[red]Critical findings ({report.summary.critical}) exceed maximum ({config.max_critical})[/red]" - ) - raise typer.Exit(code=1) - category_failures = category_gate_failures(report.findings, config.fail_on_category) + + violations = evaluate_scan_gate_violations(report, config) + if not violations: + return + + category_failures = [item for item in violations if "risk score" in item] + other_failures = [ + item for item in violations if item not in category_failures and not item.startswith("legacy overall") + ] if category_failures: console.print("[red]Category risk thresholds exceeded:[/red]") for failure in category_failures: console.print(f" [red]•[/red] {failure}") - raise typer.Exit(code=1) + if other_failures: + console.print("[red]CI gate failed:[/red]") + for failure in other_failures: + console.print(f"[red]{failure}[/red]") + raise typer.Exit(code=1) @app.callback() @@ -289,9 +316,9 @@ def scan( typer.Option( "--fail-on-category", help=( - "Exit 1 when category risk score meets or exceeds threshold (inclusive). " - "e.g. permissions:0 fails when score is 0 or more. " - "Use permissions:1 to allow zero-point categories. Repeatable." + "Exit 1 when legacy category risk score meets or exceeds threshold (inclusive). " + "Legacy v1 tiles only — not category_scores_v2. " + "e.g. permissions:0 fails when score is 0 or more. Repeatable." 
), ), ] = None, @@ -568,6 +595,65 @@ def scan( help="When --surfaces is a subset, run only analyzers relevant to those surfaces", ), ] = True, + scoring: Annotated[ + str, + typer.Option( + "--scoring", + help="Scoring mode: legacy, v2, or both (default: both)", + case_sensitive=False, + ), + ] = "both", + no_attack_chains: Annotated[ + bool, + typer.Option( + "--no-attack-chains", + help="Disable chain multiplier (chain_factor=1.0); under v2/both the analyzer still runs", + ), + ] = False, + min_security_score: Annotated[ + int | None, + typer.Option( + "--min-security-score", + help="Exit 1 when v2 security_score is below this (requires --scoring v2 or both)", + ), + ] = None, + max_absolute_risk: Annotated[ + int | None, + typer.Option( + "--max-absolute-risk", + help="Exit 1 when v2 absolute_risk exceeds this (requires --scoring v2 or both)", + ), + ] = None, + max_risk_level: Annotated[ + str | None, + typer.Option( + "--max-risk-level", + help="Exit 1 when v2 risk_level exceeds threshold (low|medium|high|critical)", + case_sensitive=False, + ), + ] = None, + min_category_score_v2: Annotated[ + list[str] | None, + typer.Option( + "--min-category-score-v2", + help=( + "Exit 1 when v2 OWASP category health score is below minimum (100=good). " + "e.g. injection:80. Requires --scoring v2 or both." 
+ ), + ), + ] = None, + weights_profile: Annotated[ + str, + typer.Option("--weights", help="Scoring weights profile (default: manual_v1)"), + ] = "manual_v1", + corpus_stats_path: Annotated[ + Path | None, + typer.Option("--corpus-stats-path", help="Override packaged v2 corpus statistics JSON"), + ] = None, + assets_path: Annotated[ + Path | None, + typer.Option("--assets-path", help="YAML asset value overrides for v2 scoring (.mcts/assets.yaml)"), + ] = None, ) -> None: """Run a full security scan against an MCP server.""" import json @@ -669,6 +755,12 @@ def scan( console.print(f"[red]Error:[/red] {exc}") raise typer.Exit(code=2) from exc + try: + category_gates_v2 = parse_min_category_score_v2(min_category_score_v2) + except ValueError as exc: + console.print(f"[red]Error:[/red] {exc}") + raise typer.Exit(code=2) from exc + output_format = format.lower() if output_format not in ("json", "sarif", "raw"): console.print(f"[red]Error:[/red] Unknown format {format!r}. Use json, sarif, or raw.") @@ -758,6 +850,15 @@ def scan( instruction_files=instruction_file or [], skills_dirs=skills_dir or [], surface_scoped_analyzers=surface_scoped, + scoring_mode=scoring.lower(), + enable_attack_chains=not no_attack_chains, + min_security_score=min_security_score, + max_absolute_risk=max_absolute_risk, + max_risk_level=max_risk_level.lower() if max_risk_level else None, + min_category_score_v2=category_gates_v2, + weights_profile=weights_profile, + corpus_stats_path=corpus_stats_path, + assets_path=assets_path, ) try: @@ -872,9 +973,9 @@ def _execute_scan(): raw_path = resolve_output_path(output, "scan-report.raw.json") _write_report(report, raw_path, "raw", target=str(display_target), remote_url=url) renderer.render_saved_notice(str(raw_path)) - renderer.render_saved_notice(str(json_path)) - renderer.render_saved_notice(str(html_path)) - renderer.render_saved_notice(str(sarif_path)) + renderer.render_saved_notice(str(json_path), report) + 
renderer.render_saved_notice(str(html_path), report) + renderer.render_saved_notice(str(sarif_path), report) console.print(f"[dim] mcts report {json_path}[/dim] [dim](or open {html_path})[/dim]") _print_discovery_warnings(report.server, stderr_file) @@ -888,6 +989,10 @@ def _execute_scan(): critical=report.summary.critical, high=report.summary.high, servers=[str(display_target)], + absolute_risk=report.score_v2.absolute_risk if report.score_v2 else None, + security_score=report.score_v2.security_score if report.score_v2 else None, + risk_level=report.score_v2.risk_level if report.score_v2 else None, + findings=report.findings, ) if violations: console.print("[red]Governance policy violations:[/red]") diff --git a/src/mcts/core/config.py b/src/mcts/core/config.py index 6afb223..ba34f9e 100644 --- a/src/mcts/core/config.py +++ b/src/mcts/core/config.py @@ -5,7 +5,7 @@ from pathlib import Path from typing import Any -from pydantic import BaseModel, Field +from pydantic import BaseModel, Field, field_validator DEFAULT_EXCLUDE_DIRS = ( ".git", @@ -123,3 +123,19 @@ class ScanConfig(BaseModel): instruction_files: list[Path] = Field(default_factory=list) skills_dirs: list[Path] = Field(default_factory=list) surface_scoped_analyzers: bool = True + scoring_mode: str = "both" + weights_profile: str = "manual_v1" + corpus_stats_path: Path | None = None + assets_path: Path | None = None + min_security_score: int | None = Field(default=None, ge=0, le=100) + max_absolute_risk: int | None = Field(default=None, ge=0) + max_risk_level: str | None = None + min_category_score_v2: dict[str, int] = Field(default_factory=dict) + + @field_validator("scoring_mode") + @classmethod + def _validate_scoring_mode(cls, value: str) -> str: + normalized = value.lower() + if normalized not in {"legacy", "v2", "both"}: + raise ValueError("scoring_mode must be legacy, v2, or both") + return normalized diff --git a/src/mcts/core/scanner.py b/src/mcts/core/scanner.py index d34b2f4..6f61d59 100644 --- 
a/src/mcts/core/scanner.py +++ b/src/mcts/core/scanner.py @@ -47,14 +47,18 @@ from mcts.mcp.models import MCPServerInfo, SurfaceScanOptions from mcts.probe.protocol_checks import probe_protocol_security from mcts.report.scan_meta import ( + append_chain_scan_notes, build_scan_notes, infer_scan_scope, is_config_static_scan, tool_discovery_notice_text, ) from mcts.reporting.models import Finding, ScanReport, ScanSummary +from mcts.scoring.context import build_scoring_context from mcts.scoring.engine import RiskScoringEngine +from mcts.scoring.engine_v2 import RiskScoringEngineV2 from mcts.scoring.partitions import score_partitioned +from mcts.scoring.pipeline_trace import record as _trace_pipeline from mcts.taxonomy.mapper import enrich_findings @@ -205,19 +209,44 @@ def analyze_server(self, server_info: MCPServerInfo) -> ScanReport: findings = enrich_findings(findings) findings.extend(self.compliance.check(findings, tools_discovered=len(server_info.tools))) analyzers_executed.append("compliance") - score = self.scoring.score(findings) - summary = ScanSummary.from_findings(findings) + raw_graph = self.attack_chains.last_graph if "attack_chains" in analyzers_executed else {} + _trace_pipeline("graph") + + scan_scope = infer_scan_scope(self.config) + from mcts.scoring.evidence_emit import enrich_scoring_evidence + + findings = enrich_scoring_evidence(findings, attack_graph=raw_graph, scan_scope=scan_scope) + _trace_pipeline("scope") + scan_notes = build_scan_notes(self.config) + + score = self.scoring.score(findings) + _trace_pipeline("v1") if not RiskScoringEngine.verify(findings, score): raise RuntimeError("Risk score does not match findings — scoring regression") - attack_graph = self.attack_chains.last_graph if self.config.enable_attack_chains else {} + score_v2 = None + report_attack_graph = raw_graph + if self.config.scoring_mode in {"v2", "both"}: + chain_factor_mode = "paths_v1" if self.config.enable_attack_chains else "disabled" + ctx = build_scoring_context( 
+ findings=findings, + server=server_info, + attack_graph=raw_graph, + scan_scope=scan_scope, + config=self.config, + chain_factor_mode=chain_factor_mode, + ) + score_v2 = RiskScoringEngineV2().score(ctx, legacy_overall=score.overall) + if not RiskScoringEngineV2.verify(ctx, score_v2): + raise RuntimeError("Risk score v2 does not match context — scoring regression") + report_attack_graph = ctx.attack_graph + _trace_pipeline("v2") + + summary = ScanSummary.from_findings(findings) if self.config.save_baseline_path is not None: save_baseline(server_info, self.config.save_baseline_path, target=str(self.config.target)) - - scan_scope = infer_scan_scope(self.config) - scan_notes = build_scan_notes(self.config) if server_info.agent_skills or server_info.instruction_sources: scan_notes.append( "Instruction discovery: found " @@ -226,7 +255,7 @@ def analyze_server(self, server_info: MCPServerInfo) -> ScanReport: f"{len(server_info.instruction_sources)} system instruction file(s) in repository markdown." 
) - return ScanReport( + report = ScanReport( version=__version__, target=str(self.config.target), scanned_at=datetime.now(UTC), @@ -234,13 +263,17 @@ def analyze_server(self, server_info: MCPServerInfo) -> ScanReport: findings=findings, summary=summary, score=score, - attack_graph=attack_graph, + score_v2=score_v2, + scoring_version=self.config.scoring_mode, + attack_graph=report_attack_graph, scan_scope=scan_scope, scan_notes=scan_notes, score_breakdown=score_partitioned(findings), tool_discovery_notice=tool_discovery_notice_text(server_info, scan_scope=scan_scope), analyzers_executed=analyzers_executed, ) + append_chain_scan_notes(report.scan_notes, report, self.config) + return report def _attach_surface_options(self, server_info: MCPServerInfo) -> MCPServerInfo: cfg = self.config @@ -266,6 +299,8 @@ def _is_enabled(self, analyzer: object) -> bool: if name == "JailbreakAnalyzer": return self.config.enable_jailbreak if name == "AttackChainAnalyzer": + if self.config.scoring_mode in {"v2", "both"}: + return True return self.config.enable_attack_chains if name == "MetadataDiffAnalyzer": return self.config.baseline_path is not None @@ -276,6 +311,8 @@ def _is_enabled(self, analyzer: object) -> bool: return True def _analyzer_allowed(self, analyzer: object) -> bool: + if self.config.scoring_mode in {"v2", "both"} and getattr(analyzer, "name", None) == "attack_chains": + return True if self.config.analyzers: name = getattr(analyzer, "name", type(analyzer).__name__) if name not in self.config.analyzers and type(analyzer).__name__ not in self.config.analyzers: diff --git a/src/mcts/discovery/static_meta.py b/src/mcts/discovery/static_meta.py index 6e994b7..b90fe42 100644 --- a/src/mcts/discovery/static_meta.py +++ b/src/mcts/discovery/static_meta.py @@ -9,6 +9,7 @@ from mcts.discovery.language_detect import RUST_MCP_INDICATORS, detect_repo_languages from mcts.mcp.models import MCPServerInfo from mcts.reporting.models import Finding, Severity +from 
mcts.scoring.evidence_tags import tag_static_discovery_finding def static_discovery_meta_findings(server: MCPServerInfo, config: ScanConfig) -> list[Finding]: @@ -28,50 +29,54 @@ def static_discovery_meta_findings(server: MCPServerInfo, config: ScanConfig) -> if rust_sources and ("rust" in langs or "rs" in langs): return [ - Finding( - id="static-discovery-rust-incomplete", - analyzer="static_discovery", - title="Rust MCP sources found but no tools discovered", - description=( - "The repository contains Rust MCP indicators but static discovery " - "returned zero tools. Handler analysis and behavioral SAST did not run." - ), - severity=Severity.HIGH, - recommendation=( - "Verify rmcp #[tool] registration patterns are supported, pass " - "--languages rust, or use --live --i-understand-live-risk for live discovery." - ), - technique_id="MCTS-T-1001", - confidence=0.9, - evidence={ - "languages": sorted(langs), - "detected_languages": sorted(detected), - "discovery_mode": server.discovery_mode, - }, + tag_static_discovery_finding( + Finding( + id="static-discovery-rust-incomplete", + analyzer="static_discovery", + title="Rust MCP sources found but no tools discovered", + description=( + "The repository contains Rust MCP indicators but static discovery " + "returned zero tools. Handler analysis and behavioral SAST did not run." + ), + severity=Severity.HIGH, + recommendation=( + "Verify rmcp #[tool] registration patterns are supported, pass " + "--languages rust, or use --live --i-understand-live-risk for live discovery." 
+ ), + technique_id="MCTS-T-1001", + confidence=0.9, + evidence={ + "languages": sorted(langs), + "detected_languages": sorted(detected), + "discovery_mode": server.discovery_mode, + }, + ) ) ] if detected & langs: return [ - Finding( - id="static-discovery-incomplete", - analyzer="static_discovery", - title="Static MCP tool discovery returned zero tools", - description=( - "MCP source indicators were found for enabled languages but no tools " - "were discovered. Security analysis may be incomplete." - ), - severity=Severity.MEDIUM, - recommendation=( - "Use --live --i-understand-live-risk, export a tools/list snapshot, " - "or verify static discovery supports your SDK registration patterns." - ), - confidence=0.8, - evidence={ - "languages": sorted(langs), - "detected_languages": sorted(detected), - "discovery_mode": server.discovery_mode, - }, + tag_static_discovery_finding( + Finding( + id="static-discovery-incomplete", + analyzer="static_discovery", + title="Static MCP tool discovery returned zero tools", + description=( + "MCP source indicators were found for enabled languages but no tools " + "were discovered. Security analysis may be incomplete." + ), + severity=Severity.MEDIUM, + recommendation=( + "Use --live --i-understand-live-risk, export a tools/list snapshot, " + "or verify static discovery supports your SDK registration patterns." 
+ ), + confidence=0.8, + evidence={ + "languages": sorted(langs), + "detected_languages": sorted(detected), + "discovery_mode": server.discovery_mode, + }, + ) ) ] return [] diff --git a/src/mcts/governance/policy.py b/src/mcts/governance/policy.py index 7eb8c76..788e60b 100644 --- a/src/mcts/governance/policy.py +++ b/src/mcts/governance/policy.py @@ -11,6 +11,10 @@ class GovernancePolicy(BaseModel): min_score: int | None = Field(default=None, ge=0, le=100) + min_security_score: int | None = Field(default=None, ge=0, le=100) + max_absolute_risk: int | None = Field(default=None, ge=0) + max_risk_level: str | None = Field(default=None) + min_category_score_v2: dict[str, int] = Field(default_factory=dict) max_critical: int | None = Field(default=None, ge=0) max_high: int | None = Field(default=None, ge=0) allowed_servers: list[str] = Field(default_factory=list) @@ -40,10 +44,44 @@ def evaluate_policy( critical: int, high: int, servers: list[str], + absolute_risk: int | None = None, + security_score: int | None = None, + risk_level: str | None = None, + findings: list | None = None, ) -> list[str]: + from mcts.report.data import category_scores_v2_gate_failures + + _LEVEL_ORDER = {"low": 0, "medium": 1, "high": 2, "critical": 3} violations: list[str] = [] if policy.min_score is not None and score < policy.min_score: - violations.append(f"score {score} below minimum {policy.min_score}") + violations.append(f"legacy score {score} below minimum {policy.min_score}") + if policy.min_security_score is not None: + if security_score is None: + violations.append( + f"min_security_score {policy.min_security_score} requires v2 scoring " + "(use --scoring v2 or both)" + ) + elif security_score < policy.min_security_score: + violations.append(f"security score {security_score} below minimum {policy.min_security_score}") + if policy.max_absolute_risk is not None: + if absolute_risk is None: + violations.append( + f"max_absolute_risk {policy.max_absolute_risk} requires v2 scoring 
(use --scoring v2 or both)" + ) + elif absolute_risk > policy.max_absolute_risk: + violations.append(f"absolute risk {absolute_risk} exceeds maximum {policy.max_absolute_risk}") + if policy.max_risk_level is not None: + if risk_level is None: + violations.append( + f"max_risk_level {policy.max_risk_level!r} requires v2 scoring (use --scoring v2 or both)" + ) + elif _LEVEL_ORDER.get(risk_level, 0) > _LEVEL_ORDER.get(policy.max_risk_level, 0): + violations.append(f"risk level {risk_level!r} exceeds maximum {policy.max_risk_level!r}") + if policy.min_category_score_v2: + if absolute_risk is None: + violations.append("min_category_score_v2 requires v2 scoring (use --scoring v2 or both)") + elif findings is not None: + violations.extend(category_scores_v2_gate_failures(findings, policy.min_category_score_v2)) if policy.max_critical is not None and critical > policy.max_critical: violations.append(f"critical findings {critical} exceed max {policy.max_critical}") if policy.max_high is not None and high > policy.max_high: diff --git a/src/mcts/governance/scan_gates.py b/src/mcts/governance/scan_gates.py new file mode 100644 index 0000000..f695dbf --- /dev/null +++ b/src/mcts/governance/scan_gates.py @@ -0,0 +1,72 @@ +"""Evaluate CI/policy scan gates without exiting the process.""" + +from __future__ import annotations + +from mcts.core.config import ScanConfig +from mcts.report.data import category_gate_failures, category_scores_v2_gate_failures +from mcts.reporting.models import ScanReport + +_LEVEL_ORDER = {"low": 0, "medium": 1, "high": 2, "critical": 3} + + +def _level_exceeds(actual: str, maximum: str) -> bool: + return _LEVEL_ORDER.get(actual, 0) > _LEVEL_ORDER.get(maximum, 0) + + +def _any_v2_gate(config: ScanConfig) -> bool: + return any( + value is not None + for value in ( + config.min_security_score, + config.max_absolute_risk, + config.max_risk_level, + ) + ) or bool(config.min_category_score_v2) + + +def evaluate_scan_gate_violations(report: ScanReport, 
config: ScanConfig) -> list[str]: + """Return human-readable gate violations for CLI, API, and GitHub Action consumers.""" + violations: list[str] = [] + + if config.fail_on_critical and report.summary.critical > 0: + violations.append(f"critical findings present ({report.summary.critical})") + + if config.min_score is not None and report.score.overall < config.min_score: + violations.append(f"legacy overall score {report.score.overall}/100 below minimum {config.min_score}") + + if _any_v2_gate(config): + if report.score_v2 is None: + violations.append("v2 gate requires scoring_mode v2 or both") + elif report.score_v2 is not None: + if config.min_security_score is not None: + if report.score_v2.security_score is None: + violations.append("min_security_score requires packaged corpus stats") + elif report.score_v2.security_score < config.min_security_score: + violations.append( + f"security_score {report.score_v2.security_score} " + f"below minimum {config.min_security_score}" + ) + if ( + config.max_absolute_risk is not None + and report.score_v2.absolute_risk > config.max_absolute_risk + ): + violations.append( + f"absolute_risk {report.score_v2.absolute_risk} " + f"exceeds maximum {config.max_absolute_risk}" + ) + if config.max_risk_level is not None and _level_exceeds( + report.score_v2.risk_level, config.max_risk_level + ): + violations.append( + f"risk_level {report.score_v2.risk_level} exceeds maximum {config.max_risk_level}" + ) + + if config.max_critical is not None and report.summary.critical > config.max_critical: + violations.append( + f"critical findings ({report.summary.critical}) exceed maximum ({config.max_critical})" + ) + + violations.extend(category_gate_failures(report.findings, config.fail_on_category)) + if config.min_category_score_v2 and report.score_v2 is not None: + violations.extend(category_scores_v2_gate_failures(report.findings, config.min_category_score_v2)) + return violations diff --git a/src/mcts/inventory/scan_all.py 
b/src/mcts/inventory/scan_all.py index 38b41cf..ef24744 100644 --- a/src/mcts/inventory/scan_all.py +++ b/src/mcts/inventory/scan_all.py @@ -27,14 +27,17 @@ def run_inventory_scan_all(base_config: ScanConfig) -> tuple[InventoryReport, li except Exception as exc: # noqa: BLE001 rows.append(_row(entry, error=str(exc))) continue - rows.append( - _row( - entry, - report=report, - score=report.score.overall, - findings=len(report.findings), - ) - ) + row_payload: dict = { + "score": report.score.overall, + "findings": len(report.findings), + "scoring_version": report.scoring_version, + "report": report.model_dump(mode="json"), + } + if report.score_v2 is not None: + row_payload["absolute_risk"] = report.score_v2.absolute_risk + row_payload["security_score"] = report.score_v2.security_score + row_payload["risk_level"] = report.score_v2.risk_level + rows.append(_row(entry, **row_payload)) return inventory, rows diff --git a/src/mcts/mcp_server/server.py b/src/mcts/mcp_server/server.py index f333d3a..702b646 100644 --- a/src/mcts/mcp_server/server.py +++ b/src/mcts/mcp_server/server.py @@ -71,13 +71,26 @@ def compare_baselines(baseline_report_json: str, current_report_json: str) -> st """Compare two scan reports and summarize score and finding deltas.""" baseline = _report_summary(json.loads(baseline_report_json)) current = _report_summary(json.loads(current_report_json)) - delta = { + delta: dict[str, Any] = { "baseline": baseline, "current": current, "score_delta": current["overall_score"] - baseline["overall_score"], "finding_delta": current["finding_count"] - baseline["finding_count"], "new_findings": _new_finding_ids(baseline, current), } + if baseline.get("absolute_risk") is not None and current.get("absolute_risk") is not None: + delta["absolute_risk_delta"] = current["absolute_risk"] - baseline["absolute_risk"] + if baseline.get("security_score") is not None and current.get("security_score") is not None: + delta["security_score_delta"] = current["security_score"] - 
baseline["security_score"] + if baseline.get("scoring_version") or current.get("scoring_version"): + delta["scoring_mode_note"] = ( + "Legacy overall_score and v2 absolute_risk use different scales — compare like with like." + ) + chain_delta = (current.get("critical") or 0) - (baseline.get("critical") or 0) + if chain_delta and delta.get("finding_delta", 0) != chain_delta: + delta["chain_meta_note"] = ( + "Finding deltas may include attack_chains meta-rows excluded from v2 absolute_risk." + ) return json.dumps(delta, indent=2) @@ -103,14 +116,24 @@ def create_server(): def _report_summary(payload: dict[str, Any]) -> dict[str, Any]: score = payload.get("score") or {} + score_v2 = payload.get("score_v2") or {} findings = payload.get("findings") or [] - return { + summary: dict[str, Any] = { "overall_score": int(score.get("overall") or 0), "finding_count": len(findings), "finding_ids": sorted(str(row.get("id")) for row in findings if row.get("id")), "critical": int((payload.get("summary") or {}).get("critical") or 0), "high": int((payload.get("summary") or {}).get("high") or 0), + "scoring_version": payload.get("scoring_version") or "legacy", } + if score_v2: + if score_v2.get("absolute_risk") is not None: + summary["absolute_risk"] = int(score_v2["absolute_risk"]) + if score_v2.get("security_score") is not None: + summary["security_score"] = int(score_v2["security_score"]) + if score_v2.get("risk_level"): + summary["risk_level"] = str(score_v2["risk_level"]) + return summary def _new_finding_ids(baseline: dict[str, Any], current: dict[str, Any]) -> list[str]: diff --git a/src/mcts/output/artifacts.py b/src/mcts/output/artifacts.py index 87868b8..0634754 100644 --- a/src/mcts/output/artifacts.py +++ b/src/mcts/output/artifacts.py @@ -25,8 +25,13 @@ def _report_with_scan_history(report: ScanReport) -> ScanReport: "date": scanned.strftime("%b %d"), "score": report.score.overall, "scanned_at": scanned.isoformat(), + "scoring_version": report.scoring_version, } ] + if 
report.score_v2 is not None: + points[0]["absolute_risk"] = report.score_v2.absolute_risk + if report.score_v2.security_score is not None: + points[0]["security_score"] = report.score_v2.security_score return report.model_copy(update={"scan_history": points}) diff --git a/src/mcts/output/history.py b/src/mcts/output/history.py index 76f6e29..12caa65 100644 --- a/src/mcts/output/history.py +++ b/src/mcts/output/history.py @@ -78,12 +78,19 @@ def record_scan_run(report: ScanReport, root: Path | None = None) -> None: store = _load_store(root) runs: list[dict[str, Any]] = store["runs"] key = normalize_target(report.target) - entry = { + entry: dict[str, Any] = { "scanned_at": report.scanned_at.astimezone(UTC).isoformat(), "target": key, + "scoring_version": report.scoring_version, "score": report.score.overall, "findings_total": report.summary.total, + "critical": report.summary.critical, + "high": report.summary.high, } + if report.score_v2 is not None: + entry["absolute_risk"] = report.score_v2.absolute_risk + entry["security_score"] = report.score_v2.security_score + entry["risk_level"] = report.score_v2.risk_level if runs and runs[-1].get("scanned_at") == entry["scanned_at"] and runs[-1].get("target") == key: runs[-1] = entry else: @@ -122,13 +129,25 @@ def trend_points_for_target(target: str, root: Path | None = None) -> list[dict[ scanned_at = datetime.fromisoformat(str(raw)) if scanned_at.tzinfo is None: scanned_at = scanned_at.replace(tzinfo=UTC) - points.append( - { - "date": _trend_label(scanned_at, day_counts), - "score": int(row.get("score", 0)), - "scanned_at": scanned_at.isoformat(), - } - ) + point: dict[str, Any] = { + "date": _trend_label(scanned_at, day_counts), + "score": int(row.get("score", 0)), + "scanned_at": scanned_at.isoformat(), + "scoring_version": row.get("scoring_version", "legacy"), + } + if "absolute_risk" in row: + point["absolute_risk"] = int(row["absolute_risk"]) + if row.get("security_score") is not None: + point["security_score"] = 
int(row["security_score"]) + if row.get("risk_level"): + point["risk_level"] = str(row["risk_level"]) + if "findings_total" in row: + point["findings_total"] = int(row["findings_total"]) + if "critical" in row: + point["critical"] = int(row["critical"]) + if "high" in row: + point["high"] = int(row["high"]) + points.append(point) return points diff --git a/src/mcts/pentest/models.py b/src/mcts/pentest/models.py index 08e93ff..a8df50c 100644 --- a/src/mcts/pentest/models.py +++ b/src/mcts/pentest/models.py @@ -12,13 +12,21 @@ class PentestPhase(BaseModel): details: dict = Field(default_factory=dict) +class PentestLimits(BaseModel): + tools_discovered: int = 0 + attack_chains_available: bool = True + coverage: str = "full" + + class PentestReport(BaseModel): target: str verdict: str score: int + absolute_risk: int | None = None phases: list[PentestPhase] = Field(default_factory=list) attack_paths: list[dict] = Field(default_factory=list) top_findings: list[dict] = Field(default_factory=list) fuzz_findings: list[dict] = Field(default_factory=list) recommendations: list[str] = Field(default_factory=list) static_report: dict = Field(default_factory=dict) + pentest_limits: PentestLimits = Field(default_factory=PentestLimits) diff --git a/src/mcts/pentest/runner.py b/src/mcts/pentest/runner.py index c9cf7f2..e0aeb90 100644 --- a/src/mcts/pentest/runner.py +++ b/src/mcts/pentest/runner.py @@ -5,7 +5,7 @@ from mcts.core.config import ScanConfig from mcts.core.scanner import Scanner from mcts.fuzz.runner import FuzzRunner -from mcts.pentest.models import PentestPhase, PentestReport +from mcts.pentest.models import PentestLimits, PentestPhase, PentestReport from mcts.reporting.models import Finding, ScanReport, Severity @@ -38,14 +38,26 @@ def run_pentest(config: ScanConfig, *, run_fuzz: bool = True) -> PentestReport: attack_graph = static_report.attack_graph or {} attack_paths = list(attack_graph.get("paths") or []) - phases.append( - PentestPhase( - name="attack_chains", - 
status="complete", - findings=len(attack_paths), - details={"nodes": len(attack_graph.get("nodes") or [])}, + has_tools = bool(static_report.server.tools) + if has_tools: + phases.append( + PentestPhase( + name="attack_chains", + status="complete", + findings=len(attack_paths), + details={"nodes": len(attack_graph.get("nodes") or [])}, + ) + ) + else: + phases.append( + PentestPhase( + name="attack_chains", + status="skipped", + details={ + "reason": "No MCP tools discovered — attack graph requires a tool surface", + }, + ) ) - ) fuzz_rows: list[Finding] = [] if run_fuzz and config.live and config.live_consent: @@ -73,16 +85,23 @@ def run_pentest(config: ScanConfig, *, run_fuzz: bool = True) -> PentestReport: recommendations = _recommendations(static_report, fuzz_rows, attack_paths) verdict = _verdict(static_report, fuzz_rows) + limits = PentestLimits( + tools_discovered=len(static_report.server.tools), + attack_chains_available=has_tools, + coverage="full" if has_tools else "static-only", + ) return PentestReport( target=str(config.target), verdict=verdict, score=static_report.score.overall, + absolute_risk=static_report.score_v2.absolute_risk if static_report.score_v2 else None, phases=phases, attack_paths=attack_paths[:20], top_findings=[row.model_dump(mode="json") for row in combined[:15]], fuzz_findings=[row.model_dump(mode="json") for row in fuzz_rows], recommendations=recommendations, static_report=static_report.model_dump(mode="json"), + pentest_limits=limits, ) @@ -98,6 +117,10 @@ def _rank_findings(static_report: ScanReport, fuzz_rows: list[Finding]) -> list[ def _verdict(static_report: ScanReport, fuzz_rows: list[Finding]) -> str: + if static_report.score_v2 is not None: + if any(f.severity == Severity.CRITICAL for f in fuzz_rows): + return "critical" + return static_report.score_v2.risk_level if static_report.summary.critical: return "critical" if static_report.summary.high or any(f.severity == Severity.HIGH for f in fuzz_rows): diff --git 
a/src/mcts/probe/discovery_meta.py b/src/mcts/probe/discovery_meta.py index e12636b..d765b7b 100644 --- a/src/mcts/probe/discovery_meta.py +++ b/src/mcts/probe/discovery_meta.py @@ -4,6 +4,7 @@ from mcts.mcp.models import MCPServerInfo from mcts.reporting.models import Finding, Severity +from mcts.scoring.evidence_tags import tag_live_discovery_finding def list_failure_warning(operation: str, exc: Exception, stderr_file: str | None) -> str: @@ -42,25 +43,27 @@ def discovery_meta_findings(server: MCPServerInfo) -> list[Finding]: ) return [ - Finding( - id="live-discovery-incomplete", - analyzer="live_discovery", - title="Live MCP discovery incomplete", - description=description, - severity=severity, - recommendation=( - "Investigate MCP server list_tools/list_prompts/list_resources handlers; " - "increase --timeout if needed. Capture server stderr with --stderr-file " - "for diagnostics. Use --strict-live in CI to fail the scan when discovery " - "is incomplete." - ), - evidence={ - "discovery_mode": server.discovery_mode, - "discovery_warnings": list(server.discovery_warnings), - "tool_count": len(server.tools), - "initialize_succeeded": server.initialize_succeeded, - }, - confidence=1.0, + tag_live_discovery_finding( + Finding( + id="live-discovery-incomplete", + analyzer="live_discovery", + title="Live MCP discovery incomplete", + description=description, + severity=severity, + recommendation=( + "Investigate MCP server list_tools/list_prompts/list_resources handlers; " + "increase --timeout if needed. Capture server stderr with --stderr-file " + "for diagnostics. Use --strict-live in CI to fail the scan when discovery " + "is incomplete." 
+ ), + evidence={ + "discovery_mode": server.discovery_mode, + "discovery_warnings": list(server.discovery_warnings), + "tool_count": len(server.tools), + "initialize_succeeded": server.initialize_succeeded, + }, + confidence=1.0, + ) ) ] diff --git a/src/mcts/report/assets/dashboard.js b/src/mcts/report/assets/dashboard.js index d156baa..7f078ad 100644 --- a/src/mcts/report/assets/dashboard.js +++ b/src/mcts/report/assets/dashboard.js @@ -79,12 +79,300 @@ return `${value} / 100 pts`; } + const V2_DIMENSION_LABELS = { + exploitability: "Exploitability", + reachability: "Reachability", + exposure: "Exposure", + blast_radius: "Blast radius", + business_impact: "Business impact", + asset_value: "Asset value", + attack_preconditions: "Preconditions", + threat_maturity: "Threat maturity", + }; + + const V2_FACTOR_LABELS = { + exploitability: "easy to exploit", + reachability: "reachable by attackers", + exposure: "exposed to users", + blast_radius: "wide blast radius", + business_impact: "high business impact", + asset_value: "valuable asset", + attack_preconditions: "few preconditions", + threat_maturity: "known attack pattern", + chain_factor: "part of attack chain", + }; + + function applyScoringMode() { + const isV2 = Boolean(DATA.score_v2); + const legacyCard = document.getElementById("score-card"); + const v2Panel = document.getElementById("v2-score-section"); + const zoneRiskDetail = document.getElementById("zone-risk-detail"); + const legacyBreakdown = document.getElementById("legacy-breakdown-card"); + const scoreBreakdown = document.getElementById("score-breakdown-section"); + const legendV2 = document.getElementById("legend-v2-block"); + const legendScores = document.getElementById("legend-scores-block"); + const heroTitle = document.getElementById("hero-title"); + const trendTitle = document.getElementById("trend-card-title"); + const trendHint = document.getElementById("trend-card-hint"); + const riskGuideTitle = document.getElementById("risk-guide-title"); 
+ const riskGuideHint = document.getElementById("risk-guide-hint"); + const trendIntro = document.getElementById("trend-zone-intro"); + + if (legacyCard) legacyCard.hidden = isV2; + if (v2Panel) v2Panel.hidden = !isV2; + if (zoneRiskDetail && !isV2) zoneRiskDetail.hidden = true; + if (legacyBreakdown) legacyBreakdown.hidden = isV2; + if (scoreBreakdown && isV2) scoreBreakdown.hidden = true; + if (legendV2) legendV2.hidden = !isV2; + if (legendScores && isV2) { + legendScores.querySelector("strong").textContent = "Benchmark score (0–100 points)"; + const p = legendScores.querySelector("p"); + if (p) { + p.textContent = + "How this server compares to others in the benchmark corpus. Higher = better — separate from absolute risk."; + } + } + if (heroTitle && isV2) { + const level = String(DATA.score_v2.risk_level || "low"); + heroTitle.textContent = + level === "critical" || level === "high" + ? "Action needed — elevated risk" + : "Review recommended"; + } + if (trendTitle && isV2) { + trendTitle.textContent = "Risk over time"; + if (trendHint) trendHint.textContent = "Absolute risk per scan — lower is better."; + if (trendIntro) { + trendIntro.textContent = "Compare absolute risk across scans and see which band you are in."; + } + } + if (riskGuideTitle && isV2) { + riskGuideTitle.textContent = "Absolute risk bands"; + if (riskGuideHint) riskGuideHint.textContent = "Higher numbers mean more overall danger."; + } + } + + function fillHero() { + const statsEl = document.getElementById("hero-stats"); + const eyebrow = document.getElementById("hero-eyebrow"); + if (!statsEl) return; + + const s = DATA.summary || {}; + const cs = DATA.checks_summary || {}; + const tools = DATA.meta?.tools_discovered || 0; + const v2 = DATA.score_v2; + const score = DATA.score?.overall ?? 0; + + if (eyebrow) { + const target = DATA.meta?.target; + eyebrow.textContent = target ? 
`Scanned ${target}` : "Scan complete"; + } + + const stats = []; + if (v2) { + stats.push({ + cls: "hero-stat--risk", + value: String(v2.absolute_risk), + label: `${String(v2.risk_level || "low").toUpperCase()} risk`, + }); + if (v2.security_score != null) { + stats.push({ + cls: "", + value: `${v2.security_score}/100`, + label: "Benchmark score", + }); + } + } else { + stats.push({ + cls: "hero-stat--risk", + value: `${score}/100`, + label: `${DATA.risk?.level || "risk"} rating`, + }); + } + stats.push({ + cls: "hero-stat--issues", + value: String(s.total || 0), + label: `issue${s.total === 1 ? "" : "s"} found`, + }); + if (cs.analyzers_run) { + stats.push({ + cls: "hero-stat--ok", + value: `${cs.analyzers_passed}/${cs.analyzers_run}`, + label: "checks passed", + }); + } + stats.push({ + cls: "", + value: String(tools), + label: `MCP tool${tools === 1 ? "" : "s"}`, + }); + + statsEl.innerHTML = stats + .map( + (row) => + `
${escapeHtml(row.value)}${escapeHtml(row.label)}
` + ) + .join(""); + } + + function fillScoreV2() { + const v2 = DATA.score_v2; + const section = document.getElementById("v2-score-section"); + if (!section || !v2) return; + section.hidden = false; + + const absEl = document.getElementById("v2-absolute-risk"); + const pill = document.getElementById("v2-risk-pill"); + const rangeEl = document.getElementById("v2-risk-range"); + const secEl = document.getElementById("v2-security-score"); + const confEl = document.getElementById("v2-confidence"); + const pctEl = document.getElementById("v2-percentile"); + const intro = document.getElementById("v2-metrics-intro"); + if (absEl) absEl.textContent = String(v2.absolute_risk); + if (pill) { + pill.textContent = `${String(v2.risk_level || "low").toUpperCase()} RISK`; + pill.className = `risk-pill ${v2.risk_level || "low"}`; + } + if (rangeEl && Array.isArray(v2.risk_range)) { + const rangeConf = v2.risk_range_confidence != null ? String(v2.risk_range_confidence) : "—"; + rangeEl.textContent = `Likely range ${v2.risk_range[0]}–${v2.risk_range[1]} (confidence ${rangeConf}%)`; + } + if (secEl) { + secEl.textContent = v2.security_score != null ? `${v2.security_score} / 100` : "—"; + } + if (confEl) { + confEl.textContent = v2.confidence_score != null ? `${v2.confidence_score}%` : "—"; + } + if (pctEl) { + pctEl.textContent = v2.risk_percentile != null ? 
`${v2.risk_percentile}th percentile` : "—"; + } + if (intro) { + intro.textContent = + "These are the findings and OWASP categories contributing most to your absolute risk score."; + } + + const contributors = v2.top_contributors || []; + const categories = DATA.category_scores_v2 || []; + fillV2Contributors(contributors); + fillV2Categories(categories); + initV2DimensionRadar(v2.dimension_scores || {}); + const zoneRiskDetail = document.getElementById("zone-risk-detail"); + if (zoneRiskDetail) { + zoneRiskDetail.hidden = !contributors.length && !categories.length; + } + applyScoringMode(); + } + + function fillV2Categories(categories) { + const list = document.getElementById("v2-category-list"); + const card = document.getElementById("v2-categories-card"); + if (!list || !card) return; + if (!categories.length) { + card.hidden = true; + return; + } + card.hidden = false; + list.innerHTML = categories + .map((c) => { + const pct = Math.max(0, Math.min(100, Number(c.score) || 0)); + const barColor = pct >= 80 ? COLORS.low : pct >= 50 ? COLORS.medium : COLORS.critical; + return `
  • +
    + ${escapeHtml(c.label)} + ${escapeHtml(c.display)} +
    +
    +
  • `; + }) + .join(""); + } + + function fillV2Contributors(contributors) { + const tbody = document.getElementById("v2-contributors-body"); + const card = document.getElementById("v2-contributors-card"); + if (!tbody || !card) return; + if (!contributors.length) { + card.hidden = true; + return; + } + card.hidden = false; + const findingById = Object.fromEntries((DATA.findings || []).map((f) => [f.id, f])); + tbody.innerHTML = contributors + .map((row) => { + const finding = row.finding_id ? findingById[row.finding_id] : null; + const title = finding ? finding.title : row.type === "attack_chain" ? "Attack path" : row.finding_id || "—"; + const tool = finding ? finding.tool : row.nodes ? row.nodes.join(" → ") : "—"; + const factors = row.factors + ? Object.entries(row.factors) + .filter(([, v]) => Number(v) > 0) + .map(([k, v]) => `${V2_FACTOR_LABELS[k] || k.replace(/_/g, " ")} (${v})`) + .join("; ") + : row.hop_count != null + ? `${row.hop_count}-step attack path` + : "—"; + return ` + ${escapeHtml(title)} + ${escapeHtml(tool || "—")} + ${escapeHtml(String(row.risk_contribution ?? 
"—"))} + ${escapeHtml(factors)} + `; + }) + .join(""); + } + + function initV2DimensionRadar(dimensions) { + const canvas = document.getElementById("v2-dimension-radar"); + if (!canvas || typeof Chart === "undefined") return; + const keys = Object.keys(V2_DIMENSION_LABELS).filter((k) => k in dimensions); + if (!keys.length) return; + const labels = keys.map((k) => V2_DIMENSION_LABELS[k]); + const values = keys.map((k) => Number(dimensions[k]) || 0); + new Chart(canvas, { + type: "radar", + data: { + labels, + datasets: [ + { + label: "Factor load", + data: values, + borderColor: COLORS.high, + backgroundColor: "rgba(249,115,22,0.15)", + borderWidth: 2, + pointRadius: 3, + }, + ], + }, + options: { + responsive: true, + maintainAspectRatio: false, + scales: { + r: { + beginAtZero: true, + max: 100, + ticks: { display: false, stepSize: 25 }, + grid: { color: COLORS.grid }, + angleLines: { color: COLORS.grid }, + pointLabels: { color: COLORS.text, font: { size: 10 } }, + }, + }, + plugins: { legend: { display: false } }, + }, + }); + } + function fillScoreBreakdown() { const section = document.getElementById("score-breakdown-section"); const row = document.getElementById("score-breakdown-row"); const b = DATA.score && DATA.score.breakdown; if (!section || !row || !b) return; section.hidden = false; + if (DATA.score_v2) { + const intro = section.querySelector(".metrics-section-intro"); + if (intro) { + intro.textContent += + " Partition scores use the legacy v1 formula and may shift when attack chains run."; + } + } const cards = [ ["MCP Surface", b.mcp_surface], ["Supply Chain", b.supply_chain], @@ -119,14 +407,26 @@ const el = document.getElementById(id); if (el) el.textContent = val; }); + const legacyCard = document.getElementById("score-card"); + if (DATA.score_v2 && legacyCard) { + legacyCard.hidden = true; + return; + } const pill = document.getElementById("risk-pill"); const gaugeScore = document.getElementById("gauge-score-value"); const gradeEl = 
document.getElementById("security-grade"); - const scoreText = String(DATA.score.overall); + const v2 = DATA.score_v2; + const useV2Primary = v2 && DATA.scoring_version === "v2"; + const scoreText = useV2Primary && v2.security_score != null + ? String(v2.security_score) + : String(DATA.score.overall); - if (pill) { + if (pill && !useV2Primary) { pill.textContent = DATA.risk.badge; pill.className = `risk-pill ${DATA.risk.level}`; + } else if (pill && useV2Primary) { + pill.textContent = `${String(v2.risk_level || "low").toUpperCase()} RISK`; + pill.className = `risk-pill ${v2.risk_level || "low"}`; } if (gaugeScore) gaugeScore.textContent = scoreText; @@ -136,7 +436,11 @@ gradeEl.className = `grade-badge grade-${(grade.letter || "f").toLowerCase()}`; } const briefEl = document.getElementById("score-brief"); - if (briefEl) briefEl.textContent = DATA.risk.brief || DATA.risk.description || "—"; + if (briefEl) { + briefEl.textContent = useV2Primary + ? `Absolute risk ${v2.absolute_risk} — see v2 section below` + : DATA.risk.brief || DATA.risk.description || "—"; + } const detailEl = document.getElementById("score-detail"); const basis = DATA.score?.basis; @@ -154,17 +458,34 @@ const s = DATA.summary || {}; const score = DATA.score?.overall ?? 0; const tools = DATA.meta?.tools_discovered || 0; + const cs = DATA.checks_summary || {}; const parts = [ s.critical ? `${s.critical} critical` : null, s.high ? `${s.high} high` : null, s.medium ? `${s.medium} medium` : null, s.low ? `${s.low} low` : null, ].filter(Boolean); - const breakdown = parts.length ? ` (${parts.join(" + ")})` : ""; - el.innerHTML = - `${s.total || 0} issue${s.total === 1 ? "" : "s"} (count) across ` + - `${tools} MCP tool${tools === 1 ? "" : "s"}${breakdown}. ` + - `Security score: ${score} / 100 points (rating, not a percentage).`; + const breakdown = parts.length ? 
` — ${parts.join(", ")}` : ""; + let scoreLine; + if (DATA.score_v2) { + const v2 = DATA.score_v2; + scoreLine = + `MCTS found ${s.total || 0} security issue${s.total === 1 ? "" : "s"} across ` + + `${tools} tool${tools === 1 ? "" : "s"}${breakdown}. ` + + `Overall absolute risk is ${v2.absolute_risk} (${v2.risk_level}).`; + if (v2.security_score != null) { + scoreLine += ` Benchmark score: ${v2.security_score}/100.`; + } + } else { + scoreLine = + `MCTS found ${s.total || 0} issue${s.total === 1 ? "" : "s"} across ` + + `${tools} tool${tools === 1 ? "" : "s"}${breakdown}. ` + + `Security rating: ${score}/100 (higher is better, not a percentage).`; + } + if (cs.analyzers_run) { + scoreLine += ` ${cs.analyzers_passed} of ${cs.analyzers_run} checks passed.`; + } + el.innerHTML = scoreLine; } function fillIssuesSummary() { @@ -485,23 +806,44 @@ const total = s.total || 0; const tools = DATA.meta?.tools_discovered || 0; - let scoreLine = - score >= 80 - ? `Security rating: ${score}/100 points — strong posture with ${total} issue(s) to review (not a %).` - : score >= 50 - ? `Security rating: ${score}/100 points — moderate risk (not a %). Address High findings to improve.` - : `Security rating: ${score}/100 points — serious risk (not a %). Treat Critical and High findings as urgent.`; + let scoreLine; + const v2 = DATA.score_v2; + if (v2) { + const band = String(v2.risk_level || "low"); + scoreLine = + band === "low" || band === "medium" + ? `Absolute risk ${v2.absolute_risk} (${band}) — review findings and harden before production.` + : `Absolute risk ${v2.absolute_risk} (${band}) — treat Critical and High findings as urgent.`; + if (v2.security_score != null) { + scoreLine += ` Benchmark security score: ${v2.security_score}/100.`; + } + } else { + scoreLine = + score >= 80 + ? `Security rating: ${score}/100 points — strong posture with ${total} issue(s) to review (not a %).` + : score >= 50 + ? `Security rating: ${score}/100 points — moderate risk (not a %). 
Address High findings to improve.` + : `Security rating: ${score}/100 points — serious risk (not a %). Treat Critical and High findings as urgent.`; + } lead.textContent = `MCTS scanned ${tools} tool(s), ran ${cs.analyzers_run || "—"} checks, and counted ${total} issue(s). ${scoreLine}`; - steps.innerHTML = [ - "Start Here — score, what passed, and what needs attention (this page).", - "Issues to Fix — full list of findings with severity and remediation.", - "All Checks — every analyzer: which passed (green) vs which found problems.", - "How to Fix — prioritized action items (P1 = fix first).", - ] - .map((line) => `
  • ${line}
  • `) - .join(""); + const stepsList = DATA.score_v2 + ? [ + "Snapshot — absolute risk, issue counts, and which checks passed.", + "What to do next — urgent findings and recommended fixes on this page.", + "Issues to Fix — every finding with severity, location, and remediation.", + "All Checks — what each analyzer inspected and whether it passed.", + "How to Fix — prioritized steps (P1 = most urgent).", + ] + : [ + "Snapshot — security score, issue counts, and which checks passed.", + "What to do next — urgent findings and recommended fixes on this page.", + "Issues to Fix — every finding with severity, location, and remediation.", + "All Checks — what each analyzer inspected and whether it passed.", + "How to Fix — prioritized steps (P1 = most urgent).", + ]; + steps.innerHTML = stepsList.map((line) => `
  • ${line}
  • `).join(""); const jumps = [ ["findings", `${total} issue${total === 1 ? "" : "s"} to fix`, total > 0], @@ -551,10 +893,13 @@ const topFindings = [...(DATA.findings || [])] .sort((a, b) => (severityRank[a.severity] ?? 9) - (severityRank[b.severity] ?? 9)) .slice(0, 6); - const passed = (DATA.analyzers || []).filter((a) => a.status === "passed"); + const passed = (DATA.analyzers || []).filter((a) => a.status === "passed").slice(0, 6); - if (!topFindings.length && !passed.length) return; - split.hidden = false; + if (!topFindings.length && !passed.length) { + split.hidden = true; + } else { + split.hidden = false; + } topList.innerHTML = topFindings.length ? topFindings @@ -651,6 +996,7 @@ } function initGaugeChart() { + if (DATA.score_v2) return; const canvas = document.getElementById("gauge-chart"); if (!canvas || typeof Chart === "undefined") return; @@ -747,47 +1093,122 @@ }); } + function trendSeriesKey() { + return (DATA.trend_meta && DATA.trend_meta.series_key) || "score"; + } + + function trendValue(point) { + if (point.trend_value != null) return Number(point.trend_value); + const key = trendSeriesKey(); + if (key === "absolute_risk") return Number(point.absolute_risk) || 0; + if (key === "security_score") return Number(point.security_score) || 0; + return Number(point.score) || 0; + } + + function trendValueLabel(value) { + const key = trendSeriesKey(); + if (key === "absolute_risk") return `${value} risk`; + return `${value} / 100 pts`; + } + + function trendTableColumns(points) { + const hasV2Risk = points.some((point) => point.absolute_risk != null); + const hasRiskLevel = points.some((point) => point.risk_level); + const hasSecurityScore = points.some((point) => point.security_score != null); + const hasIssues = points.some((point) => point.findings_total != null); + const hasCritical = points.some((point) => point.critical != null); + const hasHigh = points.some((point) => point.high != null); + const hasLegacyScore = points.some( + (point) => 
point.scoring_version === "legacy" || (!hasV2Risk && point.score != null) + ); + const columns = [{ key: "date", label: "Date" }]; + if (hasV2Risk) columns.push({ key: "absolute_risk", label: "Absolute risk", num: true }); + if (hasRiskLevel) columns.push({ key: "risk_level", label: "Risk level" }); + if (hasSecurityScore) columns.push({ key: "security_score", label: "Security score", num: true }); + if (hasIssues) columns.push({ key: "findings_total", label: "Issues", num: true }); + if (hasCritical) columns.push({ key: "critical", label: "Critical", num: true }); + if (hasHigh) columns.push({ key: "high", label: "High", num: true }); + if (hasLegacyScore) columns.push({ key: "score", label: "Legacy score", num: true }); + return columns; + } + + function trendTableCell(point, column) { + if (column.key === "date") return escapeHtml(point.date || "—"); + if (column.key === "risk_level") { + const level = point.risk_level ? String(point.risk_level).toLowerCase() : ""; + if (!level) return "—"; + return `${escapeHtml(level)}`; + } + const value = point[column.key]; + if (value == null || value === "") return "—"; + if (column.key === "absolute_risk") return escapeHtml(String(value)); + if (column.key === "security_score" || column.key === "score") return escapeHtml(`${value} / 100`); + return escapeHtml(String(value)); + } + function renderTrendTable(points) { const wrap = document.getElementById("trend-table-wrap"); if (!wrap || !points.length) return; wrap.hidden = false; + const columns = trendTableColumns(points); + const header = columns + .map((column) => `${escapeHtml(column.label)}`) + .join(""); const rows = points - .map( - (point) => - `${escapeHtml(point.date)}${scorePtsHtml(point.score)}` - ) + .map((point) => { + const cells = columns + .map( + (column) => + `${trendTableCell(point, column)}` + ) + .join(""); + return `${cells}`; + }) .join(""); - wrap.innerHTML = `${rows}
    DateScore
    `; + wrap.innerHTML = `${header}${rows}
    `; } function trendYRange(values) { if (!values.length) return { min: 0, max: 100 }; const minVal = Math.min(...values); const maxVal = Math.max(...values); + const isLegacyScore = trendSeriesKey() === "score" || trendSeriesKey() === "security_score"; if (minVal === maxVal) { - if (minVal <= 5) return { min: 0, max: 25 }; - if (minVal >= 95) return { min: 75, max: 100 }; - const pad = Math.max(8, Math.round(minVal * 0.15)); + if (isLegacyScore) { + if (minVal <= 5) return { min: 0, max: 25 }; + if (minVal >= 95) return { min: 75, max: 100 }; + } + const pad = Math.max(8, Math.round(Math.max(minVal * 0.15, 10))); return { min: Math.max(0, minVal - pad), - max: Math.min(100, maxVal + pad), + max: isLegacyScore ? Math.min(100, maxVal + pad) : maxVal + pad, }; } const pad = Math.max(4, Math.round((maxVal - minVal) * 0.12)); return { min: Math.max(0, minVal - pad), - max: Math.min(100, maxVal + pad), + max: isLegacyScore ? Math.min(100, maxVal + pad) : maxVal + pad, }; } + function trendChartWidth(wrap) { + wrap.hidden = false; + wrap.setAttribute("aria-hidden", "false"); + let w = wrap.clientWidth; + if (w < 2 && wrap.parentElement) { + w = wrap.parentElement.clientWidth; + } + return Math.max(320, Math.round(w) || 640); + } + function renderTrendSparkline(points) { const wrap = document.getElementById("trend-chart-wrap"); if (!wrap || !points.length) return; - const values = points.map((p) => Number(p.score) || 0); + const values = points.map((p) => trendValue(p)); const { min: yMin, max: yMax } = trendYRange(values); - const width = 640; - const height = 220; + const width = trendChartWidth(wrap); + const height = 160; const pad = { top: 18, right: 20, bottom: 36, left: 44 }; const innerW = width - pad.left - pad.right; const innerH = height - pad.top - pad.bottom; @@ -818,7 +1239,7 @@ const dots = coords .map( (pt, index) => - `${escapeHtml(points[index].date)}: ${values[index]} / 100 pts` + `${escapeHtml(points[index].date)}: 
${escapeHtml(trendValueLabel(values[index]))}` ) .join(""); const gridLines = [0, 0.5, 1] @@ -839,12 +1260,19 @@ : `${escapeHtml(points[0].date)}${escapeHtml(points[count - 1].date)}`; const flatLabel = allSame && count > 1 - ? `Score flat at ${values[0]} / 100 pts across ${count} scans` + ? `Flat at ${escapeHtml(trendValueLabel(values[0]))} across ${count} scans` : ""; - wrap.hidden = false; - wrap.setAttribute("aria-hidden", "false"); - wrap.innerHTML = `${gridLines}${areaPath ? `` : ""}${linePath ? `` : ""}${dots}${xLabels}${flatLabel}`; + wrap.innerHTML = `${gridLines}${areaPath ? `` : ""}${linePath ? `` : ""}${dots}${xLabels}${flatLabel}`; + } + + let trendResizeTimer = null; + function scheduleTrendSparklineResize() { + if (trendResizeTimer) window.clearTimeout(trendResizeTimer); + trendResizeTimer = window.setTimeout(() => { + const points = DATA.trend || []; + if (points.length) renderTrendSparkline(points); + }, 120); } function fillTrendNote() { @@ -858,14 +1286,22 @@ "1 scan recorded — run mcts scan again from the same project folder to compare over time."; return; } + if (meta.mixed_metrics) { + note.hidden = false; + note.textContent = + "History mixes legacy and v2 scoring — chart shows legacy security score only. Re-scan with a consistent --scoring mode for comparable trends."; + return; + } if (meta.score_unchanged && points.length > 1) { note.hidden = false; - note.textContent = `${meta.runs} scans recorded — score unchanged at ${meta.latest_score} / 100 pts.`; + const suffix = meta.series_label ? ` (${meta.series_label})` : ""; + note.textContent = `${meta.runs} scans recorded — value unchanged at ${trendValueLabel(meta.latest_score)}${suffix}.`; return; } if (meta.runs >= 2) { note.hidden = false; - note.textContent = `${meta.runs} scans recorded for this target.`; + const suffix = meta.series_label ? 
` ${meta.series_label}` : ""; + note.textContent = `${meta.runs} scans recorded for this target.${suffix}`; return; } note.hidden = true; @@ -901,6 +1337,9 @@ fillTrendNote(); renderTrendSparkline(points); renderTrendTable(points); + window.requestAnimationFrame(() => { + window.requestAnimationFrame(() => renderTrendSparkline(points)); + }); if (empty) { empty.hidden = true; @@ -911,6 +1350,27 @@ function fillRiskGuide() { const container = document.getElementById("risk-guide"); if (!container) return; + if (DATA.score_v2) { + const bands = [ + ["low", "0 – 99", COLORS.low], + ["medium", "100 – 249", COLORS.medium], + ["high", "250 – 499", COLORS.high], + ["critical", "500+", COLORS.critical], + ]; + const active = String(DATA.score_v2.risk_level || "low").toLowerCase(); + container.innerHTML = bands + .map(([key, range, color]) => { + const isActive = key === active; + return `
    +

    ${escapeHtml(key.toUpperCase())}

    +
    Absolute risk ${escapeHtml(range)}
    +
    ${isActive ? "Current band" : ""}
    +

    v2 multi-factor sum — higher = worse.

    +
    `; + }) + .join(""); + return; + } const score = DATA.score.overall; const iconMap = { critical: "critical", @@ -1517,11 +1977,14 @@ function init() { fillBanners(); + fillHero(); + fillMetricsHeadline(); fillReportGuide(); fillNavBadges(); - fillMetricsHeadline(); fillIssuesSummary(); + applyScoringMode(); fillScoreBreakdown(); + fillScoreV2(); fillChecksSummary(); fillOverviewPanels(); fillScanMeta(); @@ -1547,6 +2010,7 @@ initGaugeChart(); initRadarChart(); initTrendChart(); + window.addEventListener("resize", scheduleTrendSparklineResize); } if (document.readyState === "loading") { diff --git a/src/mcts/report/assets/styles.css b/src/mcts/report/assets/styles.css index e02fec8..52da696 100644 --- a/src/mcts/report/assets/styles.css +++ b/src/mcts/report/assets/styles.css @@ -18,8 +18,8 @@ --radius: 16px; --shadow: 0 8px 32px rgba(0, 0, 0, 0.35); --grid-gap: 20px; - --section-gap: 24px; - --card-pad: 24px; + --section-gap: 18px; + --card-pad: 20px; --transition: 200ms ease; font-family: system-ui, -apple-system, BlinkMacSystemFont, "Segoe UI", sans-serif; } @@ -361,21 +361,273 @@ a { max-width: 72ch; } -.report-guide { - border-color: rgba(37, 99, 235, 0.35); - background: linear-gradient(135deg, rgba(37, 99, 235, 0.12), rgba(15, 23, 42, 0.6)); +/* Overview hero */ +.overview-hero { + display: flex; + flex-wrap: wrap; + align-items: flex-start; + justify-content: space-between; + gap: 20px; + padding: 24px 28px; + border-radius: var(--radius); + border: 1px solid rgba(59, 130, 246, 0.28); + background: linear-gradient(135deg, rgba(37, 99, 235, 0.14), rgba(11, 23, 48, 0.92)); + box-shadow: var(--shadow); } -.report-guide-title { +.hero-eyebrow { + margin: 0 0 6px; + font-size: 12px; + font-weight: 700; + letter-spacing: 0.08em; + text-transform: uppercase; + color: #93c5fd; +} + +.hero-title { margin: 0 0 10px; + font-size: 26px; + font-weight: 800; + letter-spacing: -0.03em; + line-height: 1.15; +} + +.hero-lead { + margin: 0; + color: var(--muted); + 
font-size: 15px; + line-height: 1.55; + max-width: 72ch; +} + +.hero-lead strong { + color: var(--text); +} + +.hero-stats { + display: flex; + flex-wrap: wrap; + gap: 10px; + align-items: stretch; + min-width: min(100%, 320px); +} + +.hero-stat { + display: flex; + flex-direction: column; + justify-content: center; + flex: 1 1 120px; + min-width: 110px; + padding: 10px 12px; + border-radius: 12px; + border: 1px solid var(--border); + background: rgba(0, 0, 0, 0.22); +} + +.hero-stat-value { + display: block; + font-size: 22px; + font-weight: 800; + line-height: 1.1; + letter-spacing: -0.02em; +} + +.hero-stat-label { + display: block; + margin-top: 4px; + font-size: 11px; + font-weight: 600; + color: var(--muted); + line-height: 1.35; +} + +.hero-stat--risk .hero-stat-value { + color: #fdba74; +} + +.hero-stat--issues .hero-stat-value { + color: #fca5a5; +} + +.hero-stat--ok .hero-stat-value { + color: #86efac; +} + +.quick-jump-bar { + display: flex; + flex-wrap: wrap; + gap: 10px; +} + +/* Equal-height side-by-side grids */ +.priority-grid, +.overview-split, +.v2-detail-grid, +.v2-risk-panel, +.exec-summary-grid, +.breakdown-row, +.breakdown-inner, +.metrics-primary-row, +.scores-legend-grid, +.checks-summary-row, +.score-breakdown-row, +.metrics-row, +.risk-guide { + align-items: stretch; +} + +/* Priority grid: issues + risk side by side */ +.priority-grid { + display: grid; + grid-template-columns: minmax(300px, 1fr) minmax(320px, 1.1fr); + gap: var(--grid-gap); +} + +.priority-col { + display: flex; + flex-direction: column; + gap: var(--grid-gap); + min-width: 0; + min-height: 100%; +} + +.priority-col > .card:not([hidden]), +.priority-col > .v2-risk-panel { + flex: 1; + min-height: 0; + height: 100%; +} + +.v2-risk-panel { + display: grid; + grid-template-columns: 1fr 1fr; + gap: var(--grid-gap); +} + +.v2-risk-panel .v2-score-card, +.v2-risk-panel .v2-dimension-card { + display: flex; + flex-direction: column; + min-height: 0; + height: 100%; +} + 
+.v2-dimension-card .v2-radar-box { + flex: 1; + min-height: 160px; +} + +/* Content zones */ +.zone { + display: flex; + flex-direction: column; + gap: 14px; +} + +.zone-header { + margin-bottom: 2px; +} + +.zone-heading { + margin: 0 0 6px; font-size: 18px; font-weight: 700; + letter-spacing: -0.02em; +} + +.zone-intro, +.zone-subintro { + margin: 0; + color: var(--muted); + font-size: 14px; + line-height: 1.5; + max-width: 80ch; +} + +.zone-subheading { + margin: 0 0 6px; + font-size: 14px; + font-weight: 600; +} + +.v2-detail-grid { + display: grid; + grid-template-columns: 1.2fr 0.8fr; + gap: var(--grid-gap); +} + +.v2-detail-grid > .card { + display: flex; + flex-direction: column; + height: 100%; + min-height: 0; + overflow: hidden; +} + +.v2-contributors-card .card-microcopy, +.v2-categories-card .card-microcopy { + flex-shrink: 0; +} + +/* Collapsible read guide */ +.read-guide { + border-color: rgba(37, 99, 235, 0.25); + background: rgba(15, 23, 42, 0.55); + padding: 0; + overflow: hidden; +} + +.read-guide:hover { + transform: none; +} + +.read-guide-summary { + display: flex; + align-items: center; + justify-content: space-between; + gap: 12px; + padding: 16px 20px; + cursor: pointer; + list-style: none; + font-weight: 600; +} + +.read-guide-summary::-webkit-details-marker { + display: none; +} + +.read-guide-title { + font-size: 15px; + font-weight: 700; +} + +.read-guide-toggle { + font-size: 12px; + font-weight: 600; + color: #93c5fd; +} + +.read-guide-toggle::before { + content: "Show guide"; +} + +.read-guide[open] .read-guide-toggle::before { + content: "Hide guide"; +} + +.read-guide[open] .read-guide-toggle { + color: var(--muted); +} + +.read-guide-body { + padding: 0 20px 20px; + border-top: 1px solid var(--border); } .report-guide-lead { - margin: 0 0 14px; + margin: 16px 0 14px; color: var(--text); - font-size: 15px; + font-size: 14px; line-height: 1.55; } @@ -393,12 +645,6 @@ a { color: var(--text); } -.quick-jump { - display: flex; - 
flex-wrap: wrap; - gap: 10px; -} - .quick-jump-btn { border: 1px solid rgba(59, 130, 246, 0.4); background: rgba(37, 99, 235, 0.15); @@ -438,6 +684,18 @@ a { gap: var(--grid-gap); } +.overview-panel { + display: flex; + flex-direction: column; + height: 100%; + min-height: 0; +} + +.overview-panel .panel-link { + margin-top: auto; + padding-top: 8px; +} + .overview-panel .panel-hint { margin: -4px 0 14px; color: var(--muted); @@ -454,15 +712,18 @@ a { .overview-list { list-style: none; - margin: 0 0 14px; + margin: 0 0 10px; padding: 0; + flex: 1; + min-height: 0; + overflow-y: auto; } .overview-list li { display: flex; align-items: flex-start; gap: 10px; - padding: 8px 0; + padding: 7px 0; border-bottom: 1px solid var(--border); } @@ -499,7 +760,10 @@ a { } .overview-list-summary { - display: block; + display: -webkit-box; + -webkit-box-orient: vertical; + -webkit-line-clamp: 1; + overflow: hidden; color: var(--muted); font-size: 12px; line-height: 1.45; @@ -671,10 +935,15 @@ a { gap: var(--grid-gap); } +.metrics-primary-row > * { + height: 100%; + min-height: 0; +} + .scores-legend { - border-color: rgba(59, 130, 246, 0.35); - background: rgba(15, 23, 42, 0.75); - padding: 16px 18px; + margin-top: 8px; + padding: 14px 0 0; + border-top: 1px solid var(--border); } .scores-legend-title { @@ -689,7 +958,14 @@ a { gap: 14px; } +.scores-legend-grid:has(#legend-v2-block:not([hidden])) { + grid-template-columns: repeat(3, 1fr); +} + .scores-legend-block { + display: flex; + flex-direction: column; + height: 100%; padding: 12px 14px; border-radius: 10px; border: 1px solid var(--border); @@ -698,6 +974,11 @@ a { color: var(--muted); } +.scores-legend-block p { + margin: 0; + flex: 1; +} + .scores-legend-block strong { display: block; color: var(--text); @@ -715,6 +996,11 @@ a { background: rgba(148, 163, 184, 0.06); } +.scores-legend-block--v2 { + border-color: rgba(249, 115, 22, 0.35); + background: rgba(249, 115, 22, 0.08); +} + .score-pts-suffix, .gauge-denom { 
font-size: 14px; @@ -766,10 +1052,20 @@ a { line-height: 1.45; } +.issues-summary-card { + display: flex; + flex-direction: column; + height: 100%; +} + .issues-summary-card .card-heading { margin-bottom: 4px; } +.issues-summary-card .tools-stat { + margin-top: auto; +} + .issues-summary-intro { margin: 0 0 14px; font-size: 13px; @@ -784,7 +1080,7 @@ a { } .issues-total { - font-size: 42px; + font-size: 36px; font-weight: 800; line-height: 1; color: var(--text); @@ -834,7 +1130,6 @@ a { } .tools-stat { - margin-top: 14px; padding-top: 12px; border-top: 1px solid var(--border); font-size: 12px; @@ -867,7 +1162,6 @@ a { display: grid; grid-template-columns: minmax(280px, 300px) repeat(5, minmax(0, 1fr)); gap: var(--grid-gap); - align-items: stretch; } .tool-discovery-banner, @@ -899,12 +1193,19 @@ a { } .score-breakdown-row .breakdown-score-card { - padding: 14px 16px; + display: flex; + flex-direction: column; + height: 100%; + padding: 12px 14px; border-radius: 10px; background: rgba(15, 23, 42, 0.85); border: 1px solid rgba(148, 163, 184, 0.2); } +.score-breakdown-row .breakdown-score-card .breakdown-not-pct { + margin-top: auto; +} + .score-breakdown-row .breakdown-score-card h4 { margin: 0 0 6px; font-size: 12px; @@ -926,12 +1227,19 @@ a { } .checks-card { - padding: 16px 18px; + display: flex; + flex-direction: column; + height: 100%; + padding: 14px 16px; border-radius: 12px; border: 1px solid rgba(148, 163, 184, 0.18); background: rgba(15, 23, 42, 0.85); } +.checks-card .checks-sublabel { + margin-top: auto; +} + .checks-card.passed { border-color: rgba(34, 197, 94, 0.35); background: rgba(34, 197, 94, 0.08); @@ -1201,15 +1509,90 @@ body.modal-open { background: #22c55e !important; } +.v2-meta-dl dt { + display: flex; + flex-direction: column; + gap: 2px; +} + +.term-label { + font-weight: 600; + color: var(--muted); +} + +.term-hint { + font-size: 11px; + font-weight: 400; + color: rgba(148, 163, 184, 0.85); + line-height: 1.35; +} + +.v2-score-card { + 
display: flex; + flex-direction: column; + text-align: center; + background: linear-gradient(180deg, rgba(249, 115, 22, 0.1), rgba(11, 23, 48, 1)); +} + +.v2-dimension-card { + display: flex; + flex-direction: column; +} + +.v2-score-card .v2-meta-dl { + margin-top: auto; +} + +.v2-absolute-risk { + font-size: 2.5rem; + font-weight: 700; + line-height: 1.1; + margin: 4px 0; +} + +.v2-meta-dl { + display: grid; + grid-template-columns: auto 1fr; + gap: 4px 12px; + margin: 10px 0 0; + text-align: left; + font-size: 12px; +} + +.v2-meta-dl dt { + color: var(--muted); +} + +.v2-meta-dl dd { + margin: 0; + font-weight: 600; +} + +.v2-radar-box { + min-height: 160px; + max-height: 180px; + display: flex; + align-items: center; + justify-content: center; +} + +.v2-radar-box canvas { + max-height: 100%; +} + .score-card { display: flex; flex-direction: column; align-items: center; text-align: center; - min-height: 240px; + height: 100%; background: linear-gradient(180deg, rgba(239, 68, 68, 0.08), rgba(11, 23, 48, 1)); } +#score-card[hidden] { + display: none !important; +} + .score-title-row { display: flex; align-items: center; @@ -1413,6 +1796,13 @@ body.modal-open { gap: var(--grid-gap); } +.exec-summary-grid > .exec-col { + display: flex; + flex-direction: column; + height: 100%; + min-height: 0; +} + .exec-col-title { margin: 0 0 12px; font-size: 12px; @@ -1477,8 +1867,7 @@ body.modal-open { position: relative; display: flex; flex-direction: column; - min-height: 228px; - padding: 20px 18px 16px; + padding: 16px 16px 14px; overflow: hidden; border-top: 4px solid transparent; } @@ -1600,22 +1989,97 @@ body.modal-open { gap: var(--grid-gap); } +.breakdown-row:has(#legacy-breakdown-card[hidden]) { + grid-template-columns: 1fr; +} + .breakdown-card { - min-height: 340px; + display: flex; + flex-direction: column; + height: 100%; +} + +.breakdown-card .chart-box.trend, +.breakdown-card .breakdown-inner { + flex: 1; + min-height: 0; +} + +.v2-contributors-table-wrap { + 
flex: 1; + min-height: 0; + max-height: 280px; + overflow-y: auto; + margin-top: 4px; + border-radius: 8px; + border: 1px solid rgba(255, 255, 255, 0.06); +} + +.v2-contributors-table-wrap .data-table thead th { + position: sticky; + top: 0; + z-index: 1; + background: var(--card); +} + +.v2-contributors-card .data-table td:nth-child(1), +.v2-contributors-card .data-table td:nth-child(4) { + max-width: 220px; +} + +.v2-contributors-card .data-table td:nth-child(4) { + display: -webkit-box; + -webkit-box-orient: vertical; + -webkit-line-clamp: 2; + overflow: hidden; + font-size: 12px; + color: var(--muted); + line-height: 1.4; +} + +.v2-contributors-card .data-table td:nth-child(3) { + white-space: nowrap; + font-variant-numeric: tabular-nums; +} + +.v2-categories-list-wrap { + flex: 1; + min-height: 0; + max-height: 280px; + margin-top: 4px; + overflow-y: auto; + border-radius: 8px; + border: 1px solid rgba(255, 255, 255, 0.06); +} + +.v2-categories-card .category-list { + margin: 0; + padding: 8px 10px; + justify-content: flex-start; } .breakdown-inner { display: grid; grid-template-columns: 1fr 1fr; gap: var(--grid-gap); - align-items: stretch; - min-height: 280px; + min-height: 200px; +} + +.breakdown-inner > * { + height: 100%; + min-height: 0; } .metrics-panel { display: flex; flex-direction: column; - justify-content: center; + min-height: 0; +} + +.breakdown-inner .metrics-panel .category-list { + flex: 1; + min-height: 0; + overflow-y: auto; } .category-list { @@ -1669,12 +2133,12 @@ body.modal-open { .radar-box { position: relative; - min-height: 280px; + min-height: 200px; height: 100%; } .radar-box canvas { - min-height: 280px !important; + min-height: 200px !important; } .chart-box.trend { @@ -1682,13 +2146,18 @@ body.modal-open { display: flex; flex-direction: column; gap: 12px; + flex: 1; + min-height: 0; + overflow: hidden; } .trend-chart-wrap { position: relative; + flex-shrink: 0; width: 100%; - height: 220px; - min-height: 220px; + max-width: 
100%; + height: 160px; + min-height: 160px; border: 1px solid rgba(255, 255, 255, 0.08); border-radius: 12px; background: rgba(0, 0, 0, 0.22); @@ -1698,7 +2167,9 @@ body.modal-open { .trend-sparkline { display: block; width: 100%; + max-width: 100%; height: 100%; + vertical-align: top; } .trend-sparkline .trend-axis-label { @@ -1737,7 +2208,7 @@ body.modal-open { align-items: center; justify-content: center; height: 100%; - min-height: 260px; + min-height: 160px; text-align: center; padding: var(--card-pad); border: 1px dashed rgba(255, 255, 255, 0.08); @@ -1776,6 +2247,7 @@ body.modal-open { } .trend-note { + flex-shrink: 0; margin: 0 0 12px; padding: 10px 12px; font-size: 13px; @@ -1787,7 +2259,13 @@ body.modal-open { } .trend-table-wrap { + flex: 1; + min-height: 0; + max-height: 280px; margin-top: 0; + overflow-y: auto; + border-radius: 8px; + border: 1px solid rgba(255, 255, 255, 0.06); } .trend-table { @@ -1796,6 +2274,13 @@ body.modal-open { font-size: 13px; } +.trend-table thead th { + position: sticky; + top: 0; + z-index: 1; + background: var(--card); +} + .trend-table th, .trend-table td { padding: 8px 10px; @@ -1811,9 +2296,18 @@ body.modal-open { letter-spacing: 0.04em; } -.chart-box.trend:has(#trend-table-wrap:not([hidden])) { - height: auto; - min-height: 0; +.trend-table th.num, +.trend-table td.num { + text-align: right; + font-variant-numeric: tabular-nums; +} + +.trend-table .sev-badge { + text-transform: capitalize; +} + +.trend-card .chart-box.trend { + min-height: 200px; } /* Risk level guide — mini product cards */ @@ -1829,6 +2323,9 @@ body.modal-open { .guide-card { position: relative; + display: flex; + flex-direction: column; + height: 100%; padding: 20px 18px; border-radius: 14px; border: 1px solid var(--border); @@ -2354,10 +2851,22 @@ body.modal-open { } @media (max-width: 1280px) { - .metrics-primary-row { + .priority-grid, + .v2-risk-panel, + .v2-detail-grid { grid-template-columns: 1fr; } - .scores-legend-grid { + + 
.issues-summary-card, + .v2-risk-panel { + min-height: 0; + } + + .overview-panel { + min-height: 0; + } + .scores-legend-grid, + .scores-legend-grid:has(#legend-v2-block:not([hidden])) { grid-template-columns: 1fr; } .metrics-row { @@ -2395,10 +2904,22 @@ body.modal-open { .app { flex-direction: column; } - .metrics-primary-row { + .priority-grid, + .v2-risk-panel, + .v2-detail-grid { grid-template-columns: 1fr; } - .scores-legend-grid { + + .issues-summary-card, + .v2-risk-panel { + min-height: 0; + } + + .overview-panel { + min-height: 0; + } + .scores-legend-grid, + .scores-legend-grid:has(#legend-v2-block:not([hidden])) { grid-template-columns: 1fr; } .metrics-row { diff --git a/src/mcts/report/data.py b/src/mcts/report/data.py index ad127dd..75581e5 100644 --- a/src/mcts/report/data.py +++ b/src/mcts/report/data.py @@ -228,6 +228,29 @@ def _score_brief(score: int) -> str: return "Strong security posture maintained" +def risk_description_v2(risk_level: str, absolute_risk: int) -> str: + level = risk_level.lower() + if level == "critical": + return ( + f"Critical multi-factor risk (absolute risk {absolute_risk}). " + "Remediate tool-attributed findings on attack paths immediately." + ) + if level == "high": + return ( + f"High multi-factor risk (absolute risk {absolute_risk}). " + "Prioritize high-severity tool findings and chain-exposed tools." + ) + if level == "medium": + return ( + f"Moderate multi-factor risk (absolute risk {absolute_risk}). " + "Schedule hardening for elevated factor dimensions." + ) + return ( + f"Low multi-factor risk (absolute risk {absolute_risk}). " + "Maintain controls; re-scan after material changes." + ) + + def risk_description(score: int) -> str: if score <= 25: return "Your MCP server has critical security issues that require immediate attention." 
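As a quick check of the `risk_description_v2` hunk above, the sketch below restates the function outside the package and calls it with made-up inputs. The sample levels and risk values (`412`, `57`, `3`, `"informational"`) are assumptions for illustration, not output from a real scan. One behavior worth noting: any level other than critical, high, or medium falls through to the Low message.

```python
def risk_description_v2(risk_level: str, absolute_risk: int) -> str:
    """Standalone restatement of the helper added in the hunk above."""
    level = risk_level.lower()  # matching is case-insensitive
    if level == "critical":
        return (
            f"Critical multi-factor risk (absolute risk {absolute_risk}). "
            "Remediate tool-attributed findings on attack paths immediately."
        )
    if level == "high":
        return (
            f"High multi-factor risk (absolute risk {absolute_risk}). "
            "Prioritize high-severity tool findings and chain-exposed tools."
        )
    if level == "medium":
        return (
            f"Moderate multi-factor risk (absolute risk {absolute_risk}). "
            "Schedule hardening for elevated factor dimensions."
        )
    # Fall-through: "low" and any unrecognized level share this message.
    return (
        f"Low multi-factor risk (absolute risk {absolute_risk}). "
        "Maintain controls; re-scan after material changes."
    )


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to exercise the branches.
    for level, risk in [("CRITICAL", 412), ("medium", 57), ("informational", 3)]:
        print(risk_description_v2(level, risk))
```

If an unrecognized level should not read as Low, a caller could validate `risk_level` before calling; the diff itself does not do this.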
@@ -400,6 +423,133 @@ def parse_category_gates(raw_values: list[str] | None) -> dict[str, int]: return gates +CATEGORY_TAGS_V2: dict[str, frozenset[str]] = { + "injection": frozenset( + { + "prompt_injection", + "jailbreak", + "schema_surface", + "metadata_integrity", + "skill_md", + "sigma_metadata", + "surface_metadata", + } + ), + "exfiltration": frozenset({"data_leakage", "embedding_secrets"}), + "privilege": frozenset( + { + "permission_analyzer", + "command_execution", + "path_validation", + "tool_abuse", + } + ), + "supply_chain": frozenset( + { + "supply_chain", + "vulnerable_package", + "npm_audit", + "virustotal", + "semgrep_sast", + } + ), + "protocol": frozenset({"oauth_config", "runtime_events", "cloud_inspect"}), +} +CATEGORY_PRIORITY_V2 = ("injection", "exfiltration", "privilege", "supply_chain", "protocol") +CATEGORY_LABELS_V2: dict[str, str] = { + "injection": "Injection & Metadata", + "exfiltration": "Data Exfiltration", + "privilege": "Privilege & Execution", + "supply_chain": "Supply Chain", + "protocol": "Protocol & Runtime", +} +_CATEGORY_V2_PENALTY = { + Severity.CRITICAL: 35, + Severity.HIGH: 20, + Severity.MEDIUM: 10, + Severity.LOW: 5, +} + + +def assign_category_v2(analyzer: str) -> str | None: + """First-match category assignment for v2 OWASP tiles.""" + for cat in CATEGORY_PRIORITY_V2: + if analyzer in CATEGORY_TAGS_V2[cat]: + return cat + return None + + +def category_scores_v2_gate_keys() -> frozenset[str]: + return frozenset(CATEGORY_PRIORITY_V2) + + +def parse_min_category_score_v2(raw_values: list[str] | None) -> dict[str, int]: + """Parse `--min-category-score-v2 injection:80` style minimum health scores.""" + gates: dict[str, int] = {} + if not raw_values: + return gates + valid = category_scores_v2_gate_keys() + for raw in raw_values: + for part in raw.split(","): + part = part.strip() + if not part: + continue + if ":" not in part: + raise ValueError(f"Invalid --min-category-score-v2 value {part!r}. 
Use category:min_score.") + category, limit_text = part.split(":", 1) + category = category.strip() + if category not in valid: + valid_list = ", ".join(sorted(valid)) + raise ValueError(f"Unknown v2 category {category!r}. Valid categories: {valid_list}") + minimum = int(limit_text.strip()) + if not 0 <= minimum <= 100: + raise ValueError(f"v2 category minimum must be 0–100, got {minimum}") + gates[category] = minimum + return gates + + +def category_scores_v2_gate_failures(findings: list[Finding], gates: dict[str, int]) -> list[str]: + """Fail when OWASP v2 tile score falls below minimum (100 = good polarity).""" + if not gates: + return [] + by_key = {row["key"]: row for row in category_scores_v2(findings)} + failures: list[str] = [] + for category, minimum in gates.items(): + row = by_key.get(category) + if not row: + continue + if row["score"] < minimum: + failures.append( + f"{row['label']}: v2 category score {row['score']} below minimum {minimum} " + f"(100=good; {row['findings_count']} findings)" + ) + return failures + + +def category_scores_v2(findings: list[Finding]) -> list[dict[str, Any]]: + """OWASP category health scores — 100 = good (RFC §4.15 polarity).""" + from mcts.scoring.context import scorable_findings_v2 + + scorable = scorable_findings_v2(findings) + rows: list[dict[str, Any]] = [] + for key in CATEGORY_PRIORITY_V2: + matched = [f for f in scorable if assign_category_v2(f.analyzer) == key] + penalty = sum(_CATEGORY_V2_PENALTY.get(f.severity, 5) for f in matched) + score = max(0, 100 - min(100, penalty)) + passed = len(matched) == 0 + rows.append( + { + "key": key, + "label": CATEGORY_LABELS_V2[key], + "score": score, + "display": "100/100" if passed else f"{score}/100", + "findings_count": len(matched), + "passed": passed, + } + ) + return rows + + def category_gate_failures(findings: list[Finding], gates: dict[str, int]) -> list[str]: """Return human-readable failures when a category score meets/exceeds its gate.""" if not gates: @@ -763,72 
+913,158 @@ def build_recommendations(findings: list[Finding]) -> list[dict[str, Any]]: def build_attack_graph(report: ScanReport) -> dict[str, Any]: - if report.attack_graph.get("edges") or report.attack_graph.get("nodes"): - return report.attack_graph + from mcts.scoring.graph import canonical_attack_graph - nodes: dict[str, dict[str, str]] = {} - edges: list[dict[str, str]] = [] + return canonical_attack_graph(report) - for tool in report.server.tools: - nodes[tool.name] = {"id": tool.name, "label": tool.name, "type": "tool"} - for finding in report.findings: - if finding.analyzer != "attack_chains": - continue - evidence = finding.evidence - read_tools = evidence.get("read_tools", []) - exfil_tools = evidence.get("exfil_tools", []) - cred_tools = evidence.get("credential_tools", []) - exec_tools = evidence.get("exec_tools", []) - - for name in read_tools + exfil_tools + cred_tools + exec_tools: - nodes[name] = {"id": name, "label": name, "type": "tool"} - - for src in read_tools: - for dst in exfil_tools: - edges.append({"from": src, "to": dst, "label": "exfil"}) - for src in cred_tools: - for dst in exfil_tools: - edges.append({"from": src, "to": dst, "label": "credential → exfil"}) - for src in read_tools: - for dst in cred_tools: - edges.append({"from": src, "to": dst, "label": "read → cred"}) - for src in read_tools: - for dst in exec_tools: - edges.append({"from": src, "to": dst, "label": "read → exec"}) +def _trend_series_key(points: list[dict[str, Any]]) -> str: + """Pick Y-axis metric — never mix legacy score with v2 absolute_risk.""" + if not points: + return "score" + versions = {str(row.get("scoring_version", "legacy")) for row in points} + if versions == {"legacy"}: + return "score" + if versions.isdisjoint({"legacy"}) and all("absolute_risk" in row for row in points): + return "absolute_risk" + if versions.isdisjoint({"legacy"}) and all(row.get("security_score") is not None for row in points): + return "security_score" + return "score" - return { - 
"nodes": list(nodes.values()), - "edges": edges, - } + +def _trend_value(row: dict[str, Any], series_key: str) -> int: + if series_key == "absolute_risk": + return int(row.get("absolute_risk", 0)) + if series_key == "security_score": + return int(row.get("security_score", 0)) + return int(row.get("score", 0)) def score_trend(report: ScanReport) -> list[dict[str, Any]]: if report.scan_history: - return list(report.scan_history) - from mcts.output.history import trend_points_for_target + points = list(report.scan_history) + else: + from mcts.output.history import trend_points_for_target - points = trend_points_for_target(report.target) + points = trend_points_for_target(report.target) if points: + series_key = _trend_series_key(points) + for row in points: + row["trend_value"] = _trend_value(row, series_key) return points label = report.scanned_at.strftime("%b %d") - return [{"date": label, "score": report.score.overall}] + row: dict[str, Any] = { + "date": label, + "score": report.score.overall, + "scoring_version": report.scoring_version, + "trend_value": report.score.overall, + "findings_total": report.summary.total, + "critical": report.summary.critical, + "high": report.summary.high, + } + if report.score_v2 is not None: + row["absolute_risk"] = report.score_v2.absolute_risk + if report.score_v2.security_score is not None: + row["security_score"] = report.score_v2.security_score + row["risk_level"] = report.score_v2.risk_level + series_key = _trend_series_key([row]) + row["trend_value"] = _trend_value(row, series_key) + return [row] def trend_meta(report: ScanReport, points: list[dict[str, Any]]) -> dict[str, Any]: - scores = [int(row.get("score", 0)) for row in points] - unique_scores = sorted(set(scores)) + series_key = _trend_series_key(points) + values = [_trend_value(row, series_key) for row in points] + unique_values = sorted(set(values)) + latest = ( + values[-1] + if values + else ( + report.score_v2.absolute_risk + if series_key == "absolute_risk" and 
report.score_v2 is not None + else report.score.overall + ) + ) + labels = { + "score": "Security score (legacy, 0–100 pts, higher=better)", + "absolute_risk": "Absolute risk (v2, higher=worse)", + "security_score": "Security score (v2 benchmark, 0–100, higher=better)", + } return { "runs": len(points), - "unique_scores": len(unique_scores), - "latest_score": scores[-1] if scores else report.score.overall, - "score_unchanged": len(unique_scores) <= 1 and len(points) > 1, + "unique_scores": len(unique_values), + "latest_score": latest, + "score_unchanged": len(unique_values) <= 1 and len(points) > 1, + "series_key": series_key, + "series_label": labels.get(series_key, labels["score"]), + "mixed_metrics": len({str(row.get("scoring_version", "legacy")) for row in points}) > 1 + if points + else False, + } + + +def _score_v2_payload(report: ScanReport) -> dict[str, Any] | None: + if report.score_v2 is None: + return None + score = report.score_v2 + return { + "absolute_risk": score.absolute_risk, + "risk_range": list(score.risk_range), + "risk_range_confidence": score.risk_range_confidence, + "risk_level": score.risk_level, + "security_score": score.security_score, + "risk_percentile": score.risk_percentile, + "confidence_score": score.confidence_score, + "legacy_overall": score.legacy_overall, + "dimension_scores": score.dimension_scores, + "top_contributors": [c.model_dump() for c in score.top_contributors[:10]], + "weights_profile": score.weights_profile, + "chain_factor_mode": score.chain_factor_mode, + "benchmark_corpus_version": score.benchmark_corpus_version, + "basis": score.basis.model_dump(), } +def _build_score_help(report: ScanReport) -> dict[str, Any]: + items = [ + "Security points from 0–100 (not a percentage of tests passed)", + "Critical, High, Medium, and Low findings (severity-weighted)", + "Attack chain detections", + "Exponential decay: more severe findings lower the score", + ] + if report.score_v2 is not None: + items.extend( + [ + "Absolute 
risk: multi-factor sum on tool-attributed findings (higher = worse)", + "Security score: benchmark percentile when corpus stats are available", + "Chain multiplier applies to tool findings on validated attack paths only", + ] + ) + title = "Score derived from:" + if report.score_v2 is not None: + title = "Scores derived from:" + return {"title": title, "items": items} + + +def _primary_risk_header(report: ScanReport) -> tuple[str, str, str]: + if report.score_v2 is not None: + level = report.score_v2.risk_level.upper() + badge = f"{level} RISK" + brief = ( + f"Absolute risk {report.score_v2.absolute_risk} " + f"(range {report.score_v2.risk_range[0]}–{report.score_v2.risk_range[1]})" + ) + return badge, level.lower(), brief + return ( + risk_rating(report.score.overall)[0], + risk_rating(report.score.overall)[1], + _score_brief(report.score.overall), + ) + + def build_dashboard_payload(report: ScanReport) -> dict[str, Any]: scanned_at: datetime = report.scanned_at - badge, level = risk_rating(report.score.overall) + badge, level, score_brief = _primary_risk_header(report) executed = list(report.analyzers_executed) or sorted({f.analyzer for f in report.findings}) analyzer_results = build_analyzer_results(report.findings, executed, report=report) categories = category_scores(report.findings) @@ -907,24 +1143,25 @@ def build_dashboard_payload(report: ScanReport) -> dict[str, Any]: "grade": security_grade(report.score.overall), "breakdown": breakdown_payload, }, + **({"score_v2": _score_v2_payload(report)} if report.score_v2 is not None else {}), + **( + {"category_scores_v2": category_scores_v2(report.findings)} if report.score_v2 is not None else {} + ), + "scoring_version": report.scoring_version, "summary": report.summary.model_dump(), "risk": { "badge": badge, "level": level, - "description": risk_description(report.score.overall), - "brief": _score_brief(report.score.overall), + "description": ( + risk_description_v2(report.score_v2.risk_level, 
report.score_v2.absolute_risk) + if report.score_v2 is not None + else risk_description(report.score.overall) + ), + "brief": score_brief, }, "executive_summary": executive, "checks_summary": checks_summary, - "score_help": { - "title": "Score derived from:", - "items": [ - "Security points from 0–100 (not a percentage of tests passed)", - "Critical, High, Medium, and Low findings (severity-weighted)", - "Attack chain detections", - "Exponential decay: more severe findings lower the score", - ], - }, + "score_help": _build_score_help(report), "categories": categories, "trend": trend_points, "trend_meta": trend_meta(report, trend_points), diff --git a/src/mcts/report/generators/html_report.py b/src/mcts/report/generators/html_report.py index 6948d83..1b98755 100644 --- a/src/mcts/report/generators/html_report.py +++ b/src/mcts/report/generators/html_report.py @@ -48,6 +48,7 @@ def write_html_report(report: ScanReport, output: Path) -> None: logo_src=logo_data_uri(for_report=True), icons_json=json.dumps(_load_icons()), app_version=report.version, + hide_legacy_score_card=report.score_v2 is not None, ) output.parent.mkdir(parents=True, exist_ok=True) output.write_text(html, encoding="utf-8") diff --git a/src/mcts/report/scan_meta.py b/src/mcts/report/scan_meta.py index 3ffa340..79cac6c 100644 --- a/src/mcts/report/scan_meta.py +++ b/src/mcts/report/scan_meta.py @@ -86,6 +86,21 @@ def tool_discovery_context(report: ScanReport, *, live: bool, snapshot: bool) -> } +def append_chain_scan_notes(scan_notes: list[str], report: ScanReport, config: ScanConfig) -> None: + if config.scoring_mode == "legacy": + return + if "attack_chains" in report.analyzers_executed: + if not config.enable_attack_chains: + scan_notes.append( + "Chain multiplier disabled (chain_factor=1.0); graph and meta-findings still shown." + ) + return + scan_notes.append( + "Attack chains analyzer did not run (--analyzers filter or --surfaces without tool) " + "— chain_factor=1.0." 
+ ) + + def _rel_path(path: Path | None) -> str: if path is None: return "" diff --git a/src/mcts/report/templates/dashboard.html b/src/mcts/report/templates/dashboard.html index 6d21e42..82f9e8c 100644 --- a/src/mcts/report/templates/dashboard.html +++ b/src/mcts/report/templates/dashboard.html @@ -92,46 +92,55 @@

    Scan Information

    -
    -

    How to read this report

    -

    -
      -
      -
      +
      +
      +

      Scan complete

      +

      Your security snapshot

      +

      +
      +
      +
      -
      -

      Key results

      -

      + -
      -

      Scores vs counts — read this first

      -
      -
      - Security scores (0–100 points) -

      Used for Security Score, Area sub-scores, and the trend chart. Like a health rating: 100 = best, 0 = worst. These are not percentages and not “% of tests passed.”

      -
      -
      - Counts (plain numbers) -

      Issues found (e.g. 21), severity rows (5 critical), checks run (20), and tools (6) are totals — how many items MCTS counted, not points out of 100.

      +
      +
      +
      + View all issues → +

      Issues found

      +

      Security problems MCTS flagged — fix Critical and High first.

      +
      + 0 + total issues
      + + + + + + + + +
      Severity | Count | What it means
      Total | 0
      +
      -
      -
      +
      +
      View sub-scores →
      -

      Security Score

      +

      Security Score

      -

      Security points · 0 = worst · 100 = best · Not a percentage

      +

      0–100 points · higher is better · not a percentage

      -
      0 / 100 pts
      +
      0 / 100

      Grade

      @@ -140,109 +149,187 @@

      Security Score

      -
      - View all issues → -

      Issues found

      -

      Each row is a separate security finding MCTS flagged.

      -
      - 0 - total issues (count) +
      +
      +

      Overall risk level

      +

      Multi-factor score — higher number means more danger

      +
      +
      +

      +
      +
      Benchmark score: Compared to other MCP servers (0–100, higher is better)
      +
      +
      Confidence: How sure MCTS is about this risk estimate
      +
      +
      Risk percentile: Where you rank vs the benchmark corpus
      +
      +
      +
      +
      +

      What drives the risk?

      +

      Each spoke shows which risk factor weighs most on this scan. 100 = dominant factor.

      +
      + +
      - - - - - - - - -
      Severity | Count | Meaning
      Total | 0
      -
      - - - - -