From 358cbb6e15a74221ca0f6ec9af825c4041e826a6 Mon Sep 17 00:00:00 2001 From: Ram Dwivedi Date: Sun, 14 Jun 2026 08:49:30 -0400 Subject: [PATCH 1/2] docs: correct stale analyzer status and dangling references The MCP, semantic, and taint-tracking analyzers are implemented, but DEVELOPMENT.md still described them as stubs. Update the package-layout and "Stub analyzers" sections to reflect actual status (only mcp_rug_pull remains a stub), fix an invalid `//` comment in the Python example, and replace dangling internal "SADD" references in graph.py with neutral roadmap notes. Also refresh two stale "stub" docstrings. Docs and comments only; no behavior change. Co-Authored-By: Claude Opus 4.8 (1M context) Signed-off-by: Ram Dwivedi --- docs/DEVELOPMENT.md | 13 +++++++------ src/skillspector/graph.py | 7 ++++--- src/skillspector/nodes/analyzers/__init__.py | 2 +- 3 files changed, 12 insertions(+), 10 deletions(-) diff --git a/docs/DEVELOPMENT.md b/docs/DEVELOPMENT.md index 0795f09..d80db17 100644 --- a/docs/DEVELOPMENT.md +++ b/docs/DEVELOPMENT.md @@ -151,16 +151,17 @@ There are no conditional edges: after `resolve_input` → `build_context`, all a | `report.py` | Report node | | **nodes/analyzers/** | | | `__init__.py` | Registry: `ANALYZER_NODE_IDS`, `ANALYZER_NODES` | -| `common.py` | Helpers (e.g. `make_dummy_finding`) for stub analyzers | +| `common.py` | Shared analyzer helpers (line/context extraction, AST name resolution) | | `static_runner.py` | Runs static patterns; converts `AnalyzerFinding` → `Finding` | | `pattern_defaults.py` | Shared pattern metadata (category, explanation, remediation) | | `static_yara.py` | YARA-based static analyzer | | `osv_client.py` | OSV.dev API client for live vulnerability lookups (SC4); batch queries with caching and fallback | | `static_patterns_*.py` | 11 pattern-based analyzers (prompt_injection, data_exfiltration, etc.) | | `behavioral_ast.py` | AST-based behavioral analyzer (AST1–AST8): detects exec, eval, subprocess, os.system, compile, dynamic import/getattr, and dangerous execution chains | -| `behavioral_taint_tracking.py` | Taint-tracking behavioral analyzer (stub) | -| `mcp_least_privilege.py`, `mcp_tool_poisoning.py`, `mcp_rug_pull.py` | MCP analyzer stubs | -| `semantic_security_discovery.py`, `semantic_developer_intent.py`, `semantic_quality_policy.py` | Semantic (LLM) analyzer stubs | +| `behavioral_taint_tracking.py` | Taint-tracking behavioral analyzer (TT1–TT5): source→sink data-flow analysis over Python AST | +| `mcp_least_privilege.py`, `mcp_tool_poisoning.py` | MCP analyzers (LP1–LP4 least-privilege; TP1–TP4 tool poisoning) | +| `mcp_rug_pull.py` | MCP rug-pull analyzer (stub; RP1–RP3 planned) | +| `semantic_security_discovery.py`, `semantic_developer_intent.py`, `semantic_quality_policy.py` | Semantic (LLM) analyzers; emit findings only when `use_llm` is enabled | --- @@ -195,7 +196,7 @@ The CLI passes `input_path` to the graph. The **resolve_input** node (using [inp from skillspector import graph result = graph.invoke({ - "input_path": "/path/to/skill", // or "skill_path" for local dir only + "input_path": "/path/to/skill", # or use "skill_path" for a local dir "output_format": "json", # optional: terminal, json, markdown, sarif (default sarif) "use_llm": True, # optional: False to skip LLM in meta_analyzer }) @@ -248,7 +249,7 @@ Use [pattern_defaults](../src/skillspector/nodes/analyzers/pattern_defaults.py) ### Stub analyzers -Return `{"findings": []}`. The behavioral, MCP, and semantic analyzer nodes are currently stubs (placeholders for future implementation). +Return `{"findings": []}`. Most analyzer nodes are implemented; `mcp_rug_pull` remains a stub (returns no findings) pending rug-pull detection (RP1–RP3). Use this pattern for any new placeholder analyzer. The LLM-backed semantic analyzers also return `{"findings": []}` when `use_llm` is False. --- diff --git a/src/skillspector/graph.py b/src/skillspector/graph.py index 277f963..e034ffe 100644 --- a/src/skillspector/graph.py +++ b/src/skillspector/graph.py @@ -13,10 +13,11 @@ # See the License for the specific language governing permissions and # limitations under the License. -"""LangGraph workflow for Skillspector stub analyzers.""" +"""LangGraph workflow wiring Skillspector's analyzer nodes.""" -# TODO(SADD A.2.1–A.2.4): Analyzer discovery, stage-as-category with meta last, wire registry; respect requires_api_key/is_available() and skip or warn when API key missing or analyzer unavailable. See SADD for skillspector § A.2. -# TODO(SADD A.5.1): Implement skillspector serve (FastAPI): POST /scan (zip), GET /results/{id}, GET /health. See SADD for skillspector § A.5.1. +# Roadmap: analyzer auto-discovery and stage-as-category ordering (meta_analyzer last); respect +# requires_api_key / is_available() so analyzers are skipped or warned when unavailable. +# Roadmap: optional `skillspector serve` HTTP API (POST /scan, GET /results/{id}, GET /health). from __future__ import annotations diff --git a/src/skillspector/nodes/analyzers/__init__.py b/src/skillspector/nodes/analyzers/__init__.py index 58b3e93..71fdac3 100644 --- a/src/skillspector/nodes/analyzers/__init__.py +++ b/src/skillspector/nodes/analyzers/__init__.py @@ -13,7 +13,7 @@ # See the License for the specific language governing permissions and # limitations under the License. -"""Analyzer node registry for Skillspector v2 stub workflow.""" +"""Analyzer node registry for the Skillspector workflow.""" from __future__ import annotations From 4eee3c02c177e682925c885f68886693832468f8 Mon Sep 17 00:00:00 2001 From: Ram Dwivedi Date: Sun, 14 Jun 2026 08:51:59 -0400 Subject: [PATCH 2/2] docs: document the integration contract and trust model Add an "Integrating SkillSpector" section (exit codes, JSON shape, severity/recommendation enums, and a recommended install-gate mapping) and a "Trust model and data egress" section (no skill execution; LLM sends file contents unless --no-llm; SC4 sends dependency names to OSV.dev by design). Cross-link from DEVELOPMENT.md. Documentation only; no behavior change. Co-Authored-By: Claude Opus 4.8 (1M context) Signed-off-by: Ram Dwivedi --- README.md | 64 +++++++++++++++++++++++++++++++++++++++++++++ docs/DEVELOPMENT.md | 2 +- 2 files changed, 65 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index cca0724..b278ea0 100644 --- a/README.md +++ b/README.md @@ -418,6 +418,61 @@ Options: --help Show this message and exit ``` +## Integrating SkillSpector + +SkillSpector is built to be driven by other tools (CI pipelines, install gates, editor integrations). Its exit code and JSON output are a stable contract. + +### Exit codes + +`skillspector scan` exits with: + +| Code | Meaning | +|------|---------| +| `0` | Scan completed, `risk_score` ≤ 50 (recommendation `SAFE` or `CAUTION`) | +| `1` | Scan completed, `risk_score` > 50 (recommendation `DO_NOT_INSTALL`) | +| `2` | Error (bad input, unreadable source, internal failure) | + +> The exit code collapses `SAFE` and `CAUTION` into `0`. To act differently on them (e.g. *warn* on `CAUTION` but *block* on `DO_NOT_INSTALL`), read the `recommendation` field from the JSON output rather than relying on the exit code. + +### Machine-readable output + +`--format json` produces a JSON report; with no `--output`/`-o` it is written to stdout: + +```bash +skillspector scan ./my-skill/ --format json +``` + +The top-level shape is (this example shows a full LLM-backed scan; with `--no-llm`, `metadata.llm_requested` is `false`): + +```json +{ + "skill": { "name": "...", "source": "...", "scanned_at": "" }, + "risk_assessment": { "score": 0, "severity": "LOW", "recommendation": "SAFE" }, + "components": [ { "path": "...", "type": "...", "lines": 0, "executable": false, "size_bytes": 0 } ], + "issues": [ { "id": "...", "category": "...", "severity": "...", "confidence": 0.0, "location": { "file": "...", "start_line": 0 } } ], + "metadata": { "has_executable_scripts": false, "skillspector_version": "...", "llm_requested": true, "llm_available": true } +} +``` + +- `risk_assessment.severity` ∈ `LOW | MEDIUM | HIGH | CRITICAL`. +- `risk_assessment.recommendation` ∈ `SAFE | CAUTION | DO_NOT_INSTALL`, mapped from severity: `LOW → SAFE`, `MEDIUM → CAUTION`, `HIGH`/`CRITICAL → DO_NOT_INSTALL`. +- `metadata.llm_error` appears only when LLM analysis was requested but unavailable. +- The full per-issue shape is defined by `Finding.to_dict()` in [models.py](src/skillspector/models.py); rely on the fields above and treat any additional fields as best-effort. + +For CI/IDE tooling, `--format sarif` emits SARIF 2.1.0. + +### Recommended gate mapping + +When using SkillSpector as an install gate, map the recommendation to an action: + +| `recommendation` | Suggested action | +|------------------|------------------| +| `SAFE` | allow | +| `CAUTION` | prompt / warn the user | +| `DO_NOT_INSTALL` | block | + +SkillSpector computes the score band and recommendation; how strict the gate is (e.g. whether `CAUTION` blocks in CI) is a policy decision for the integrating tool. + ## Development ### Setup @@ -476,6 +531,15 @@ SC4 uses the [OSV.dev](https://osv.dev) API to check dependencies against the fu The tool requires outbound HTTPS access to `api.osv.dev` for live vulnerability data. When that is not available, findings are limited to the static fallback list. +## Trust model and data egress + +SkillSpector is defense-in-depth, not a sandbox. Know what it does and does not do before relying on it: + +- **It never executes the scanned skill.** All analysis is static (regex, Python AST, YARA) plus optional LLM evaluation of file *contents* — the skill's code is never run. +- **LLM analysis sends file contents to the configured provider.** When LLM analysis is enabled (the default), file contents are sent to the active `SKILLSPECTOR_PROVIDER` endpoint. Use `--no-llm` to keep contents local (static analysis only). +- **SC4 sends dependency names to OSV.dev.** The supply-chain check queries [OSV.dev](https://osv.dev) with the package names and versions the skill declares, to look up known CVEs. This is fundamental to the check and runs even with `--no-llm`. It sends dependency coordinates (not file contents), requires no API key, and falls back to a bundled list when OSV.dev is unreachable. +- **It does not sandbox the host.** SkillSpector flags risky patterns *before* you install a skill; it does not contain or isolate a skill you choose to install anyway. + ## Limitations - **Non-English content**: May miss patterns in other languages diff --git a/docs/DEVELOPMENT.md b/docs/DEVELOPMENT.md index d80db17..f48da46 100644 --- a/docs/DEVELOPMENT.md +++ b/docs/DEVELOPMENT.md @@ -188,7 +188,7 @@ skillspector scan ./skill.zip --no-llm # static analysis only skillspector --version ``` -The CLI passes `input_path` to the graph. The **resolve_input** node (using [input_handler.py](../src/skillspector/input_handler.py)) resolves Git URL, file URL, .zip, single .md file, or directory to a local directory and sets `skill_path` (and `temp_dir_for_cleanup` when a temp dir was created). The CLI cleans up `temp_dir_for_cleanup` after invoke. Exit code 1 if risk_score > 50; exit code 2 on error. +The CLI passes `input_path` to the graph. The **resolve_input** node (using [input_handler.py](../src/skillspector/input_handler.py)) resolves Git URL, file URL, .zip, single .md file, or directory to a local directory and sets `skill_path` (and `temp_dir_for_cleanup` when a temp dir was created). The CLI cleans up `temp_dir_for_cleanup` after invoke. Exit code 1 if risk_score > 50; exit code 2 on error. See [Integrating SkillSpector](../README.md#integrating-skillspector) for the full exit-code and JSON contract. ### Programmatic