
Cursor CVE scan support #3

Open
pabamato wants to merge 2 commits into dsawardekar:trunk from pabamato:feature/cursor-cve-scan

Conversation

@pabamato

Summary

Adds JSON advisory files with semver-style vulnerable ranges (--advisories) alongside the existing IOC malware list (--malwares). Either or both may be passed. Each output record includes finding_type (malware | advisory). At runtime only the Python 3.10+ standard library is needed; scanning requires no extra pip packages.

You can ask an AI coding agent (e.g. in Cursor) to draft advisory JSON from a GHSA or CVE/NVD URL using the short prompt below (replace the URL). The PR also includes examples/ templates, docs/CURSOR_AND_CVE_SCANNING.md, README updates, tests, and optional .cursor/rules (upstream can drop that folder if it is out of scope).


Motivation

IOC text files match exact bad versions; they do not express CVE-style ranges (e.g. “below 1.13.5”). JSON advisories close that gap for offline, filesystem-based checks. Examples and docs make it easier to add rules and to generate JSON from public GHSA/CVE pages.


Short prompt for an AI agent (CVE / GHSA → advisory JSON)

Run this in the crawler repo (so examples/ is visible). Replace PASTE_ADVISORY_URL_HERE once.

Create examples/advisories_<short>_<CVE-or-GHSA>.json for crawler.py --advisories. Match the JSON shape of examples/advisories_axios.example.json (top-level "packages", exact npm keys). vulnerable_ranges: each array element is one OR-branch; inside a string use only comma-separated >= / <= / > / < clauses (AND), never ^ or ~. Use only this page for version facts: PASTE_ADVISORY_URL_HERE. Reply with raw JSON only.
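Because the prompt constrains the output format, a quick local check can catch drafts that slip in npm ^ or ~ syntax before the file reaches --advisories. The sketch below is a hypothetical pre-flight validator, not part of crawler.py. It assumes, based on the Changes table, that "packages" maps each npm name to a list of rule objects carrying "id" and "vulnerable_ranges"; the real example files may carry more fields.

```python
import json
import re

# One AND-clause: an operator followed by a dotted numeric version, e.g. ">=1.13.0".
# Assumption: versions look like MAJOR[.MINOR[.PATCH]] with an optional -prerelease tag.
CLAUSE_RE = re.compile(r"^(>=|<=|>|<)\s*\d+(\.\d+){0,2}(-[0-9A-Za-z.]+)?$")


def validate_advisories(text: str) -> list[str]:
    """Return human-readable problems in an advisory JSON document (empty list if none)."""
    try:
        doc = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]

    packages = doc.get("packages") if isinstance(doc, dict) else None
    if not isinstance(packages, dict):
        return ['missing top-level "packages" object']

    problems: list[str] = []
    for name, rules in packages.items():
        if not isinstance(rules, list):
            problems.append(f"{name}: expected a list of rules")
            continue
        for rule in rules:
            if not isinstance(rule, dict):
                problems.append(f"{name}: each rule must be an object")
                continue
            rule_id = rule.get("id", "<no id>")
            ranges = rule.get("vulnerable_ranges")
            if not isinstance(ranges, list) or not ranges:
                problems.append(f"{name}/{rule_id}: vulnerable_ranges must be a non-empty list")
                continue
            for branch in ranges:  # each array element is one OR-branch
                for clause in str(branch).split(","):  # clauses in one string are ANDed
                    clause = clause.strip()
                    if "^" in clause or "~" in clause:
                        problems.append(f"{name}/{rule_id}: npm ^/~ syntax is not supported: {clause!r}")
                    elif not CLAUSE_RE.match(clause):
                        problems.append(f"{name}/{rule_id}: unrecognized clause {clause!r}")
    return problems
```

An empty list means the draft passes these format checks only; it says nothing about whether the version facts match the advisory page.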

Changes (short)

| Area | Detail |
|---|---|
| --advisories | JSON packages → list of { id, vulnerable_ranges, ... }. Ranges use >= / < / etc.; clauses AND within a string, OR across array entries. |
| --malwares | Optional if --advisories is set. |
| Output | finding_type, stderr labels for malware vs advisory, “Active rules” line. |
| Catalog | By default, ignore catalog lines whose path is under node_modules/; --include-node-modules scans them too. |
| Tests | test_crawler.py covers advisories, ranges, and catalog path filtering. |
| requirements.txt | No runtime deps; optional pytest note. |
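The range semantics in the table (AND within a string, OR across array entries) can be illustrated with a small stdlib-only matcher. This is a sketch under stated assumptions, not the matcher in crawler.py: versions are treated as numeric MAJOR.MINOR.PATCH tuples, and prerelease or build suffixes are dropped, which simplifies real semver ordering. The ranges in the usage lines are illustrative and not taken from any advisory.

```python
import operator
import re

# Each supported operator maps to a comparison on version tuples.
OPS = {">=": operator.ge, "<=": operator.le, ">": operator.gt, "<": operator.lt}
CLAUSE_RE = re.compile(r"^\s*(>=|<=|>|<)\s*(\S+)\s*$")


def parse_version(text: str) -> tuple[int, ...]:
    """Parse 'MAJOR.MINOR.PATCH' into an int tuple; missing parts default to 0.

    Simplification: a '-prerelease' or '+build' suffix is dropped, so
    '1.2.3-beta' compares equal to '1.2.3' (real semver orders it lower).
    """
    core = re.split(r"[-+]", text.strip().lstrip("v"), maxsplit=1)[0]
    parts = [int(p) for p in core.split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts[:3])


def matches_branch(version: str, branch: str) -> bool:
    """True if version satisfies every comma-separated clause in branch (AND)."""
    v = parse_version(version)
    for clause in branch.split(","):
        m = CLAUSE_RE.match(clause)
        if not m:
            raise ValueError(f"unsupported clause: {clause!r}")
        op, bound = m.groups()
        if not OPS[op](v, parse_version(bound)):
            return False
    return True


def is_vulnerable(version: str, vulnerable_ranges: list[str]) -> bool:
    """True if version falls inside any range string (OR across array entries)."""
    return any(matches_branch(version, branch) for branch in vulnerable_ranges)


if __name__ == "__main__":
    ranges = [">=1.0.0, <1.13.5", ">=0.28.0, <0.30.2"]
    print(is_vulnerable("1.13.4", ranges))  # True
    print(is_vulnerable("1.13.5", ranges))  # False
    print(is_vulnerable("0.29.0", ranges))  # True
```

Comparing integer tuples rather than strings avoids the trap where "1.10.0" sorts before "1.9.0".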

Why it’s useful

  • One scanner: IOC pins + semver windows, same catalog workflow.
  • No pip friction: works on PEP 668 / locked-down Python.
  • Maintainable rules: version-controlled JSON, easy to extend from advisories.

Commands

python3 crawler.py scan --catalog packages.txt --malwares malwares.txt -o out.json
python3 crawler.py scan --catalog packages.txt --advisories examples/advisories_axios.example.json -o out.json
python3 crawler.py scan --catalog packages.txt --malwares malwares.txt --advisories rules.json -o out.json
python3 crawler.py scan --catalog packages.txt --advisories rules.json --include-node-modules

Caveats

  • Resolves node_modules by walking up from each catalog path; does not exhaustively recurse every nested node_modules tree.
  • Ranges are explicit >= / < style, not npm ^ / ~.
  • Complements, does not replace, npm audit.

Breaking / behavior notes

  • --malwares not required when --advisories is provided.
  • Default skips catalog paths under node_modules/ unless --include-node-modules.
  • JSON findings include finding_type (and advisory-specific fields).
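The default catalog filter can be sketched as a path-component check. filter_catalog below is hypothetical, not the function in crawler.py, and it assumes each catalog line is a single path.

```python
from pathlib import PurePath


def filter_catalog(paths: list[str], include_node_modules: bool = False) -> list[str]:
    """Drop catalog entries that live inside any node_modules directory.

    Passing include_node_modules=True keeps every entry, mirroring --include-node-modules.
    """
    if include_node_modules:
        return list(paths)
    return [p for p in paths if "node_modules" not in PurePath(p.strip()).parts]
```

Matching whole path components rather than a substring keeps a directory such as node_modules_backup/ from being skipped by accident.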

Maintainer checklist

  • Keep or remove .cursor/rules/ for upstream.
  • Run pytest test_crawler.py in CI.

Suggested title

feat(scan): add --advisories JSON semver ranges and optional catalog path filtering

Copilot AI review requested due to automatic review settings April 13, 2026 17:35

Copilot AI left a comment

Pull request overview

Adds offline CVE-style advisory scanning support to the existing npm malware IOC scanner, including semver-style vulnerable ranges, improved scan output labeling, and catalog noise reduction.

Changes:

  • Introduces --advisories JSON rule files with semver-style vulnerable ranges and emits finding_type (malware | advisory) in JSONL output.
  • Makes --malwares optional when advisories are provided; reports active rule sources and improves scan summary output.
  • Skips catalog entries under node_modules/**/package.json by default (opt-in via --include-node-modules), with new regression tests and example rule templates/docs.

Reviewed changes

Copilot reviewed 9 out of 10 changed files in this pull request and generated 4 comments.

| File | Description |
|---|---|
| crawler.py | Implements advisory rule loading, stdlib semver-range matching, new finding type output, and optional node_modules catalog-path filtering. |
| test_crawler.py | Adds tests for advisory JSON loading, range matching behavior, and node_modules path filtering behavior. |
| README.md | Updates usage/docs to describe --advisories, finding_type, and --include-node-modules. |
| docs/CURSOR_AND_CVE_SCANNING.md | Adds workflow documentation for authoring/merging advisory JSON and running scans at scale. |
| examples/advisories_axios.example.json | Provides an advisory JSON template/example for axios. |
| examples/advisories_react_rsc_GHSA-479c-33wc-g2pg.example.json | Provides an advisory JSON example for React RSC-related packages. |
| examples/advisories_nextjs_GHSA-q4gf-8mx6-v5v3.example.json | Provides an advisory JSON example for Next.js. |
| examples/malwares_axios_IOC.example.txt | Adds an IOC-format example file and clarifies IOC vs range limitations. |
| .gitignore | Ignores local virtualenv and macOS metadata files. |
| .cursor/rules/repo-reference.mdc | Adds Cursor agent guidance for repo layout and the new advisory workflow (optional tooling support). |

| `examples/*.example.json` | Advisory JSON you can copy or hand to an AI as a template |
| `examples/malwares_axios_IOC.example.txt` | IOC list format (exact `@version` pins) |
| `test_crawler.py` | Regression tests |
| `requirements.txt` | No runtime deps; optional pytest note |

Copilot AI Apr 13, 2026

requirements.txt is listed here as a related file, but the repo doesn’t include it. Add it (for optional dev/test deps) or remove/update this row to keep the repo inventory accurate.

Suggested change
| `requirements.txt` | No runtime deps; optional pytest note |

Comment thread crawler.py
Comment on lines 405 to +411
for pkg_name, malicious_versions in self.malware_packages.items():
    if found_version := self._check_package(node_modules_path, pkg_name, malicious_versions):
        yield PackageMatch(project_package_json, node_modules_path, pkg_name, found_version)

for pkg_name, rules in self.advisory_by_package.items():
    found_version = self._get_installed_version(node_modules_path, pkg_name)
    if not found_version:

Copilot AI Apr 13, 2026

If the same package name appears in both malware_packages and advisory_by_package, the scanner will read and parse node_modules/<pkg>/package.json twice (once in _check_package, again in _get_installed_version). Consider refactoring to read the installed version once per package (or cache it) and reuse it for both IOC and advisory checks to avoid unnecessary IO on large scans.

- **Exact npm package name(s)** as published on npm (e.g. `next`, `react-server-dom-webpack`, `axios`).
- **Affected version ranges** translated into this tool’s syntax: one or more entries in `vulnerable_ranges`, each entry a string of **AND** clauses, **OR** across array elements.

**Ready-to-use prompt:** copy the short block in **`docs/PULL_REQUEST.md`** (*Short prompt for an AI agent*). Replace `PASTE_ADVISORY_URL_HERE` with your GHSA or NVD/CVE link.

Copilot AI Apr 13, 2026

The doc references docs/PULL_REQUEST.md for the ready-to-use prompt, but that file doesn’t exist in this repo. Please either add the referenced document or update this line to point to the correct location (e.g., README or this document) so readers can actually find the prompt.

Suggested change
**Ready-to-use prompt:** copy the short block in **`docs/PULL_REQUEST.md`** (*Short prompt for an AI agent*). Replace `PASTE_ADVISORY_URL_HERE` with your GHSA or NVD/CVE link.
**Ready-to-use prompt:** use the prompt guidance in **this document** under **Using Cursor (or another AI) to create advisory JSON**. Replace `PASTE_ADVISORY_URL_HERE` with your GHSA or NVD/CVE link.

Comment thread README.md
## Installation

No additional dependencies required beyond Python 3.10+ standard library.
No additional runtime dependencies beyond Python 3.10+ standard library. Optional: `pytest` for tests (see `requirements.txt`).

Copilot AI Apr 13, 2026

This line points users to requirements.txt, but there is no requirements.txt in the repository. Either add the file (even if it only documents optional pytest) or remove/adjust the reference to avoid a broken pointer in the installation instructions.

Suggested change
No additional runtime dependencies beyond Python 3.10+ standard library. Optional: `pytest` for tests (see `requirements.txt`).
No additional runtime dependencies beyond Python 3.10+ standard library. Optional: install `pytest` if you want to run tests.
