Cursor CVE scan support #3
Pull request overview
Adds offline CVE-style advisory scanning support to the existing npm malware IOC scanner, including semver-style vulnerable ranges, improved scan output labeling, and catalog noise reduction.
Changes:
- Introduces `--advisories` JSON rule files with semver-style vulnerable ranges and emits `finding_type` (`malware`|`advisory`) in JSONL output.
- Makes `--malwares` optional when advisories are provided; reports active rule sources and improves scan summary output.
- Skips catalog entries under `node_modules/**/package.json` by default (opt-in via `--include-node-modules`), with new regression tests and example rule templates/docs.
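As a rough illustration of the catalog-path filtering described in the last bullet, the sketch below skips any path that has a `node_modules` directory component unless the caller opts in. The function names `is_under_node_modules` and `filter_catalog` are hypothetical and are not taken from `crawler.py`; the real implementation may differ.

```python
from pathlib import PurePosixPath


def is_under_node_modules(catalog_path):
    """Return True if any directory component of the path is 'node_modules'."""
    # Normalize Windows separators so both path styles are handled the same way.
    parts = PurePosixPath(catalog_path.replace("\\", "/")).parts
    # Ignore the final component (the file name itself).
    return "node_modules" in parts[:-1]


def filter_catalog(paths, include_node_modules=False):
    """Yield catalog paths, skipping node_modules entries unless opted in."""
    for path in paths:
        if include_node_modules or not is_under_node_modules(path):
            yield path
```

With this shape, a project-level `app/package.json` is kept while `app/node_modules/axios/package.json` is dropped by default, which matches the noise-reduction goal stated above.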
Reviewed changes
Copilot reviewed 9 out of 10 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| crawler.py | Implements advisory rule loading, stdlib semver-range matching, new finding type output, and optional node_modules catalog-path filtering. |
| test_crawler.py | Adds tests for advisory JSON loading, range matching behavior, and node_modules path filtering behavior. |
| README.md | Updates usage/docs to describe --advisories, finding_type, and --include-node-modules. |
| docs/CURSOR_AND_CVE_SCANNING.md | Adds workflow documentation for authoring/merging advisory JSON and running scans at scale. |
| examples/advisories_axios.example.json | Provides an advisory JSON template/example for axios. |
| examples/advisories_react_rsc_GHSA-479c-33wc-g2pg.example.json | Provides an advisory JSON example for React RSC-related packages. |
| examples/advisories_nextjs_GHSA-q4gf-8mx6-v5v3.example.json | Provides an advisory JSON example for Next.js. |
| examples/malwares_axios_IOC.example.txt | Adds an IOC-format example file and clarifies IOC vs range limitations. |
| .gitignore | Ignores local virtualenv and macOS metadata files. |
| .cursor/rules/repo-reference.mdc | Adds Cursor agent guidance for repo layout and the new advisory workflow (optional tooling support). |
| `examples/*.example.json` | Advisory JSON you can copy or hand to an AI as a template |
| `examples/malwares_axios_IOC.example.txt` | IOC list format (exact `@version` pins) |
| `test_crawler.py` | Regression tests |
| `requirements.txt` | No runtime deps; optional pytest note |
requirements.txt is listed here as a related file, but the repo doesn’t include it. Add it (for optional dev/test deps) or remove/update this row to keep the repo inventory accurate.
Suggested change (remove this row):
| `requirements.txt` | No runtime deps; optional pytest note |
for pkg_name, malicious_versions in self.malware_packages.items():
    if found_version := self._check_package(node_modules_path, pkg_name, malicious_versions):
        yield PackageMatch(project_package_json, node_modules_path, pkg_name, found_version)

for pkg_name, rules in self.advisory_by_package.items():
    found_version = self._get_installed_version(node_modules_path, pkg_name)
    if not found_version:
If the same package name appears in both malware_packages and advisory_by_package, the scanner will read and parse node_modules/<pkg>/package.json twice (once in _check_package, again in _get_installed_version). Consider refactoring to read the installed version once per package (or cache it) and reuse it for both IOC and advisory checks to avoid unnecessary IO on large scans.
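One way to act on this comment is to memoize the version lookup so each `node_modules/<pkg>/package.json` is read at most once per scan. The sketch below is an assumption about how that could look, not the code in `crawler.py`: `read_installed_version` is a hypothetical helper, and both the IOC check and the advisory check would call it instead of parsing the manifest themselves.

```python
import json
from functools import lru_cache
from pathlib import Path
from typing import Optional


@lru_cache(maxsize=None)
def read_installed_version(node_modules_path: str, pkg_name: str) -> Optional[str]:
    """Read node_modules/<pkg>/package.json once and return its version, if any."""
    manifest = Path(node_modules_path) / pkg_name / "package.json"
    try:
        data = json.loads(manifest.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        # Missing or unreadable manifest: treat the package as not installed.
        return None
    version = data.get("version") if isinstance(data, dict) else None
    return version if isinstance(version, str) else None
```

Because the cache key is the `(node_modules_path, pkg_name)` pair, a package listed in both rule sources costs one file read. An unbounded cache grows with the number of distinct packages seen, so a very large scan might prefer a bounded `maxsize` or a per-project dictionary that is discarded after each project.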
- **Exact npm package name(s)** as published on npm (e.g. `next`, `react-server-dom-webpack`, `axios`).
- **Affected version ranges** translated into this tool’s syntax: one or more entries in `vulnerable_ranges`, each entry a string of **AND** clauses, **OR** across array elements.

**Ready-to-use prompt:** copy the short block in **`docs/PULL_REQUEST.md`** (*Short prompt for an AI agent*). Replace `PASTE_ADVISORY_URL_HERE` with your GHSA or NVD/CVE link.
The doc references docs/PULL_REQUEST.md for the ready-to-use prompt, but that file doesn’t exist in this repo. Please either add the referenced document or update this line to point to the correct location (e.g., README or this document) so readers can actually find the prompt.
Suggested change:
- **Ready-to-use prompt:** copy the short block in **`docs/PULL_REQUEST.md`** (*Short prompt for an AI agent*). Replace `PASTE_ADVISORY_URL_HERE` with your GHSA or NVD/CVE link.
+ **Ready-to-use prompt:** use the prompt guidance in **this document** under **Using Cursor (or another AI) to create advisory JSON**. Replace `PASTE_ADVISORY_URL_HERE` with your GHSA or NVD/CVE link.
## Installation

- No additional dependencies required beyond Python 3.10+ standard library.
+ No additional runtime dependencies beyond Python 3.10+ standard library. Optional: `pytest` for tests (see `requirements.txt`).
This line points users to requirements.txt, but there is no requirements.txt in the repository. Either add the file (even if it only documents optional pytest) or remove/adjust the reference to avoid a broken pointer in the installation instructions.
Suggested change:
- No additional runtime dependencies beyond Python 3.10+ standard library. Optional: `pytest` for tests (see `requirements.txt`).
+ No additional runtime dependencies beyond Python 3.10+ standard library. Optional: install `pytest` if you want to run tests.
Summary
Adds JSON advisory files with semver-style vulnerable ranges (`--advisories`), alongside the existing IOC malware list (`--malwares`). Either or both may be passed. Output records `finding_type` (`malware`|`advisory`). Python 3.10+ stdlib only at runtime; no extra `pip` packages for scanning.

You can ask an AI coding agent (e.g. in Cursor) to draft advisory JSON from a GHSA or CVE/NVD URL using the short prompt below (replace the URL). Also includes `examples/` templates, `docs/CURSOR_AND_CVE_SCANNING.md`, README updates, tests, and optional `.cursor/rules` (upstream can drop that folder if out of scope).

Motivation
IOC text files match exact bad versions; they do not express CVE-style ranges (e.g. “below 1.13.5”). JSON advisories close that gap for offline, filesystem-based checks. Examples and docs make it easier to add rules and to generate JSON from public GHSA/CVE pages.
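To make the difference concrete, here is a minimal sketch of range matching under the semantics this PR describes: clauses within one string are ANDed, and array entries are ORed. It is not the matcher in `crawler.py`. The names `parse_version`, `matches_range`, and `is_vulnerable` are hypothetical, whitespace-separated clauses are an assumption, and pre-release tags are simply ignored, which a stricter semver comparison would not do.

```python
import operator
import re

# Comparison operators assumed for this sketch; npm-style ^ and ~ are not handled.
OPS = {">=": operator.ge, "<=": operator.le, ">": operator.gt, "<": operator.lt, "=": operator.eq}
CLAUSE = re.compile(r"(>=|<=|>|<|=)\s*(\d+\.\d+\.\d+)")


def parse_version(text):
    """Parse 'MAJOR.MINOR.PATCH' into an int tuple, dropping pre-release/build tags."""
    core = text.lstrip("v").split("-", 1)[0].split("+", 1)[0]
    major, minor, patch = (int(part) for part in core.split("."))
    return (major, minor, patch)


def matches_range(version, range_expr):
    """True if the version satisfies every clause in one range string (AND)."""
    clauses = CLAUSE.findall(range_expr)
    if not clauses:
        return False  # an unparseable range is treated as no match
    installed = parse_version(version)
    return all(OPS[op](installed, parse_version(bound)) for op, bound in clauses)


def is_vulnerable(version, vulnerable_ranges):
    """True if the version falls inside any listed range (OR across entries)."""
    return any(matches_range(version, expr) for expr in vulnerable_ranges)
```

Under these assumptions, a single rule such as `["<1.13.5"]` flags every installed version below 1.13.5, where an IOC list would need one exact pin per affected release.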
Short prompt for an AI agent (CVE / GHSA → advisory JSON)
Run this in the crawler repo (so `examples/` is visible). Replace `PASTE_ADVISORY_URL_HERE` once.

Changes (short)
- `--advisories`: `packages` → list of `{ id, vulnerable_ranges, ... }`. Ranges use `>=` / `<` etc.; clauses AND within a string, OR across array entries.
- `--malwares`: optional when `--advisories` is set.
- `finding_type`, stderr labels for malware vs advisory, “Active rules” line.
- Skips catalog paths under `node_modules/`; `--include-node-modules` scans them too.
- `test_crawler.py` covers advisories, ranges, and catalog path filtering.
- `requirements.txt`: `pytest` note.

Why it’s useful
Commands
Caveats
- Locates `node_modules` by walking up from each catalog path; does not exhaustively recurse every nested `node_modules` tree.
- Ranges are `>=` / `<` style, not npm `^` / `~`.
- … `npm audit`.

Breaking / behavior notes
- `--malwares` not required when `--advisories` is provided.
- Catalog paths under `node_modules/` are skipped unless `--include-node-modules` is passed.
- JSONL output includes `finding_type` (and advisory-specific fields).

Maintainer checklist
- Decide whether to keep `.cursor/rules/` for upstream.
- Run `pytest test_crawler.py` in CI.

Suggested title
feat(scan): add --advisories JSON semver ranges and optional catalog path filtering