Cursor CVE scan support by pabamato · Pull Request #3 · dsawardekar/crawler

pabamato · 2026-04-13T17:35:45Z

Summary

Adds JSON advisory files with semver-style vulnerable ranges (--advisories), alongside the existing IOC malware list (--malwares). Either or both may be passed. Output records finding_type (malware | advisory). Python 3.10+ stdlib only at runtime—no extra pip packages for scanning.

You can ask an AI coding agent (e.g. in Cursor) to draft advisory JSON from a GHSA or CVE/NVD URL using the short prompt below (replace the URL). Also includes examples/ templates, docs/CURSOR_AND_CVE_SCANNING.md, README updates, tests, and optional .cursor/rules (upstream can drop that folder if out of scope).

Motivation

IOC text files match exact bad versions; they do not express CVE-style ranges (e.g. “below 1.13.5”). JSON advisories close that gap for offline, filesystem-based checks. Examples and docs make it easier to add rules and to generate JSON from public GHSA/CVE pages.

Short prompt for an AI agent (CVE / GHSA → advisory JSON)

Run this in the crawler repo (so examples/ is visible). Replace PASTE_ADVISORY_URL_HERE once.

Create examples/advisories_<short>_<CVE-or-GHSA>.json for crawler.py --advisories. Match the JSON shape of examples/advisories_axios.example.json (top-level "packages", exact npm keys). vulnerable_ranges: each array element is one OR-branch; inside a string use only comma-separated >= / <= / > / < clauses (AND), never ^ or ~. Use only this page for version facts: PASTE_ADVISORY_URL_HERE. Reply with raw JSON only.
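The range rule in the prompt (comma-separated `>=` / `<=` / `>` / `<` clauses only, never `^` or `~`) can be checked mechanically before a generated file is committed. The following is a minimal sketch, not code from this PR: `invalid_clauses` and the clause pattern are hypothetical, and the pattern assumes plain dotted numeric versions, so pre-release tags would also be flagged.

```python
import re

# Hypothetical clause pattern (an assumption, not taken from crawler.py):
# one comparison operator followed by a dotted numeric version.
_CLAUSE = re.compile(r"^(>=|<=|>|<)\s*\d+(\.\d+){0,2}$")


def invalid_clauses(vulnerable_ranges: list[str]) -> list[str]:
    """Return every clause that is not a plain >=, <=, > or < comparison."""
    bad = []
    for entry in vulnerable_ranges:       # each entry is one OR-branch
        for clause in entry.split(","):   # clauses inside an entry are ANDed
            clause = clause.strip()
            if not _CLAUSE.match(clause):
                bad.append(clause)
    return bad
```

An empty result means every clause uses the allowed operators; anything returned (for example `^1.2.3`) should be rewritten as explicit bounds before the file is used.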

Changes (short)

| Area | Detail |
| --- | --- |
| `--advisories` | JSON `packages` → list of `{ id, vulnerable_ranges, ... }`. Ranges use `>=` / `<` / etc.; clauses AND within a string, OR across array entries. |
| `--malwares` | Optional if `--advisories` is set. |
| Output | `finding_type`, stderr labels for malware vs advisory, “Active rules” line. |
| Catalog | By default, ignore catalog lines whose path is under `node_modules/`; `--include-node-modules` scans them too. |
| Tests | `test_crawler.py` covers advisories, ranges, and catalog path filtering. |
| `requirements.txt` | No runtime deps; optional `pytest` note. |
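The AND-within-a-string, OR-across-entries semantics described for `vulnerable_ranges` can be illustrated with a stdlib-only sketch. This is not the matcher in `crawler.py`: `parse_version`, `clause_matches`, and `is_vulnerable` are hypothetical names, and the sketch assumes numeric dotted versions, ignoring pre-release and build suffixes.

```python
import operator

_OPS = {">=": operator.ge, "<=": operator.le, ">": operator.gt, "<": operator.lt}


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '1.13.5' into (1, 13, 5); pre-release/build suffixes are dropped (assumption)."""
    core = text.strip().split("-", 1)[0].split("+", 1)[0]
    parts = [int(p) for p in core.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))  # pad '1.2' to (1, 2, 0)


def clause_matches(version: tuple[int, ...], clause: str) -> bool:
    clause = clause.strip()
    for symbol in (">=", "<=", ">", "<"):  # try two-character operators first
        if clause.startswith(symbol):
            return _OPS[symbol](version, parse_version(clause[len(symbol):]))
    raise ValueError(f"unsupported clause: {clause!r}")


def is_vulnerable(installed: str, vulnerable_ranges: list[str]) -> bool:
    """True if any entry matches (OR) with all of its clauses satisfied (AND)."""
    version = parse_version(installed)
    return any(
        all(clause_matches(version, clause) for clause in entry.split(","))
        for entry in vulnerable_ranges
    )
```

With the ranges `[">=1.0.0, <1.13.5", "<0.30.3"]`, version `1.13.4` matches the first entry and `0.30.0` matches the second, while `1.13.5` and `0.31.0` match neither.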

Why it’s useful

One scanner: IOC pins + semver windows, same catalog workflow.
No pip friction: works on PEP 668 / locked-down Python.
Maintainable rules: version-controlled JSON, easy to extend from advisories.

Commands

python3 crawler.py scan --catalog packages.txt --malwares malwares.txt -o out.json
python3 crawler.py scan --catalog packages.txt --advisories examples/advisories_axios.example.json -o out.json
python3 crawler.py scan --catalog packages.txt --malwares malwares.txt --advisories rules.json -o out.json
python3 crawler.py scan --catalog packages.txt --advisories rules.json --include-node-modules

Caveats

Resolves node_modules by walking up from each catalog path; does not exhaustively recurse every nested node_modules tree.
Ranges are explicit >= / < style, not npm ^ / ~.
Complements, does not replace, npm audit.

Breaking / behavior notes

--malwares not required when --advisories is provided.
Default skips catalog paths under node_modules/ unless --include-node-modules.
JSON findings include finding_type (and advisory-specific fields).
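The default catalog filtering noted above could look something like the sketch below. The function names and the `include_node_modules` parameter are hypothetical stand-ins for the `--include-node-modules` flag, and the check assumes a path counts as "under `node_modules/`" when any path component is exactly `node_modules`.

```python
from pathlib import PurePosixPath


def is_under_node_modules(catalog_line: str) -> bool:
    """True if any component of the path is exactly 'node_modules'."""
    path = PurePosixPath(catalog_line.strip().replace("\\", "/"))
    return "node_modules" in path.parts


def filter_catalog(lines: list[str], include_node_modules: bool = False) -> list[str]:
    """Drop blank lines and, by default, entries located under node_modules/."""
    return [
        line
        for line in lines
        if line.strip() and (include_node_modules or not is_under_node_modules(line))
    ]
```

Comparing whole path components rather than substrings keeps a directory such as `my_node_modules_backup/` from being filtered by accident.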

Maintainer checklist

Keep or remove .cursor/rules/ for upstream.
pytest test_crawler.py in CI.

Suggested title

feat(scan): add --advisories JSON semver ranges and optional catalog path filtering

Copilot

Pull request overview

Adds offline CVE-style advisory scanning support to the existing npm malware IOC scanner, including semver-style vulnerable ranges, improved scan output labeling, and catalog noise reduction.

Changes:

Introduces --advisories JSON rule files with semver-style vulnerable ranges and emits finding_type (malware | advisory) in JSONL output.
Makes --malwares optional when advisories are provided; reports active rule sources and improves scan summary output.
Skips catalog entries under node_modules/**/package.json by default (opt-in via --include-node-modules), with new regression tests and example rule templates/docs.

Reviewed changes

Copilot reviewed 9 out of 10 changed files in this pull request and generated 4 comments.

Show a summary per file

| File | Description |
| --- | --- |
| `crawler.py` | Implements advisory rule loading, stdlib semver-range matching, new finding type output, and optional node_modules catalog-path filtering. |
| `test_crawler.py` | Adds tests for advisory JSON loading, range matching behavior, and node_modules path filtering behavior. |
| `README.md` | Updates usage/docs to describe `--advisories`, `finding_type`, and `--include-node-modules`. |
| `docs/CURSOR_AND_CVE_SCANNING.md` | Adds workflow documentation for authoring/merging advisory JSON and running scans at scale. |
| `examples/advisories_axios.example.json` | Provides an advisory JSON template/example for axios. |
| `examples/advisories_react_rsc_GHSA-479c-33wc-g2pg.example.json` | Provides an advisory JSON example for React RSC-related packages. |
| `examples/advisories_nextjs_GHSA-q4gf-8mx6-v5v3.example.json` | Provides an advisory JSON example for Next.js. |
| `examples/malwares_axios_IOC.example.txt` | Adds an IOC-format example file and clarifies IOC vs range limitations. |
| `.gitignore` | Ignores local virtualenv and macOS metadata files. |
| `.cursor/rules/repo-reference.mdc` | Adds Cursor agent guidance for repo layout and the new advisory workflow (optional tooling support). |


Copilot · 2026-04-13T17:40:21Z

+| `examples/*.example.json` | Advisory JSON you can copy or hand to an AI as a template |
+| `examples/malwares_axios_IOC.example.txt` | IOC list format (exact `@version` pins) |
+| `test_crawler.py` | Regression tests |
+| `requirements.txt` | No runtime deps; optional pytest note |


requirements.txt is listed here as a related file, but the repo doesn’t include it. Add it (for optional dev/test deps) or remove/update this row to keep the repo inventory accurate.

Suggested change

| `requirements.txt` | No runtime deps; optional pytest note |

Copilot · 2026-04-13T17:40:22Z

        for pkg_name, malicious_versions in self.malware_packages.items():
            if found_version := self._check_package(node_modules_path, pkg_name, malicious_versions):
                yield PackageMatch(project_package_json, node_modules_path, pkg_name, found_version)

+        for pkg_name, rules in self.advisory_by_package.items():
+            found_version = self._get_installed_version(node_modules_path, pkg_name)
+            if not found_version:


If the same package name appears in both malware_packages and advisory_by_package, the scanner will read and parse node_modules/<pkg>/package.json twice (once in _check_package, again in _get_installed_version). Consider refactoring to read the installed version once per package (or cache it) and reuse it for both IOC and advisory checks to avoid unnecessary IO on large scans.

Copilot · 2026-04-13T17:40:22Z

+- **Exact npm package name(s)** as published on npm (e.g. `next`, `react-server-dom-webpack`, `axios`).
+- **Affected version ranges** translated into this tool’s syntax: one or more entries in `vulnerable_ranges`, each entry a string of **AND** clauses, **OR** across array elements.
+
+**Ready-to-use prompt:** copy the short block in **`docs/PULL_REQUEST.md`** (*Short prompt for an AI agent*). Replace `PASTE_ADVISORY_URL_HERE` with your GHSA or NVD/CVE link.


The doc references docs/PULL_REQUEST.md for the ready-to-use prompt, but that file doesn’t exist in this repo. Please either add the referenced document or update this line to point to the correct location (e.g., README or this document) so readers can actually find the prompt.

Suggested change

**Ready-to-use prompt:** copy the short block in **`docs/PULL_REQUEST.md`** (*Short prompt for an AI agent*). Replace `PASTE_ADVISORY_URL_HERE` with your GHSA or NVD/CVE link.

**Ready-to-use prompt:** use the prompt guidance in **this document** under **Using Cursor (or another AI) to create advisory JSON**. Replace `PASTE_ADVISORY_URL_HERE` with your GHSA or NVD/CVE link.

Copilot · 2026-04-13T17:40:22Z

 ## Installation

-No additional dependencies required beyond Python 3.10+ standard library.
+No additional runtime dependencies beyond Python 3.10+ standard library. Optional: `pytest` for tests (see `requirements.txt`).


This line points users to requirements.txt, but there is no requirements.txt in the repository. Either add the file (even if it only documents optional pytest) or remove/adjust the reference to avoid a broken pointer in the installation instructions.

Suggested change

No additional runtime dependencies beyond Python 3.10+ standard library. Optional: `pytest` for tests (see `requirements.txt`).

No additional runtime dependencies beyond Python 3.10+ standard library. Optional: install `pytest` if you want to run tests.

pabamato added 2 commits April 13, 2026 19:13

Cursor CVE scan support

cc45893

Update docs

f4d1045

Copilot AI review requested due to automatic review settings April 13, 2026 17:35

Copilot started reviewing on behalf of pabamato April 13, 2026 17:36

pabamato wants to merge 2 commits into dsawardekar:trunk from pabamato:feature/cursor-cve-scan
