chore(test): node-suite differential runner + regression guard#5097
Conversation
Add scripts/node_suite_run.py (the print-and-diff node-suite runner) with two correctness measures baked in: - pre-warm pass: compile one test per module serially first so a crypto-feature module's cold auto-optimize rebuild doesn't blow the per-test compile timeout and score the whole module as perry_err (this made dns look 0% / http 47%). - fast/slow lanes: server/timing modules (http, http2, net, stream, ...) run strictly sequentially to avoid port contention + timing-race false negatives (http measured 78.9% parallel vs a verified 100% sequential). Add scripts/node_suite_regression_check.py + test-parity/node_suite_baseline.json: a floor guard that fails if any module drops below baseline (improvements always accepted). The node-suite is not part of the per-PR gate (the parity job is opt-in and runs node 22 while the real oracle is node 26), so a module can silently regress and still merge green -- which is how node:dns once went 83% -> 0% behind a green build. Run this in the node-26 environment on a schedule or before cutting a release. Floors captured at 2781/2863 (97.1%) on node v26.3.0; racy timing modules carry a small margin so ordinary flake doesn't false-alarm.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 WalkthroughWalkthroughAdds a differential node-suite test runner, a CLI regression checker that compares per-module pass counts against a JSON baseline, and the initial node-26 baseline data. ChangesNode-suite regression testing
🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@scripts/node_suite_regression_check.py`:
- Around line 49-51: The check in scripts/node_suite_regression_check.py should
fail closed whenever the runner exits non‑zero; update the conditional that
currently reads "if proc.returncode != 0 and not proc.stdout:" to simply "if
proc.returncode != 0:" so any non‑zero proc.returncode triggers printing the
error and returning 2 (remove reliance on proc.stdout presence so
crashes/timeouts don't lead to parsing incomplete output).
In `@scripts/node_suite_run.py`:
- Around line 57-69: The current flow ignores non-zero child exits and uses
strip() which removes leading whitespace; change the logic so after running NODE
(variable n) you check n.returncode and return (mod, "node_err") if non-zero,
and after running the generated binary (variable p) check p.returncode and
return (mod, "run_err") if non-zero (keep the existing check for c.returncode !=
0 returning "compile_fail"); and when comparing outputs use a
trailing-newline-only trim (e.g., compare n.stdout.rstrip('\n') to
p.stdout.rstrip('\n')) instead of strip() so leading whitespace is preserved.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: c707e7ac-1167-4575-9bf0-5c2ee2cfb374
📒 Files selected for processing (3)
scripts/node_suite_regression_check.pyscripts/node_suite_run.pytest-parity/node_suite_baseline.json
| n = subprocess.run([NODE, path], capture_output=True, text=True, timeout=30) | ||
| except Exception: | ||
| return (mod, "node_err") | ||
| with tempfile.TemporaryDirectory() as td: | ||
| out = os.path.join(td, "o") | ||
| try: | ||
| c = subprocess.run([PERRY, path, "-o", out], capture_output=True, text=True, timeout=120) | ||
| if c.returncode != 0: | ||
| return (mod, "compile_fail") | ||
| p = subprocess.run([out], capture_output=True, text=True, timeout=30) | ||
| except Exception: | ||
| return (mod, "perry_err") | ||
| return (mod, "pass" if n.stdout.strip() == p.stdout.strip() else "diff") |
There was a problem hiding this comment.
Fail on child-process errors and preserve leading whitespace.
strip() removes leading whitespace too, and the current code still compares output even when node or the generated out process exits non-zero. That can turn real failures into false pass/diff results.
🐛 Proposed fix
n = subprocess.run([NODE, path], capture_output=True, text=True, timeout=30)
+ if n.returncode != 0:
+ return (mod, "node_err")
@@
c = subprocess.run([PERRY, path, "-o", out], capture_output=True, text=True, timeout=120)
if c.returncode != 0:
return (mod, "compile_fail")
p = subprocess.run([out], capture_output=True, text=True, timeout=30)
+ if p.returncode != 0:
+ return (mod, "perry_err")
except Exception:
return (mod, "perry_err")
- return (mod, "pass" if n.stdout.strip() == p.stdout.strip() else "diff")
+ return (mod, "pass" if n.stdout.rstrip() == p.stdout.rstrip() else "diff")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| n = subprocess.run([NODE, path], capture_output=True, text=True, timeout=30) | |
| except Exception: | |
| return (mod, "node_err") | |
| with tempfile.TemporaryDirectory() as td: | |
| out = os.path.join(td, "o") | |
| try: | |
| c = subprocess.run([PERRY, path, "-o", out], capture_output=True, text=True, timeout=120) | |
| if c.returncode != 0: | |
| return (mod, "compile_fail") | |
| p = subprocess.run([out], capture_output=True, text=True, timeout=30) | |
| except Exception: | |
| return (mod, "perry_err") | |
| return (mod, "pass" if n.stdout.strip() == p.stdout.strip() else "diff") | |
| n = subprocess.run([NODE, path], capture_output=True, text=True, timeout=30) | |
| if n.returncode != 0: | |
| return (mod, "node_err") | |
| except Exception: | |
| return (mod, "node_err") | |
| with tempfile.TemporaryDirectory() as td: | |
| out = os.path.join(td, "o") | |
| try: | |
| c = subprocess.run([PERRY, path, "-o", out], capture_output=True, text=True, timeout=120) | |
| if c.returncode != 0: | |
| return (mod, "compile_fail") | |
| p = subprocess.run([out], capture_output=True, text=True, timeout=30) | |
| if p.returncode != 0: | |
| return (mod, "perry_err") | |
| except Exception: | |
| return (mod, "perry_err") | |
| return (mod, "pass" if n.stdout.rstrip() == p.stdout.rstrip() else "diff") |
🧰 Tools
🪛 ast-grep (0.43.0)
[error] 62-62: Use of unsanitized data to create processes
Context: subprocess.run([PERRY, path, "-o", out], capture_output=True, text=True, timeout=120)
Note: [CWE-78].
(os-system-unsanitized-data)
[error] 65-65: Use of unsanitized data to create processes
Context: subprocess.run([out], capture_output=True, text=True, timeout=30)
Note: [CWE-78].
(os-system-unsanitized-data)
[error] 62-62: Command coming from incoming request
Context: subprocess.run([PERRY, path, "-o", out], capture_output=True, text=True, timeout=120)
Note: [CWE-20].
(subprocess-from-request)
[error] 65-65: Command coming from incoming request
Context: subprocess.run([out], capture_output=True, text=True, timeout=30)
Note: [CWE-20].
(subprocess-from-request)
🪛 Ruff (0.15.15)
[error] 57-57: subprocess call: check for execution of untrusted input
(S603)
[warning] 58-58: Do not catch blind exception: Exception
(BLE001)
[error] 63-63: subprocess call: check for execution of untrusted input
(S603)
[error] 66-66: subprocess call: check for execution of untrusted input
(S603)
[warning] 67-67: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/node_suite_run.py` around lines 57 - 69, The current flow ignores
non-zero child exits and uses strip() which removes leading whitespace; change
the logic so after running NODE (variable n) you check n.returncode and return
(mod, "node_err") if non-zero, and after running the generated binary (variable
p) check p.returncode and return (mod, "run_err") if non-zero (keep the existing
check for c.returncode != 0 returning "compile_fail"); and when comparing
outputs use a trailing-newline-only trim (e.g., compare n.stdout.rstrip('\n') to
p.stdout.rstrip('\n')) instead of strip() so leading whitespace is preserved.
- node_suite_regression_check.py: fail closed on any non-zero runner exit
(was: only when stdout also empty) so a crashed/timed-out runner can't lead
to parsing partial output.
- node_suite_run.py: compare stdout with rstrip('\n') instead of strip() so
leading/trailing spaces are significant (byte-for-byte intent), and require
matching process exit code, so a Perry crash that printed matching output
before dying is a diff, not a false pass. (Did not bucket node-nonzero as
node_err per the literal suggestion — that would wrongly drop legitimate
error-path tests where node intentionally exits non-zero and Perry matches.)
|
Thanks @coderabbitai — both addressed in 4fecc93:
Re-ran the full suite on the node-26 oracle with the stricter comparison: no module dropped below its floor (overall 2779/2863, the ±2 vs the committed 2781 is ordinary stream/http2 timing flake, within the margins). So the stricter match confirmed the existing numbers weren't hiding masked crashes or whitespace diffs — baseline floors stand unchanged. |
|
Tip For best results, initiate chat on the files or code changes.
The re-run confirming 2779/2863 within the timing-flake margin is a good sanity check that the stricter comparison didn't expose any hidden whitespace or exit-code mismatches in the baseline. (ᴗ˳ᴗ) |
What
Adds tooling to make the print-and-diff node-suite (
test-parity/node-suite/) trustworthy and to stop silent regressions from merging green.scripts/node_suite_run.py— the runner, with two correctness measuresauto-optimizeruntime/stdlib rebuild (~4 min) doesn't blow the per-test compile timeout and score the whole module asperry_err. This artifact had made dns look 0% and http 47% when both are actually 100%.http http2 https net dgram tls cluster dns stream child_process worker_threads inspector repl diagnostics_channel timers fetch) run strictly sequentially to avoid port-contention + timing-race false negatives. http measured 78.9% parallel vs a verified 100% sequential.scripts/node_suite_regression_check.py+test-parity/node_suite_baseline.json— the guardA floor check: fails (exit 1) if any module drops below its baseline pass count; improvements are always accepted and reported as ratchet candidates.
Why
The node-suite is not part of the per-PR CI gate (the
parityjob is opt-in and runs node 22, while the real oracle is node 26). A module can silently regress and still merge green — which is exactly how node:dns once went 83% → 0% behind a green build. This guard runs in the node-26 environment (the box) on a schedule or before a release and catches that class of regression. Validated: it flags a simulateddns 6→0and passes the current tree clean.Numbers
Floors captured at 2781/2863 (97.1%) on node v26.3.0. Racy timing modules (http2, net, stream, diagnostics_channel, fs-promises) carry a small margin so ordinary flake doesn't false-alarm.
Notes
Summary by CodeRabbit