fix(gc): explicit gc() forces the conservative native-stack scan (#4977) #4998
Merged
In the default auto scan mode a full collection skips the conservative native-stack scan, but at a gc() callsite live module-init/top-level locals may be held only on the native stack — neither the precise shadow-stack roots nor the module-var scanners cover them — so the collector reclaimed live object graphs and later field reads returned dangling-pointer garbage (silent corruption, no crash). Fix: ManualGcScanGuard pins the Full conservative scan for the duration of a manual collection, on both the direct js_gc_collect path and the deferred-flush arm (now shared via manual_gc_collect_now). The guard respects an already-pinned per-thread override (the GC unit tests pin Auto so forced collections still reclaim native-stack locals), and an explicit PERRY_CONSERVATIVE_STACK_SCAN env value beats any override, so the bisection escape hatch keeps working. The heap-snapshot workaround that wrapped its collect in the Full override (#4916) is superseded and removed. Verified: repro prints 16 / leaf-string-4916 / widget-name-4977 (was dangling-pointer garbage); PERRY_CONSERVATIVE_STACK_SCAN=0 still reproduces the skip (env precedence intact); PERRY_GEN_GC=0 unaffected; gc:: unit suite 377 passed / 0 failed.
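The guard behaviour described above can be illustrated with a minimal RAII sketch. This is not the Perry source: only the names `ManualGcScanGuard`, `Auto` and `Full` come from the description, while the thread-local slot `SCAN_OVERRIDE` and the helpers `current_override`, `pin_override` and `manual_collect_with_guard` are hypothetical stand-ins. Env-variable precedence is left out of this sketch.

```rust
use std::cell::Cell;

/// Scan modes named in the description; the enum itself is a stand-in.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ScanMode {
    Auto,
    Full,
}

thread_local! {
    // Hypothetical per-thread override slot; `None` means nothing is pinned.
    static SCAN_OVERRIDE: Cell<Option<ScanMode>> = Cell::new(None);
}

/// Reads the current per-thread override (hypothetical helper).
pub fn current_override() -> Option<ScanMode> {
    SCAN_OVERRIDE.with(|slot| slot.get())
}

/// Pins or clears the override, as a test harness might (hypothetical helper).
pub fn pin_override(mode: Option<ScanMode>) {
    SCAN_OVERRIDE.with(|slot| slot.set(mode));
}

/// Pins `Full` for its lifetime unless an override is already pinned.
pub struct ManualGcScanGuard {
    engaged: bool,
}

impl ManualGcScanGuard {
    pub fn engage() -> Self {
        let engaged = SCAN_OVERRIDE.with(|slot| {
            if slot.get().is_none() {
                slot.set(Some(ScanMode::Full));
                true
            } else {
                false // an existing pin (e.g. a test's `Auto`) is respected
            }
        });
        ManualGcScanGuard { engaged }
    }
}

impl Drop for ManualGcScanGuard {
    fn drop(&mut self) {
        // Undo only what this guard did; a caller's pin is left alone.
        if self.engaged {
            SCAN_OVERRIDE.with(|slot| slot.set(None));
        }
    }
}

/// Runs `collect` with the guard held, mirroring the shared-helper idea.
pub fn manual_collect_with_guard<R>(collect: impl FnOnce() -> R) -> R {
    let _guard = ManualGcScanGuard::engage();
    collect()
}
```

Tying the pin to a value's lifetime means the override is released on every exit path, including an unwinding panic inside the collection.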
proggeramlug pushed a commit that referenced this pull request on Jun 12, 2026
…remembered-set fix Both tests fail on a PRE-EXISTING remembered-set coverage bug exposed when explicit gc() started using the Full conservative stack scan (#4998): minor cycles drop legitimate old->young dirty-page coverage, live nursery children of old-gen large objects are swept while still referenced, and forced evacuation corrupts through the dangling slots. Full root-cause trail (bisect, knob matrix, instrumentation) lives in #5029. Re-enable when the coverage fix lands.
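The invariant this commit message refers to (every old->young edge must sit on a page the minor cycle will scan) can be modelled in a few lines. The sketch below is a toy model, not the runtime's data structures: `OldObject`, `Nursery`, `PAGE_SIZE` and both functions are hypothetical names chosen for illustration.

```rust
use std::collections::BTreeSet;

/// Hypothetical page granularity for the dirty-page set.
const PAGE_SIZE: usize = 4096;

/// Simplified old-generation object: base address plus pointer-sized slots.
pub struct OldObject {
    pub addr: usize,
    pub slots: Vec<usize>,
}

/// Half-open address range of the nursery (hypothetical layout).
#[derive(Clone, Copy)]
pub struct Nursery {
    pub start: usize,
    pub end: usize,
}

impl Nursery {
    pub fn contains(&self, ptr: usize) -> bool {
        ptr >= self.start && ptr < self.end
    }
}

/// Addresses of all slots in old objects that point into the nursery.
fn young_edge_slot_addrs(old: &[OldObject], nursery: Nursery) -> Vec<usize> {
    let word = std::mem::size_of::<usize>();
    let mut addrs = Vec::new();
    for obj in old {
        for (i, &target) in obj.slots.iter().enumerate() {
            if nursery.contains(target) {
                addrs.push(obj.addr + i * word);
            }
        }
    }
    addrs
}

/// Pages that must stay in the remembered set for the next minor cycle.
pub fn pages_needing_coverage(old: &[OldObject], nursery: Nursery) -> BTreeSet<usize> {
    young_edge_slot_addrs(old, nursery)
        .into_iter()
        .map(|addr| addr / PAGE_SIZE)
        .collect()
}

/// Old->young edges a minor cycle would miss given the current dirty pages.
pub fn missing_edges(old: &[OldObject], nursery: Nursery, dirty: &BTreeSet<usize>) -> usize {
    young_edge_slot_addrs(old, nursery)
        .into_iter()
        .filter(|addr| !dirty.contains(&(addr / PAGE_SIZE)))
        .count()
}
```

In this model, a nursery child is swept while still referenced exactly when `missing_edges` is non-zero, which is the failure mode the two ignored tests exposed.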
proggeramlug added a commit that referenced this pull request on Jun 12, 2026
…re-enable write-barrier stress tests (#5043)

Four coordinated fixes, each addressing a measured failure mode of the gc_write_barrier_stress suite (red since #4998 made explicit gc() run the Full conservative native-stack scan):

1. roots: conservative discoveries in the OLD generation are pin-only in MINOR cycles (CONS_PINNED, no mark, no trace seed). A stale stack word can resurrect a DEAD old object whose slots still point into long-swept nursery memory; once fresh nursery blocks land on those freed ranges the slots alias live young objects, and tracing/evacuating/rewriting through them corrupts the heap (and produced the deterministic missing_edges=7710 verifier signature on a dead 256 KB array backing). Minors never sweep the old gen, so the mark is not needed for survival, and a LIVE old object's real old->young edges are dirty-page-covered by the write barriers (measured: ~60k barrier calls per inter-cycle window, all landing correctly). FULL collections keep mark+trace (#4977).

2. verify/rewrite: rewrite_heap_objects and verify_heap_objects no longer skip UNMARKED non-nursery objects. Being unmarked in a minor is the normal state of a live old object, not a sign of death; old->old references have no remembered-set coverage, so this walk is the only pass that re-points an old referrer at an evacuated target before the forwarding stubs are released (measured: 753 skipped stale referrers per evacuating cycle).

3. remembered set: restore_surviving_dirty_coverage() re-derives kept pages after remembered_set_clear from the SAME walk the old-young-edge verifier uses, so a still-needed page can never be dropped (also closes the copying-path re-remember gap where ptrs.decode_bits returns None for freshly copied to-survivor children). External entries are validated address-first (page classify / malloc registry) before any header dereference - the reclaim unit tests seed synthetic entries.

4. policy: old-page defrag (C4b compaction) is skipped on cycles that ran the conservative stack scan. Conservative stack words cannot be rewritten after a move and CONS_PINNED only covers direct discoveries; the stress suite demonstrated a moved old object with an un-rewritten referrer (shape-table lookups through it returned recycled memory). Copying minors never run the conservative scan, so steady-state defrag is unaffected.

Follow-up to lift the gate: #5042 (codegen raw-pointer globals, e.g. perry_class_keys_*, need mutable-root registration).

Validation: gc_write_barrier_stress 2/2 across repeated runs (re-enabled, previously #[ignore]d); standalone repro matrix 23/23 clean across FORCE_EVACUATE+VERIFY_EVACUATION, PERRY_CONSERVATIVE_STACK_SCAN=full, default, PERRY_GEN_GC=0 and PERRY_GEN_GC_EVACUATE=0; gc unit suite 377/377; perry-runtime lib green (single known macOS date flake); perry bin+integration suites green.

Co-authored-by: Ralph Küpper <ralph@skelpo.com>
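Fixes 1 and 4 above are both small policy decisions, and they can be written out as pure functions. The sketch below is illustrative only: the enums and function names are hypothetical, and the handling of young-generation discoveries in minor cycles is an assumption, since the commit message only spells out the old-generation case.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Cycle {
    Minor,
    Full,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Generation {
    Young,
    Old,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum RootAction {
    /// Keep the object in place, but do not mark it or trace from it.
    PinOnly,
    /// Pin, mark, and seed the trace from the object.
    PinMarkTrace,
}

/// What to do with an object discovered by the conservative stack scan.
pub fn conservative_root_action(cycle: Cycle, generation: Generation) -> RootAction {
    match (cycle, generation) {
        // Stated in the commit: a stale stack word must not resurrect a dead
        // old object in a minor cycle, so the discovery only pins it.
        (Cycle::Minor, Generation::Old) => RootAction::PinOnly,
        // Full collections keep mark+trace (stated); the young-generation
        // minor case is assumed to behave the same way.
        _ => RootAction::PinMarkTrace,
    }
}

/// Old-page defrag is skipped on any cycle that ran the conservative scan,
/// because conservative stack words cannot be rewritten after a move.
pub fn old_page_defrag_allowed(ran_conservative_scan: bool) -> bool {
    !ran_conservative_scan
}
```

Keeping these as side-effect-free functions makes the knob matrix from the validation paragraph easy to enumerate in a unit test.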
Fixes #4977.
Problem
Explicit `gc()` (a full collection with the default `auto` stack-scan mode) skipped the conservative native-stack scan: `gc/roots.rs::conservative_stack_scan_decision_for` maps `Auto` → `SkipDisabled`. At a `gc()` callsite, live module-init/top-level locals can be held only on the native stack (neither the precise shadow-stack roots nor the module-var scanners cover them), so the collector reclaimed the whole object graph and later field reads returned dangling-pointer garbage. Silent corruption, no crash.

Fix

- `gc/roots.rs`: new `ManualGcScanGuard` pins the `Full` conservative scan for the duration of a manual collection, only when no per-thread override is already pinned. The GC unit tests pin `Auto` inside their controlled-root scopes (so forced collections still reclaim objects held only as native-stack test locals) and keep working unchanged; an explicit `PERRY_CONSERVATIVE_STACK_SCAN` env value beats any override either way, so the bisection escape hatch is intact.
- `gc/policy.rs`: both manual-collect paths (direct `js_gc_collect` and the deferred-flush `Collect(Manual)` arm) now share one `manual_gc_collect_now()` helper that engages the guard (they previously duplicated the weakref + collect sequence).
- `gc/heap_snapshot.rs`: the #4916 workaround ("Diagnostics fakes: v8 heap snapshot is an empty-but-valid graph; inspector/repl sessions look real but aren't") that wrapped its collect in the `Full` override is superseded and removed; `js_gc_collect` now provides the guarantee.

Threshold-triggered automatic collections are intentionally untouched: the `Auto` skip exists for copied-minor eligibility and per-cycle cost; this change scopes the full scan to explicit collections where the caller observably holds live state at the callsite.

Validation

- Repro (`test-files/test_issue_4977_gc_toplevel_locals.ts`, object literal + class instance): prints `16 / leaf-string-4916 / widget-name-4977 / 1` with the fix; previous binary printed `10 / globalThis / <empty> / 1`.
- `PERRY_CONSERVATIVE_STACK_SCAN=0` still reproduces the skip (env precedence verified at runtime); `PERRY_GEN_GC=0` legacy path unaffected.
- `RUST_TEST_THREADS=1 cargo test --release -p perry-runtime gc::` → 377 passed / 0 failed, including the new `manual_gc_scan_guard_forces_full_scan_only_when_unpinned` covering both the unpinned-engage and pinned-respect cases.

Code-only PR: no version bump / changelog (maintainer folds metadata at merge).
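The precedence rules described in this PR (env value first, then the per-thread override, then the default `Auto`) can be sketched as a pure resolution function. This is an assumption-laden illustration rather than the body of `conservative_stack_scan_decision_for`: the function names, the `Off` variant, and the env parsing are hypothetical. Only the values `0` and `full` appear in the PR text, so any other value is treated here as unset.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ScanMode {
    Off, // hypothetical variant for PERRY_CONSERVATIVE_STACK_SCAN=0
    Auto,
    Full,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ScanDecision {
    Scan,
    SkipDisabled,
}

/// Parses the env value. Unrecognised values are ignored (assumption).
pub fn parse_env(value: Option<&str>) -> Option<ScanMode> {
    match value {
        Some("0") => Some(ScanMode::Off),
        Some("full") => Some(ScanMode::Full),
        _ => None,
    }
}

/// Env beats the per-thread override; the override beats the default `Auto`.
pub fn effective_mode(env: Option<&str>, thread_override: Option<ScanMode>) -> ScanMode {
    parse_env(env).or(thread_override).unwrap_or(ScanMode::Auto)
}

/// Decision for a full collection. `Auto` -> `SkipDisabled` is stated in the
/// PR; mapping `Off` to the same variant is an assumption of this sketch.
pub fn decision_for_full_collection(mode: ScanMode) -> ScanDecision {
    match mode {
        ScanMode::Full => ScanDecision::Scan,
        ScanMode::Auto | ScanMode::Off => ScanDecision::SkipDisabled,
    }
}
```

Under this model, `effective_mode(Some("0"), Some(ScanMode::Full))` resolves to `Off`, which matches the observation that `PERRY_CONSERVATIVE_STACK_SCAN=0` still reproduces the skip even while the guard pins `Full`.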