fix(gc): close the #5029 conservative-scan × evacuation corruption — re-enable write-barrier stress tests#5043
Merged
…re-enable write-barrier stress tests

Four coordinated fixes, each addressing a measured failure mode of the gc_write_barrier_stress suite (red since #4998 made explicit gc() run the Full conservative native-stack scan):

1. roots: conservative discoveries in the OLD generation are pin-only in MINOR cycles (CONS_PINNED, no mark, no trace seed). A stale stack word can resurrect a DEAD old object whose slots still point into long-swept nursery memory; once fresh nursery blocks land on those freed ranges the slots alias live young objects, and tracing/evacuating/rewriting through them corrupts the heap (and produced the deterministic missing_edges=7710 verifier signature on a dead 256 KB array backing). Minors never sweep the old gen, so the mark is not needed for survival, and a LIVE old object's real old->young edges are dirty-page-covered by the write barriers (measured: ~60k barrier calls per inter-cycle window, all landing correctly). FULL collections keep mark+trace (#4977).

2. verify/rewrite: rewrite_heap_objects and verify_heap_objects no longer skip UNMARKED non-nursery objects. Being unmarked in a minor is the normal state of a live old object, not a sign of death; old->old references have no remembered-set coverage, so this walk is the only pass that re-points an old referrer at an evacuated target before the forwarding stubs are released (measured: 753 skipped stale referrers per evacuating cycle).

3. remembered set: restore_surviving_dirty_coverage() re-derives kept pages after remembered_set_clear from the SAME walk the old-young-edge verifier uses, so a still-needed page can never be dropped (also closes the copying-path re-remember gap where ptrs.decode_bits returns None for freshly copied to-survivor children). External entries are validated address-first (page classify / malloc registry) before any header dereference - the reclaim unit tests seed synthetic entries.

4. policy: old-page defrag (C4b compaction) is skipped on cycles that ran the conservative stack scan. Conservative stack words cannot be rewritten after a move and CONS_PINNED only covers direct discoveries; the stress suite demonstrated a moved old object with an un-rewritten referrer (shape-table lookups through it returned recycled memory). Copying minors never run the conservative scan, so steady-state defrag is unaffected. Follow-up to lift the gate: #5042 (codegen raw-pointer globals, e.g. perry_class_keys_*, need mutable-root registration).

Validation: gc_write_barrier_stress 2/2 across repeated runs (re-enabled, previously #[ignore]d); standalone repro matrix 23/23 clean across FORCE_EVACUATE+VERIFY_EVACUATION, PERRY_CONSERVATIVE_STACK_SCAN=full, default, PERRY_GEN_GC=0 and PERRY_GEN_GC_EVACUATE=0; gc unit suite 377/377; perry-runtime lib green (single known macOS date flake); perry bin+integration suites green.
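For readers skimming the commit message, the roots rule (fix 1) and the defrag gate (fix 4) can be modelled in a few lines. This is a minimal sketch, not the runtime's code: `Cycle`, `Header`, `on_conservative_old_gen_hit` and `should_defrag_old_pages` are hypothetical names invented for illustration, and the assumption that FULL cycles also set the pin bit is not stated in the message.

```rust
/// Kind of collection cycle (hypothetical type for this sketch).
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum Cycle {
    Minor,
    Full,
}

/// Hypothetical per-object GC state; the real header layout is not shown in this PR.
#[derive(Clone, Copy, Default, PartialEq, Eq, Debug)]
pub struct Header {
    /// Stands in for the CONS_PINNED flag: the object must not be moved.
    pub cons_pinned: bool,
    /// Mark bit used for survival decisions when the old gen is swept.
    pub marked: bool,
}

/// Handles one old-generation object found by the conservative stack scan.
/// Returns true if the object should also be pushed as a trace seed.
pub fn on_conservative_old_gen_hit(cycle: Cycle, header: &mut Header) -> bool {
    // A raw stack word cannot be rewritten after a move, so pin in every case.
    // (Pinning in FULL cycles is an assumption of this sketch.)
    header.cons_pinned = true;
    match cycle {
        // Minor: pin only. The stack word may be stale and the object dead,
        // so its slots must not be traced.
        Cycle::Minor => false,
        // Full: keep mark + trace, as the commit message states.
        Cycle::Full => {
            header.marked = true;
            true
        }
    }
}

/// Fix 4 as a predicate: old-page defrag is skipped on any cycle that ran
/// the conservative stack scan.
pub fn should_defrag_old_pages(ran_conservative_scan: bool) -> bool {
    !ran_conservative_scan
}
```

In this model a minor-cycle hit leaves the object pinned but unmarked, which is safe under the message's reasoning because minors never sweep the old generation.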
This was referenced Jun 12, 2026
fix: chalk boolean style modifiers —
<Text dimColor> no longer renders [object Object] (#5039)
#5045
Merged
Fixes #5029.
The bug chain (all empirically measured; full trail in the #5029 comments)

The stress suite has been red since #4998 made explicit `gc()` run the Full conservative native-stack scan. The corruption is real (clone fields read recycled memory), and the root cause is an interaction chain, not a single defect:

- … `gc()`.
- … `pointer_in_nursery` flips false→true for unchanged slot bits; measured at panic time: 7710 aliased slots, page-pattern all-clean, manual barrier replay covers instantly).
- … `missing_edges=7710` signature.

The four fixes

1. … (`CONS_PINNED`, no mark/trace seed); full collections keep mark+trace for #4977
2. `rewrite_heap_objects` / `verify_heap_objects` no longer skip unmarked non-nursery objects (unmarked ≠ dead outside the nursery in a minor)
3. `restore_surviving_dirty_coverage()`: post-clear remembered-set repair using the same walk the verifier uses (address-validated before any header deref; the reclaim unit tests seed synthetic entries) … `ptrs.decode_bits` → None for to-space children)
4. …

The two stress tests are re-enabled (they were `#[ignore]`d in #5033 to unblock CI).

Why a live old object loses nothing from fix 1

Its real old→young edges are barrier-covered: measured ~60k `mark_dirty_old_page` calls per inter-cycle window, all landing on the right pages. Minors only ever find old→young edges through the remembered set anyway. Retention doesn't need the mark (minors don't sweep old-gen), and `CONS_PINNED` already blocks every evacuation path.

Validation

- `gc_write_barrier_stress`: 2/2, repeated ×4 runs (re-enabled)
- … `FORCE_EVACUATE+VERIFY_EVACUATION`, `PERRY_CONSERVATIVE_STACK_SCAN=full`, default knobs, `PERRY_GEN_GC=0`, `PERRY_GEN_GC_EVACUATE=0`
- … `isReactComponent` on its prototype (2-level chain) → react-reconciler treats error boundaries as function components → children dropped (blocks ink) #5024 gap test byte-identical to Node; clone/probe programs clean through all rounds under stress knobs

Follow-up

#5042: register codegen raw-pointer globals (`perry_class_keys_*` et al.) as mutable roots, then lift the fix-4 defrag gate and extend `PERRY_GC_VERIFY_EVACUATION` to walk those tables.

No version bump / changelog: maintainer folds metadata at merge.
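The fix-2 rule ("unmarked ≠ dead outside the nursery in a minor") may be easier to see on a toy heap. The following is a sketch under stated assumptions only: `Region`, `Obj`, `visit_in_rewrite` and `rewrite_heap` are hypothetical stand-ins, not the signatures of `rewrite_heap_objects`, and the forwarding table is modelled as a plain `HashMap` from old address to new address.

```rust
use std::collections::HashMap;

/// Where an object lives (hypothetical classification for this sketch).
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum Region {
    Nursery,
    Old,
}

/// Toy heap object: a mark bit plus pointer slots stored as raw addresses.
#[derive(Clone, Debug)]
pub struct Obj {
    pub region: Region,
    pub marked: bool,
    pub slots: Vec<usize>,
}

/// Post-fix visiting rule for the rewrite walk in a minor cycle.
/// Only the nursery uses the mark bit as a liveness signal here; an
/// unmarked old object is the normal state of a live old object.
pub fn visit_in_rewrite(obj: &Obj) -> bool {
    match obj.region {
        Region::Nursery => obj.marked,
        Region::Old => true,
    }
}

/// Re-points every visited slot that refers to an evacuated target.
/// Returns the number of slots rewritten.
pub fn rewrite_heap(heap: &mut [Obj], forwarding: &HashMap<usize, usize>) -> usize {
    let mut rewritten = 0;
    for obj in heap.iter_mut() {
        if !visit_in_rewrite(obj) {
            continue;
        }
        for slot in obj.slots.iter_mut() {
            if let Some(&new_addr) = forwarding.get(slot) {
                *slot = new_addr;
                rewritten += 1;
            }
        }
    }
    rewritten
}
```

With the pre-fix rule (skip anything unmarked), an unmarked old referrer would keep its stale address after the forwarding entry is released, which is the failure mode the description attributes to the skipped referrers.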