Context
Found while fixing #5029. With the #5029 fixes in place, one corruption vector remained, isolated experimentally to old-page evacuation (C4b old-gen defrag): disabling defrag selection makes the structured_clone_gc_churn_stress workload pass 11/11 across the full knob matrix; enabling it corrupts the clone root deterministically on the cycle where old_page_moved_objects > 0.
Evidence
The strengthened verify_heap_objects (now covering unmarked old objects) finds no stale forwarded refs in any walked heap object after the rewrite — so the dangling referrer is NOT a heap slot covered by rewrite_forwarded_references.
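The inference in the paragraph above can be illustrated with a toy model. If the rewrite pass only visits fields of heap objects, a verifier that walks the same objects comes back clean even while a pointer held outside the heap still names the evacuated copy. This is a hypothetical Rust sketch, not Perry's runtime: objects live in a `Vec`, references are indices, and `rewrite_heap_fields`, `verify_heap_fields` and `scenario` are illustrative names.

```rust
// Hypothetical toy model, not Perry's runtime: objects live in a Vec and
// references are indices into it.
#[derive(Clone, Debug)]
struct Obj {
    forward: Option<usize>, // Some(new_index) once this copy has been evacuated
    fields: Vec<usize>,     // outgoing references (heap indices)
}

/// Rewrite pass that only visits fields of live heap objects.
/// Anything outside the heap (such as a codegen global) is never touched.
fn rewrite_heap_fields(heap: &mut [Obj]) {
    let forward: Vec<Option<usize>> = heap.iter().map(|o| o.forward).collect();
    for obj in heap.iter_mut().filter(|o| o.forward.is_none()) {
        for field in obj.fields.iter_mut() {
            if let Some(new_index) = forward[*field] {
                *field = new_index;
            }
        }
    }
}

/// Reports (object, field) pairs in live heap objects that still point at a
/// forwarded (evacuated) copy.
fn verify_heap_fields(heap: &[Obj]) -> Vec<(usize, usize)> {
    let mut stale = Vec::new();
    for (i, obj) in heap.iter().enumerate() {
        if obj.forward.is_some() {
            continue; // evacuated husk, not a live object
        }
        for (f, &target) in obj.fields.iter().enumerate() {
            if heap[target].forward.is_some() {
                stale.push((i, f));
            }
        }
    }
    stale
}

/// Object 0 (think: a shared keys array) is evacuated to slot 2. Object 1
/// references it through a heap field; a "global" outside the heap caches
/// its old index. Returns the heap and the global after the rewrite pass.
fn scenario() -> (Vec<Obj>, usize) {
    let mut heap = vec![
        Obj { forward: None, fields: vec![] },  // 0: keys array, about to move
        Obj { forward: None, fields: vec![0] }, // 1: heap referrer
        Obj { forward: None, fields: vec![] },  // 2: evacuation destination
    ];
    let global_keys: usize = 0; // raw "pointer" held outside the heap

    heap[2] = heap[0].clone(); // copy to the new location
    heap[0].forward = Some(2); // leave a forwarding pointer behind
    rewrite_heap_fields(&mut heap);
    (heap, global_keys)
}

fn main() {
    let (heap, global_keys) = scenario();
    // The heap walk is clean: the only heap referrer now points at slot 2.
    println!("stale heap fields: {:?}", verify_heap_fields(&heap));
    // ...but the global still names the evacuated copy.
    println!("global is dangling: {}", heap[global_keys].forward.is_some());
}
```

Running it prints an empty list of stale heap fields followed by `global is dangling: true`: a clean heap walk alongside a corrupt root, which matches the evidence.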
Failure shape: with a 302-property cloned object, shape-table-based property lookups break (cl["f" + i] undefined for ~295 props) while inline-offset fast-path fields (f0–f4) and a direct string field survive. This pattern points at a codegen-emitted global holding a raw pointer to a moved object — prime suspect: the per-class @perry_class_keys_<module>__<class> globals (shared keys_array pointer, built once at module init). If those globals are not registered as FFI mutable roots, an old-page move of the keys array leaves the global dangling.
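A minimal sketch of the suspected mechanism follows. It assumes the runtime lets module init register the address of a global slot as a mutable root. `RootRegistry`, `register_mutable_root` and `simulate_move` are hypothetical, and the two `AtomicUsize` statics merely stand in for `@perry_class_keys_<module>__<class>` globals. The rewrite pass patches a slot only if its address was registered, so the unregistered twin keeps the pre-move address.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};

// Stand-ins for codegen-emitted globals such as
// @perry_class_keys_<module>__<class>; each caches a raw object address.
static KEYS_REGISTERED: AtomicUsize = AtomicUsize::new(0);
static KEYS_UNREGISTERED: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical mutable-root registry: it records the *address* of each
/// global slot so the GC can overwrite the slot in place after a move.
struct RootRegistry {
    slots: Vec<&'static AtomicUsize>,
}

impl RootRegistry {
    fn register_mutable_root(&mut self, slot: &'static AtomicUsize) {
        self.slots.push(slot);
    }

    /// Rewrite pass over registered roots: old address -> new address.
    fn rewrite(&self, forwarding: &HashMap<usize, usize>) {
        for slot in &self.slots {
            let old = slot.load(Ordering::Relaxed);
            if let Some(&new) = forwarding.get(&old) {
                slot.store(new, Ordering::Relaxed);
            }
        }
    }
}

/// Simulates module init, one old-page move, and the rewrite pass.
/// Returns (registered global, unregistered global) afterwards.
fn simulate_move() -> (usize, usize) {
    // Module init: both globals cache the keys array at 0x1000,
    // but only one is registered as a mutable root.
    KEYS_REGISTERED.store(0x1000, Ordering::Relaxed);
    KEYS_UNREGISTERED.store(0x1000, Ordering::Relaxed);
    let mut roots = RootRegistry { slots: Vec::new() };
    roots.register_mutable_root(&KEYS_REGISTERED);

    // Old-page evacuation moves the keys array from 0x1000 to 0x8000.
    let forwarding = HashMap::from([(0x1000, 0x8000)]);
    roots.rewrite(&forwarding);

    (
        KEYS_REGISTERED.load(Ordering::Relaxed),
        KEYS_UNREGISTERED.load(Ordering::Relaxed),
    )
}

fn main() {
    let (registered, unregistered) = simulate_move();
    println!("registered:   {registered:#x}"); // 0x8000, rewritten
    println!("unregistered: {unregistered:#x}"); // 0x1000, dangling
}
```

The GC can patch the global in place after the keys array moves only because the registry stores the slot's address. Copying the slot's current value into a root list would not be enough.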
Current mitigation (shipped with the #5029 PR)
gc_collect_inner_with_trigger skips old-page defrag selection on any cycle whose conservative-stack-scan decision is Scan. Copying minors (the steady-state path) never run the conservative scan, so defrag still operates there under its own policy. This contains the corruption but leaves defrag disabled for fallback minors (e.g. every explicit gc() since #4998).
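The gating rule described above reduces to a small predicate. The enum, its variants, `allow_old_page_defrag` and `policy_wants_defrag` are hypothetical names chosen for illustration. They are not the actual signature of gc_collect_inner_with_trigger:

```rust
// Hypothetical names for illustration; not the real GC types.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ConservativeScanDecision {
    NoScan, // copying minor: precise roots only, no conservative stack scan
    Scan,   // fallback minor (e.g. explicit gc()): conservative stack scan runs
}

/// May this cycle select old pages for defrag (evacuation)?
/// `policy_wants_defrag` stands in for defrag's own selection policy.
fn allow_old_page_defrag(decision: ConservativeScanDecision, policy_wants_defrag: bool) -> bool {
    match decision {
        // Mitigation: never evacuate old pages on a Scan cycle.
        ConservativeScanDecision::Scan => false,
        // Copying minors keep defrag under its normal policy.
        ConservativeScanDecision::NoScan => policy_wants_defrag,
    }
}

fn main() {
    for decision in [ConservativeScanDecision::NoScan, ConservativeScanDecision::Scan] {
        for wants in [false, true] {
            let allowed = allow_old_page_defrag(decision, wants);
            println!("{decision:?}, policy wants defrag = {wants}: allowed = {allowed}");
        }
    }
}
```

The `Scan` arm shows the cost. Every fallback minor, including each explicit gc(), forgoes old-page defrag regardless of what the policy would have chosen.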
To do
Audit codegen-emitted raw-pointer globals (perry_class_keys_*, any module-var data tables holding raw I64 object pointers) for FFI mutable-root registration so the rewrite pass can fix them after moves.
… gc_write_barrier_stress + the #5029 repro matrix.
Consider extending verify_evacuated_no_stale_forwarded_refs to walk codegen global tables so this class of dangling root is caught by PERRY_GC_VERIFY_EVACUATION instead of manifesting as silent corruption.
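One possible shape for the proposed verifier extension is sketched below under two assumptions. First, codegen globals can be enumerated as tables of raw 64-bit addresses. Second, the set of evacuated (from-space) addresses is still known right after the rewrite pass. `GlobalTable`, `verify_global_tables`, `example_tables` and the table names are hypothetical. A real check would presumably run only when PERRY_GC_VERIFY_EVACUATION is enabled.

```rust
use std::collections::HashSet;

/// Hypothetical view of one codegen-emitted table of raw object addresses
/// (a class-keys global, a module-var data table, ...).
struct GlobalTable {
    name: &'static str,
    slots: Vec<u64>,
}

/// Run right after the rewrite pass, before from-space is reused: any slot
/// still holding an evacuated address is a dangling root the rewrite missed.
/// Returns (table name, slot index, stale address) for each offender.
fn verify_global_tables(
    tables: &[GlobalTable],
    evacuated: &HashSet<u64>,
) -> Vec<(&'static str, usize, u64)> {
    let mut stale = Vec::new();
    for table in tables {
        for (i, &addr) in table.slots.iter().enumerate() {
            if evacuated.contains(&addr) {
                stale.push((table.name, i, addr));
            }
        }
    }
    stale
}

/// Example state after a move of 0x1000 -> 0x8000: table A was rewritten,
/// table B was not (e.g. because it was never registered as a root).
fn example_tables() -> Vec<GlobalTable> {
    vec![
        GlobalTable { name: "perry_class_keys_mod__A", slots: vec![0x8000] },
        GlobalTable { name: "perry_class_keys_mod__B", slots: vec![0x1000] },
    ]
}

fn main() {
    let evacuated: HashSet<u64> = HashSet::from([0x1000]);
    let stale = verify_global_tables(&example_tables(), &evacuated);
    for (name, slot, addr) in &stale {
        eprintln!("stale root: {name}[{slot}] = {addr:#x}");
    }
}
```

Checking against the evacuated set only makes sense before from-space is reused. After reuse, a stale address could alias a freshly allocated object and slip through.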