Coarse-fine interface machinery for ADAM/PRISM — Berger-Colella reflux, edge-E uniqueness, high-order ghost-fill
This issue body was reformulated 2026-05-23 after a multi-session literature review (sources/papers_20260522_forest_octree_halo.md) and design conversation that re-scoped the work. The original Phase-D-design content (tasks D.1–D.4a, all completed) is preserved as Appendix H — Historical scope. The active forward plan is Phases A → B → C described in §§3–5 below.
Status of the original Phase D scope:
- D.1 (design), D.2 (
adam_maps_object extension), D.3 (manifest parser + driver), C.3 closure (post-D.3 follow-up), and D.4a (multi-realm orchestrator scaffolding + rmf-2realm case files + digest aggregation) — all completed and merged (commits 2d74fb6d, dffa19be, c2a77708, d73b9ca8, plus the May-2026 budget/finalize fixes 771dbd2f and bf324ccd).
- D.4b (capture rmf-2realm golden + confirm bit-comparability) — superseded by the present work: empirical evidence (the seam
By digest drifts 0% → 0.55% → 0.77% over 10 steps) shows the rmf-2realm seam is not bit-comparable to the single-realm golden, because the seam coupling is structurally incomplete. The seam machinery has to be built before a meaningful golden can be captured.
- D.5 (architecture docs) — deferred until §§3–5 land; the docs will reflect the actual final architecture.
1. Motivation — what the rmf-2realm regression revealed
The rmf-2realm regression case (src/tests/prism/regression/rmf-2realm/, see Appendix H for the construction) was built as a free correctness oracle: the two adjacent realms reconstruct the single-realm rmf domain by union, so the existing rmf golden should be bit-comparable to the digest-aggregated 2-realm output.
After fixing two genuine multi-realm engineering bugs in May 2026 — per-realm memory budget over-allocation (771dbd2f) and per-realm MPI_FINALIZE race (bf324ccd) — the case runs clean to completion on CPU. The aggregated digest, however, does not match the single-realm rmf golden. The By component shows a systematic drift at the inter-realm seam: 0% at t=0, 0.55% at step 5, 0.77% at step 10. The drift is monotonic, smooth, and spatially localized at the seam — it is not a sign error, an indexing bug, or a substage-staleness artifact (all three were eliminated during diagnosis).
The root cause is structural: the current exchange_inter_realm_halos_forest (src/app/prism/cpu/adam_prism_cpu_object.F90:1181-1493) is a partial implementation of the inter-realm seam coupling. Specifically:
- It is a same-resolution mirror copy only. No flux reconciliation, no edge-E (or equivalently, no induction-equation flux) sharing across the seam. The induction equation in each realm sees its own independently-computed curl-of-E contribution at the seam, and the two contributions disagree at truncation order. The disagreement accumulates as the
By drift.
find_peer_block is local-rank-only (line 1296). Off-rank peers hit cycle (line 1346) and the corresponding seam ghost is never written. For the current rmf-2realm replicated forest (both ranks own both realms) this defect is dormant; it activates immediately under disjoint-rank carve-out.
copy_peer_face writes the axis-aligned face slab only (lines 1366-1492). Edge and corner ghosts are not written. WENO stencils that read corner/edge ghosts at the seam see uninitialized or stale values. This is the same bug class AMReX fixed in their issue #568 (the multi_ghost send-tag fix).
The first defect — missing flux/edge-E reconciliation — is the dominant contributor to the By drift (it produces a truncation-order error that grows linearly with step count) and is the focus of this issue. Defects 2 and 3 are addressed as part of the same redesign.
2. Literature foundation — what the field has converged on
A full source-verified review is in sources/papers_20260522_forest_octree_halo.md. The operative consensus, after reading 11 primary sources end-to-end:
2.1 The cell-centered AMR divergence-preservation question, resolved
The relevant question for ADAM/PRISM is not "can a local pointwise interpolation operator preserve ∇·B = 0 on cell-centered storage?" — six independent sources (Tóth & Roe 2002 §2/§7, Balsara 2001, Londrillo–Del Zanna 2004 §2.2, Li & Li 2004 p.2, Olivares 2019 §2, Balsara 2009 §2) converge that this question is under-determined: one cell-averaged B-vector per cell cannot uniquely close the face-flux + curl constraints required for an exact local prolongation.
The relevant question is "can the discretization as a whole, with cell-centered storage everywhere, preserve ∇·B = 0 across AMR jumps by adding interface machinery (reflux, shared edge-E, unique flux reconstruction)?" — and the answer is yes, demonstrated by production codes (Flash-X Spark with cell-centered MHD + flux registers; BATSRUS in cell-centered mode + flux-CT; Athena++ FluxCorrection). The mechanism is:
- Restriction (fine → coarse): coarse-face flux = sum of fine-face fluxes. Algebraically conservative; if every fine cell has ∇·B = 0 numerically, the coarse cell does too, by telescoping (Olivares 2019 §4.2.1 Eq. 26-27).
- Induction-equation flux at the seam: one canonical edge-E value, shared by both sides. If the same edge-E enters the curl-of-E on both the fine and the coarse face that share the edge, the discrete divergence telescopes algebraically across the seam (Evans & Hawley 1988; Balsara & Spicer 1999; Londrillo & Del Zanna 2004 §2.1 Fig. 1).
- Prolongation (coarse → fine) for new fine cells: standard slope-limited reconstruction. Not required to be div-free pointwise; the next-substage reflux corrects any divergence error introduced. This is the Spark/AMReX pattern (Couch 2021 §2.2; Zhang 2019).
2.2 The 5-author algorithmic consensus on RK-multistage refluxing
For Runge-Kutta time integration (PRISM uses SSP-RK 1/2/3/4/5), the reflux is stage-weighted, not a single end-of-step correction. The formula, in the form most directly applicable to ADAM's existing adam_rk_object (which carries ark(s), brk(s), crk(s) weights and alph(s,ss), beta(s), gamm(s) matrices), is:
$$
U_{\text{coarse}}^{n+1} \mathrel{+}= \sum_{s=1}^{\text{nrk}} W_s \cdot \frac{\Delta t}{\Delta x} \left( F_{\text{coarse},s}^{\text{used}} - \sum_{\text{fine faces}} F_{\text{fine},s} \right)
$$
where $W_s$ is the SSP-RK accumulation weight for substage $s$ (derived from ark/brk/crk for the SSP-RK form ADAM uses). This is the formula in Wang et al. 2018, JCP 360 (eq. 23) for GPU-SAMR, generalized in Berger & Colella 1989, JCP 82:64 for the foundational case. Couch et al. 2021 §2.2 confirms the implementation in Flash-X Spark: "a linear combination of flux calculations can also be stored in the flux register for a correction to be applied to the fluxes at the end of the update step."
2.3 The high-order ghost-fill picture
When realms are conforming and at the same resolution (current rmf-2realm), ghost-fill is a same-resolution mirror copy and order is preserved trivially. When realms (or AMR levels intra-realm) differ in resolution, the existing ADAM ghost-fill drops to 0th-order piecewise-constant injection (src/lib/common/adam_maps_object.F90:1024-1042 — the "smoking gun" identified in the literature review). This caps global accuracy at the interface to 1st-order in a one-ghost-thick shell, regardless of the interior scheme order.
The replacement is dimension-by-dimension high-order WENO interpolation of the coarse cell-centered data, with a hybrid smooth/shock switch (5th-order WENO in smooth regions, 2nd-order conservative interp at shocks) governed by a troubled-cell detector. References: Shen, Qiu & Christlieb 2011 JCP 230:3780; arXiv:2511.08335 (2025) for the modern cell-centered FD treatment.
Critical decoupling (the insight that makes this safe): "this process does not introduce any loss of conservation, even though a non-conservative interpolation is employed. This is because the interpolation is only applied to populate ghost cells, rather than to update the valid solution cells." (arXiv:2511.08335 §4.2.3.) Conservation is enforced separately by the flux register (Phase A); ghost-fill order is enforced by WENO interpolation (Phase C). These two mechanisms are independent.
3. Phase A — Berger-Colella reflux on the existing flux array
Goal. Strict conservation of all bulk variables (ρ, momentum, energy, ψ, φ, and the B/D components) across every coarse-fine interface (intra-realm AMR jump AND inter-realm seam). This is the prerequisite for Phase B.
Status of G2 after Phase A alone. Conservation is exact; ∇·B is preserved if and only if the underlying induction-equation discretization is naturally CT-like (telescoping curl-of-E). For PRISM's current Roe-family Riemann + cleaning numerics, Phase A alone produces conservation-correct but truncation-order ∇·B at the seam — better than current, not yet algebraic. Phase B closes that gap.
3.1 New module — adam_flux_register_object
A new module in src/lib/common/, structurally parallel to adam_maps_object (additive, no modification to existing types). The data structure:
module adam_flux_register_object
use :: penf, only : I4P, R8P
integer(I4P), parameter, public :: SEAM_KIND_INTRA_REALM_AMR = 1_I4P !< Coarse-fine AMR jump inside one realm.
integer(I4P), parameter, public :: SEAM_KIND_INTER_REALM = 2_I4P !< Inter-realm seam.
!< Per-seam-face flux accumulator.
type, public :: flux_register_face_t
integer(I4P) :: seam_kind !< SEAM_KIND_*.
integer(I4P) :: coarse_realm !< Realm owning the coarse side (= 0 for intra-realm).
integer(I4P) :: coarse_block !< Block on the coarse side.
integer(I4P) :: coarse_face !< Which face (±x, ±y, ±z) of the coarse block.
integer(I4P) :: fine_realm !< Realm owning the fine side (= 0 for intra-realm).
integer(I4P) :: fine_block(:) !< Allocatable list of fine blocks covering this coarse face.
integer(I4P) :: nface_cells !< Cells covered by this face on the coarse side.
real(R8P), allocatable :: F_coarse(:,:,:) !< (nv, nface_cells, nrk) — coarse-side flux per substage.
real(R8P), allocatable :: F_fine_sum(:,:,:) !< (nv, nface_cells, nrk) — sum of fine fluxes, accumulated per substage.
endtype flux_register_face_t
type, public :: flux_register_object
type(flux_register_face_t), allocatable :: face(:) !< All registered seam faces.
integer(I4P) :: nfaces = 0_I4P
contains
procedure, pass(self) :: initialize
procedure, pass(self) :: register_face !< Called by topology pass at init/regrid.
procedure, pass(self) :: accumulate_coarse_flux !< Called by coarse-side residual.
procedure, pass(self) :: accumulate_fine_flux !< Called by fine-side residual.
procedure, pass(self) :: apply_reflux !< Called at end of substage, before next ghost-update.
procedure, pass(self) :: reset !< Called at start of step.
procedure, pass(self) :: destroy
endtype flux_register_object
end module adam_flux_register_object
The register does not own field data; it owns the per-substage flux contributions on a thin "skin" of cells along every seam face. Memory cost is therefore $O(\text{seam-area} \times \text{nv} \times \text{nrk})$ — bounded by the surface-area-times-stages of the AMR boundaries, independent of the interior cell count.
3.2 Lifecycle and hook points
Initialization. During the topology pass that runs once at init/regrid, the inter-realm topology already discovered by forest_object%populate_inter_realm_topology (src/lib/common/adam_forest_object.F90:368-406) is replayed to call flux_register%register_face for every inter-realm face pair. For intra-realm AMR, an analogous pass walks each realm's adam_grid_object neighbor table at every refinement level, calling register_face for every fine-coarse adjacency. The register's face(:) array is fully populated before the first time step.
Per-step reset. flux_register%reset is called from forest_object%evolve_one_step (src/lib/common/adam_forest_object.F90:194) at the top of every step, zeroing F_coarse and F_fine_sum for all registered faces.
Flux accumulation during residual computation. Inside each realm's compute_residuals_* routine (e.g. compute_residuals_fd_centered at src/app/prism/cpu/adam_prism_cpu_object.F90:1873-2056), when the flux through a face has been computed, the routine checks whether the face is registered (via a precomputed mask) and if so calls:
call flux_register%accumulate_coarse_flux(coarse_block=b, coarse_face=f, &
substage=s, flux_face=F_face)
! ... on the coarse side ...
call flux_register%accumulate_fine_flux(fine_block=b, fine_face=f, &
substage=s, flux_face=F_face)
! ... on the fine side; accumulates into F_fine_sum ...
The mask check is a single integer compare per face, so the overhead on interior faces is negligible.
Reflux application. At the end of each substage, before the next ghost-update, the forest invokes:
call flux_register%apply_reflux(realm=forest_realm, substage=s, dt=dt)
apply_reflux performs the rank-to-rank reduce of F_fine_sum (if the fine and coarse sides of a face are on different ranks; this is a separate MPI exchange from the ghost-update, with its own communicator pattern), then applies the correction to the coarse-side q array:
$$
q_{\text{coarse cell}}(\text{nv}, i, j, k, b) \mathrel{+}= W_s \cdot \frac{\Delta t}{\Delta x_{\text{coarse}}} \cdot \big( F_{\text{coarse},s} - F_{\text{fine sum},s} \big)
$$
where $W_s$ is the SSP-RK accumulation weight. For SSP-RK 3 (ADAM's nrk = 3), $W_1 = W_2 = W_3 = 1$ in the natural decomposition; for SSP-RK 5 stage-weights are read from adam_rk_object%ark(s).
Verification. A full step's worth of refluxing must satisfy conservation to round-off: for any conserved quantity $\Phi = \int q, dV$,
$$
\Phi(t^{n+1}) - \Phi(t^n) = -\Delta t \oint_{\partial\Omega} F \cdot n , dS
$$
with no contribution from internal coarse-fine interfaces — this is the discrete divergence theorem, and it is the unit-test acceptance criterion for Phase A.
3.3 Integration with the existing comm_map_* machinery
The flux register's MPI exchange is a separate communication pattern from the ghost-cell exchange (different cell sets, different send/recv directions). It does not plug into comm_map_send_ghost / comm_map_recv_ghost (src/lib/common/adam_maps_object.F90:96-99). Instead, the register builds its own (send_tags, recv_tags) tables at registration time, indexed by (fine rank, coarse rank, face) — structurally identical to AMReX's FabArrayBase::CommMetaData. The pack/unpack uses contiguous flux_register_face_t%F_fine_sum buffers, no transposition needed.
The intra-realm AMR case typically has the fine and coarse sides on the same rank (because both belong to the same realm and ADAM's intra-realm partitioning keeps levels co-resident); the inter-realm case may cross ranks. The register handles both transparently via the (send_tags, recv_tags) abstraction.
3.4 References and citations for Phase A
- Berger, M. J. & Colella, P. (1989). "Local adaptive mesh refinement for shock hydrodynamics." Journal of Computational Physics 82(1):64-84. doi:10.1016/0021-9991(89)90035-1. — The foundational reflux paper. The δF correction is in §4.
- Wang, B., Xia, Y. & Shu, C.-W. (2018). "Efficient implementation of high order inverse Lax-Wendroff boundary treatment for conservation laws." Journal of Computational Physics 360:73-99. — RK-stage-weighted reflux formula (Eq. 23). The form ADAM should adopt.
- Couch, S. M. et al. (2021). "Towards performance portability in the Spark astrophysical magnetohydrodynamics solver in the Flash-X simulation framework." Parallel Computing 108:102830. doi:10.1016/j.parco.2021.102830. — §2.2: production reflux for RK-multistage cell-centered MHD. Quoted: "a linear combination of flux calculations can also be stored in the flux register for a correction to be applied to the fluxes at the end of the update step."
- Zhang, W. et al. (2019). "AMReX: a framework for block-structured AMR." JOSS 4(37):1370. — The
FluxRegister and YAFluxRegister data structure design ADAM's flux_register_object mirrors.
- Olivares, H. et al. (2019). "Constrained transport and adaptive mesh refinement in the Black Hole Accretion Code." A&A 629:A61. — §4.2.1 Eq. 26-27: the telescoping property that makes reflux work.
3.5 Phase A acceptance criteria
3.6 Phase A implementation log — what landed, in 8 commits
Phase A is fully complete on develop. The rmf-2realm aggregated CPU digest matches the single-realm rmf golden across every field variable; the 9 metadata mismatches that previously cluttered the cross-case comparison are now filtered to SKIP_METADATA by digest.py compare. The 0.77%-at-step-10 By drift documented in HANDOFF_rmf2realm_2026-05-21.md is closed. A regression golden for the case is committed; run.sh PASSes rmf-2realm/cpu against its own golden. The flux-register infrastructure for the FV-path reflux is wired into forest_object AND the PRISM CPU consumer (compute_residuals_fv_centered) now actively calls accumulate_*_flux on registered seam faces; for the rmf-2realm case (which uses fd_centered) the FV hook is unreachable, the accumulators stay zero, and reflux remains a numerical no-op — by design, per the 2026-05-24 decision (PRISM is FD-focused, no FV regression case in scope).
The eight commits, in order:
| # |
Commit |
Subject |
Numerical effect |
| 1 |
6b13074 |
feat(lib): scaffold flux register module for Phase A reflux machinery |
Numerical no-op (skeleton TBPs) |
| 2 |
9786245 |
feat(lib): wire flux register topology pass for inter-realm seams |
Numerical no-op (accumulators allocated and zeroed) |
| 3 |
9d1533e |
feat(lib,prism): replace geometric seam exchange with precomputed comm-map |
rmf single-realm: byte-identical. rmf-2realm: byte-identical to develop-HEAD (0.77% By drift exposed but not yet closed — see §3.6.1 for why) |
| 4 |
6987c20 |
fix(lib,prism): close inter-realm seam — BC_SEAM override + 3-phase substage loop |
rmf-2realm digest closes (zero field-variable mismatches vs single-realm rmf golden) |
| 5 |
d3e50e5 |
test(prism): capture rmf-2realm CPU regression golden — seam closure |
rmf-2realm now has a committed golden; run.sh PASSes the case against it |
| 6 |
fa20706 |
test(regression): downgrade per-block-metadata digest mismatches to SKIP_METADATA |
Cosmetic — cross-case rmf-2realm vs rmf golden comparison now exits 0 (filters the 9 benign metadata rows to SKIP_METADATA) |
| 7 |
fd91be6 |
feat(lib): reflux infrastructure — accumulators, register lookup, forest q-correction |
Numerical no-op (FV-path infrastructure; no consumer invokes accumulate_*_flux yet — see §3.7) |
| 8 |
108a96a6 |
feat(prism,lib): wire FV-path inter-realm flux accumulation hook (#13) |
Numerical no-op for the regression case (rmf-2realm uses fd_centered, hook unreachable). Hook is live for FV consumers; same-resolution mirror seam gives F_coarse - F_fine_sum = round-off zero by expectation. |
Merged into develop as ac6edec0 (commits 1–3), 993f81fe (commit 4), d3e50e5 direct (commit 5), fa20706 direct (commit 6), c3648ef9 (commit 7 from feature/issue13-phase-a-fv-reflux), and 57df060e (commit 8 from feature/issue13-phase-a-fv-reflux-hook). HEAD on develop after closure: 57df060e.
Material work landed by commits 1–3:
-
adam_flux_register_object exists as a program-scope singleton (forest_flux_register on adam_forest_global) with full TBP signatures (initialize, register_face, accumulate_coarse_flux, accumulate_fine_flux, apply_reflux, reset, destroy). Topology hooks fire at init (populate_inter_realm_topology), per-step reset (evolve_one_step), and per-substage apply (between phases of the substage loop). Accumulators (F_coarse, F_fine_sum) are allocated (nv, nface_cells, nrk) per registered face. accumulate_*_flux and apply_reflux remain skeleton no-ops pending Phase A step 4 (PRISM compute_residuals_* wiring); the lifecycle infrastructure is complete and inert.
-
adam_maps_object gained inter_realm_ghost_cell(:,1:10) — a per-cell ghost map analogous to the intra-realm local_map_ghost_cell machinery, with layout [peer_realm, b_send, b_recv, i_send, j_send, k_send, i_recv, j_recv, k_recv, one_or_eight]. Built once at init by forest_object%build_inter_realm_ghost_cell_map, which enumerates every ghost cell in every seam-block's full ghost slab (face + edge + corner — the defect-B fix) and resolves the matching peer realm + block + interior cell via a geometric find_peer_cell test against field%emin/emax. Coverage for rmf-2realm: 9348 map rows per realm per rank (6144 face + 2880 edge + 324 corner).
-
prism_cpu_object%exchange_inter_realm_halos_forest is now a flat indexed loop over the precomputed map. The previous geometric find_peer_block + face-slab copy_peer_face helpers (which missed corner / edge ghosts) were removed. Substage selection via forest_active_substage is preserved.
3.6.1 Why commits 1–3 left the digest unchanged — and what commit 4 fixed
Commit 3 wrote ~50% more ghost cells than the old face-slab exchange, with structurally-correct addressing — and yet the rmf-2realm digest stayed byte-identical to develop-HEAD's 0bf607ad.... Smoking-gun: replacing the per-cell copy with q = 999.0_R8P left residuals unchanged. The seam exchange's writes were being clobbered downstream.
Root cause #1 (BC overwrite): each realm's INI declares physical BCs on all 6 faces (e.g. bc_x_max = Neumann). For a realm whose +x face is glued to another realm by the manifest, that declaration is wrong — the face is a SEAM, not a physical boundary. PRISM's set_boundary_conditions (called by update_ghost AFTER exchange_inter_realm_halos_forest) was extrapolating Neumann values into the seam-ghost cells at every substage, overwriting the peer-interior values the seam exchange wrote.
Root cause #2 (substage-coherence race) — exposed by fixing #1: the previous 2-phase substage loop (assemble → evaluate = compute_residuals + rk%assign_stage back-to-back) had a data race at the seam. rk%assign_stage(s, q=dq) overwrites q_rk(:, interior, :, b, s) with dq in-place (the SSP-RK trick — q_rk(s) holds the dq of stage s for the next stage's linear combination). When realm A's evaluate finished before realm B's, B's seam exchange read A's q_rk(s) — which by then held A's residual buffer, not A's substage state — and B's stencil fed on garbage. The race was previously masked by the BC Neumann overwrite of the corrupted ghosts; once BC was silenced on seam faces, the race surfaced. Empirically: max|q-ghost| jumped 5 orders of magnitude in a single substage on the second-evaluated realm, traced via maxval(abs(q)) instrumentation at every substage entry.
Commit 4 fixes both, together (they had to land in one commit because fixing #1 destabilises the regression without #2):
-
BC_SEAM sentinel added to adam_parameters (= -2_I4P, sibling of BC_PERIODIC = -1). forest_object%populate_inter_realm_topology gains override_seam_bc_in_crown, which walks each realm's local_map_bc_crown (already built during the realm's own initialize_forest) and rewrites column 8 (bc_type) from BC_NEUMANN/etc. to BC_SEAM for entries whose (block, face_1_6) matches a manifest-declared seam. PRISM's set_boundary_conditions dispatch ladder has no branch for BC_SEAM → those entries become silent no-ops at consumption time → the seam-exchange writes survive intact. The manifest acts as authoritative override over each realm's INI at the right semantic layer — post-crown construction, pre-time-loop — with no per-cell mask inside the BC inner loop and no schema extension to the realm INI.
-
Three-phase substage loop in forest_object%evolve_one_step:
- Phase 1 — all realms
assemble_substage_forest(s)
- Phase 2a — all realms
residuals_substage_forest(s) (computes self%dq; reads peer q_rk(s) via the seam exchange — safe, no peer has overwritten q_rk(s) yet)
- Phase 2b — all realms
assign_substage_forest(s) (overwrites q_rk(:, interior, :, b, s) with self%dq — race-free because no peer needs to read it any more)
- Phase 3 — forest-level reflux hook (still skeleton no-op — that's where step-4-of-Phase-A
accumulate_*_flux + apply_reflux wiring will land later)
evaluate_substage_forest is split into residuals_substage_forest + assign_substage_forest on both realm_object (defaults error_stop) and PRISM CPU + FNL (overrides). The old evaluate_substage_forest is kept as a backward-compat wrapper that calls both in sequence — the forest no longer invokes it.
3.6.2 Failed exploration recorded as anti-pattern (NOT committed)
Before the BC_SEAM commit landed, an alternative approach was tried: add inter_realm_face_mask(b, fec_1_6) on adam_maps_object and consult it inside set_boundary_conditions's crown loop to cycle past seam entries. Empirically catastrophic — same residual explosion (O(1e30) at step 1, NaN by step 10) because it triggers the same substage-coherence race the BC_SEAM + 3-phase fix had to address jointly. The per-cell-mask approach is recorded here as anti-pattern: the BC_SEAM crown-override is the right level. The mask exploration is left in the implementation-session transcript only, not in the repo.
3.6.3 Verification at develop HEAD 57df060e
prism-cpu-gnu builds clean (gcc/15.2.0 + openmpi/5.0.10-gnu15.2.0), release and debug.
- rmf single-realm digest: 102/102 within
rtol=1e-6 atol=1e-3 (the BC_SEAM machinery is a true no-op for N=1: the manifest is absent, populate_inter_realm_topology returns early, BC behaves exactly as before; the FV reflux hook is unreachable for fd_centered).
- rmf-2realm CPU regression: PASSes against its committed golden (
run.sh cpu → digest match: 102 rows within rtol=1e-06 atol=0.001).
- Cross-case sanity (rmf-2realm aggregated vs rmf single-realm golden): PASSes 102/102 with 9
SKIP_METADATA lines (per-block-metadata filter active). Zero field-variable mismatches. The 0.77%-at-step-10 By drift is gone.
- Reflux machinery (commits 7 + 8) is fully wired:
forest_object%apply_reflux_corrections runs every substage, walks forest_flux_register%face(:), and applies the full Berger-Colella correction q_coarse += weight·sign·dt/dx·(F_coarse - F_fine_sum). For the rmf-2realm case (FD), the FV hook is unreachable → accumulators stay zero → correction is 0. For a hypothetical FV mirror seam, F_coarse - F_fine_sum is round-off zero by expectation → correction is round-off zero. The non-zero path activates when true coarse-fine AMR adjacency exists; that case is not in the current Phase A regression scope.
- The per-realm
inter_realm_face_register_index(b, fec_1_6) lookup uses a signed-integer convention (positive = this realm is the coarse side of register entry abs(idx), negative = this realm is the fine side, zero = not a seam face). This encodes the coarse/fine role in the lookup itself, so no realm-identity field is needed on the realm object.
3.7 Phase A — what's left
Phase A as originally drafted is fully closed. The remaining items below are deliberate scope-cuts or follow-on phases, not Phase A blockers.
FV-path consumer wiring — landed in commit 8 (108a96a6). compute_residuals_fv_centered now calls accumulate_coarse_flux / accumulate_fine_flux on registered seam faces between reconstruction and the conservative-update loop. The dispatch coarse-vs-fine is encoded in the sign of inter_realm_face_register_index(b, fec_1_6) (+ = coarse side, − = fine side, 0 = not a seam). Per the 2026-05-24 decision: hook implemented but not exercised by a new regression case — PRISM is FD-focused. Correctness oracle was rmf-2realm staying bit-identical (passed: hook unreachable for fd_centered).
- FNL multi-realm path — the per-realm FNL singletons were rebound to value components in earlier work; the 2026-05-25 investigation (§3.8 below) drove the architectural cleanup further (retired
forest_realm/adam_forest_global, threaded realm(:) / s_active / flux_register as dummy arguments throughout) but rmf-2realm/fnl STILL FAILS at the first OpenACC kernel launch after the cross-realm seam exchange. The remaining trigger is below the architectural layer — a WSL-UCX × OpenACC × FUNDAL device-memcpy interaction. Status: not blocking Phase A closure; tracked in §3.8.
- Phase B (shared edge-E for algebraic ∇·B = 0) and Phase C (high-order ghost-fill for non-conforming resolutions) — the next planned phases per §§4 and 5 below; not started.
3.8 FNL multi-realm investigation (2026-05-25) — architectural cleanup landed, runtime bug remains
A multi-day investigation triggered by rmf-2realm/fnl failing at the first inter-realm seam exchange. The investigation produced five landed commits on branch feature/issue13-fnl-realm-dummy-threading plus measured evidence that the residual failure is in the nvfortran-OpenACC / UCX / FUNDAL interaction, NOT in PRISM's own state management.
The five landed commits (in chronological order):
| Commit |
Subject |
Effect |
8000ae7b |
fix(prism-fnl): switch inter-realm seam copy to FUNDAL memcpy |
Replaced !$acc update host on dev_alloc-managed pointers in copy_peer_face_gpu (memory-model violation: ACC present-table vs FUNDAL raw-pointer) with FUNDAL dev_memcpy_from_device / dev_memcpy_to_device. Closed the original "data not found on device" crash; exposed the next bug downstream. |
d2fb7bc8 |
refactor(lib,prism): thread realm(:) as dummy through forest contract |
Added optional realm(:) to initialize_forest, assemble/residuals/assign/evaluate_substage_forest, post_step_forest, update_ghost, compute_residuals_*, save_simulation_data. The forest passes realm=realm at every per-realm dispatch. Lets FNL's intra-realm and inter-realm halo refresh use the dummy-argument path. |
57350354 |
fix(lib): plumb add_adam to field initialize for tree/field consistency |
Surfaced during the investigation: tree_object%initialize has optional add_adam (default .true.); field_object%initialize had no matching dummy and silently set blocks_number = 0 despite the tree being seeded with the ADAM ancestor block. Added add_adam to field, set blocks_number = 1 ON RANK 0 ONLY. adam_object%initialize plumbs the same value to both tree and field. Independent fix; orthogonal to FNL multi-realm. |
90d0343e |
refactor(lib,prism): retire forest_realm class pointer + fix kernel device-mem thrash |
Three improvements in one commit: (a) retired the class(realm_object), pointer :: forest_realm(:) shim from adam_forest_global (drop fallback paths in update_ghost, replace with present(realm)); (b) hoisted host_buf in copy_peer_face_gpu from allocatable + per-call allocate/deallocate to an automatic stack array (eliminated heap-tcache churn that was masking the GPU bug as a UCX-malloc abort); (c) compile-time-bound private arrays in compute_residuals_fd_centered_dev_kernel (replaced 6 × private(qsx_y(1-s1:1+s1), ...) with fixed-size + slice passed at call site — 89% reduction in 3.67 MB device-memory churn per substage). |
9fa34b08 |
refactor(lib,prism): retire adam_forest_global, full dummy-argument threading |
Closes the polymorphic / module-shared-state hygiene track: deletes adam_forest_global.F90; forest_flux_register becomes a type(flux_register_object) :: flux_register VALUE component on forest_object; forest_active_substage becomes an integer, intent(in), optional :: s_active dummy threaded through exchange_inter_realm_halos_forest to copy_peer_face[_gpu]. The forest passes self%flux_register and s through residuals_substage_forest. NO forest module-scope mutable state remains. |
Diagnostic findings (measured, not speculated):
- Five standalone FUNDAL tests (A: single-instance whole-array memcpy; B: two-instance cross-copy; C: polymorphic-class-array + select type + nested component chain; D: kernel/memcpy interleave; E: D + MPI on -np 2) — all PASS. FUNDAL itself + the polymorphic-dispatch pattern + the OpenACC-kernel-launch-interleave-with-FUNDAL-memcpy + MPI all work cleanly in isolation.
- The retirement of
forest_realm (the polymorphic-class-pointer hazard) made progress (the run advances further before crashing — ~3× more set_boundary_conditions_kernel launches than pre-fix) but does NOT eliminate the crash. Confirms the bug is downstream of polymorphic-pointer handling.
- The kernel-private compile-time-bound fix achieved 89% reduction in 3.67 MB device-memory
recycle device memory → pgi_uacc_alloc thrash; the crash persists, at the same location. Confirms device-memory thrash is a real efficiency bug but not the proximate trigger.
- Final crash signature on rmf-2realm/fnl is
MemcpyHtoDAsync CUDA_ERROR_INVALID_VALUE at receive_recv_buffer_ghost_gpu_dev:139 — the OpenACC kernel-argument-block h2d copy for an INTRA-realm ghost-receive kernel. The failing destination address (e.g. 0xe7f729d30) appears in the trace only at the crash line — never allocated, never recycled — so the OpenACC runtime is generating a fresh arg-block destination for the kernel launch and CUDA rejects it.
- No ACC errors anywhere in the 51 MB trace (no
not found in present table, no nil pointers, no double-free signatures). All ~110 k OpenACC events report success up to the kernel-launch arg-block copy that fails.
Refined hypothesis (not yet falsified): the cross-realm acc_memcpy_from_device_f(c_loc(dst), c_loc(peer_realm%rk_fnl%q_rk_gpu(slice)), bytes) inside FUNDAL — although correct in isolation per the standalone tests — interacts with WSL-specific UCX eager-mode buffer registration on the device (the WSL workaround UCX_RNDV_THRESH=inf registers eager buffers that the OpenACC runtime cannot reconcile with its arg-block allocator). The single-realm path doesn't exercise cross-instance device pointers, so the interaction never fires.
What's next (NOT in this Phase A scope):
- Test on a real GPU cluster (Leonardo, IAC) — the WSL workaround envvars are explicitly documented as WSL-only perf regressions; cluster GPUDirect RDMA uses rendezvous + actual UCX device-memory support that may or may not exhibit the same interaction.
- Alternative seam-exchange path that host-stages everything through CPU memory (slower but architecturally insulated from device-runtime interaction).
- File an nvfortran/UCX bug with the WSL repro after isolating which specific operation (the FUNDAL cross-instance memcpy, the eager-mode UCX, or the OpenACC arg-block allocator) introduces the bad device address.
Architectural state at end of investigation (branch feature/issue13-fnl-realm-dummy-threading):
adam_forest_global deleted. No forest_realm, forest_active_substage, forest_flux_register symbols in the binary.
forest_object owns flux_register as a value component.
realm(:) flows exclusively as class(realm_object), intent(inout), optional, target dummy through the contract chain — NO module-scope shims for the realm array.
s_active flows as integer(I4P), intent(in), optional dummy through exchange_inter_realm_halos_forest.
flux_register flows as class(flux_register_object), intent(inout), optional dummy through compute_residuals_interface, present only at the CPU FV-reflux hook (PRISM FNL has no FV residual path; PRISM FD/WENO accept the dummy for contract parity).
- CLAUDE.md pointer-to-derived-type rule is fully observed on the forest path.
- CPU regression preserved (rmf-2realm/cpu PASS). rmf/fnl PASS. rmf-2realm/fnl crash is NOT in the architectural hygiene track.
4. Phase B — Shared edge-E / induction-flux uniqueness at the seam
Goal. Algebraic, round-off-level ∇·B = 0 (and ∇·D = 0 when constrained transport on D is enabled) across every coarse-fine interface. This is the specific divergence-preservation mechanism on top of Phase A's bulk-conservation guarantee.
Why this is structurally analogous to Phase A. Phase A reconciles face-fluxes for conserved variables at the seam; Phase B reconciles edge-fluxes (edge-E) for the magnetic flux. Same lifecycle, same comm pattern, same module shape. The distinction is where in the discretization the registered value enters: A enters the conservative update of the bulk variables, B enters the induction-equation update of B.
4.1 The mathematical mechanism — why shared edge-E gives algebraic ∇·B = 0
Recall the Stokes-curl form of the induction equation that every CT-equivalent discretization uses (Evans & Hawley 1988; Balsara & Spicer 1999; Londrillo–Del Zanna 2004 §2.1 Eq. 11):
$$
\frac{d B_x|_{i+½,j,k}}{dt} = - \frac{1}{\Delta y \Delta z} \big[ \Delta z E_z|_{i+½,j+½,k} - \Delta z E_z|_{i+½,j-½,k} - \Delta y E_y|_{i+½,j,k+½} + \Delta y E_y|_{i+½,j,k-½} \big]
$$
The discrete divergence of B at cell $(i,j,k)$ is then:
$$
(\nabla \cdot B)_{i,j,k} = \frac{B_x|_{i+½} - B_x|_{i-½}}{\Delta x} + \frac{B_y|_{j+½} - B_y|_{j-½}}{\Delta y} + \frac{B_z|_{k+½} - B_z|_{k-½}}{\Delta z}
$$
When the induction equation is applied to each face, every edge-E contributes to two adjacent face updates with opposite sign. Summing the 6 face-divergence contributions for one cell:
- Each $E_z|{i+½, j+½, k}$ appears in $dB_x/dt|{i+½}$ with sign $-1$ and in $dB_y/dt|_{j+½}$ with sign $+1$ (after multiplication by their respective face-area weights).
- The two contributions to $(\nabla \cdot B)$ from this one edge cancel identically (Olivares 2019 §3.2 verbatim: "Since each of the line integrals of the electric field is shared by two faces, but appears with opposite sign in the time update formula for each of them, the rate of change of $(\overline{\nabla \cdot B})_\text{cell}$, which can be calculated as the sum of the rate of change of the outgoing flux through all faces, vanishes.").
This is the algebraic ∇·B = 0 property of CT — exact at the level of integer arithmetic, not just floating-point. It holds provided the same edge-E value is used in both adjacent face-updates.
At a coarse-fine seam (intra-realm AMR jump or inter-realm seam), this property breaks if the coarse-cell induction update and the fine-cell induction update use independently-computed edge-E values that disagree at truncation order. The fix is shared edge-E: at every seam edge, one canonical value is chosen and used by both sides.
4.2 Implementation in cell-centered ADAM — "edge-E" is a derived quantity
In a fully face-staggered CT code (BHAC, Athena++ CT-mode), edge-E is a stored primary variable. ADAM stores B at cell centers; for the equivalent algebraic structure, edge-E becomes a derived quantity computed during residual evaluation from the local cell-centered B and v reconstructions (or, equivalently, the induction-equation flux on each face is derived using a particular combination of corner-states from the L/R cell-centered reconstructions).
The PRISM CPU implementation path:
- PRISM's
compute_convective_fluxes_maxwell family (src/app/prism/cpu/adam_prism_cpu_object.F90:298-308 — currently dispatched to one of compute_convective_fluxes_maxwell{,_div_d,_div_b,_div_d_b}) computes the Maxwell-equation flux at face centers. The induction-equation portion of that flux already implicitly carries the curl-of-E contribution that maps to the BHAC §3.4 UCT2 form (Olivares 2019 Eq. 22):
$$
E_z = \sqrt{\gamma} \Bigg[ \frac{c_x^+ \tilde{V}_L^x B_L^y + c_x^- \tilde{V}_R^x B_R^y - c_x^- c_y^+ (B_R^y - B_L^y)}{c_x^+ + c_x^-} + \frac{c_x^+ c_x^- (B_y^R - B_y^L) - c_y^- c_x^+ (B_x^R - B_x^L)}{(c_x^+ + c_x^-)(c_y^+ + c_y^-)} \Bigg]
$$
(simplified to the non-relativistic limit for PRISM; $\sqrt{\gamma}$ → 1, all velocities are 3-velocities).
- At seam faces, instead of computing edge-E independently on each side, the residual routine reads the shared canonical value from a new
adam_edge_register_object (structurally identical to adam_flux_register_object but indexed by edges, not faces). The routine's call site changes from:
E_seam = compute_edge_E_local(B_local, v_local, ...)
to:
if (edge_register%is_seam_edge(i, j, k, edge_dir)) then
E_seam = edge_register%get_shared_E(seam_edge_id, substage=s)
else
E_seam = compute_edge_E_local(B_local, v_local, ...)
endif
- The shared E is computed on the fine side, using the fine-resolution reconstruction (higher accuracy), then summed to a coarse-edge value via the same restriction-by-averaging Olivares §4.2.1 uses for flux summation:
$$
\overline{E_{\text{coarse edge}}} = \frac{1}{N_{\text{fine edges}}} \sum_{\text{fine edges on this coarse edge}} E_{\text{fine edge}}
$$
(For 2:1 refinement in the plane of the coarse edge, $N = 2$; for 2:1 refinement in both perpendicular directions, $N = 4$.) The averaged value is then broadcast to the coarse side and used in both the coarse-side induction update and the fine-side induction updates that touch this edge.
- The 4 fine cells adjacent to a seam edge and the 1 coarse cell on the other side now share the exact same edge-E. Their induction updates' divergence contributions cancel pairwise across the seam, by the same algebraic property that holds for interior edges. The discrete ∇·B = 0 is preserved across the AMR jump.
4.3 New module — adam_edge_register_object
module adam_edge_register_object
use :: penf, only : I4P, R8P
type, public :: edge_register_edge_t
integer(I4P) :: seam_kind !< SEAM_KIND_*.
integer(I4P) :: coarse_realm
integer(I4P) :: coarse_block
integer(I4P) :: coarse_edge_idx !< Linear index of the edge within the coarse block.
integer(I4P) :: edge_direction !< 1=x, 2=y, 3=z.
integer(I4P), allocatable :: fine_block(:) !< Fine blocks touching this coarse edge.
integer(I4P), allocatable :: fine_edge_idx(:) !< Per-fine-block linear edge indices contributing.
real(R8P), allocatable :: E_shared(:,:) !< (1, nrk) — shared E value per substage. (Scalar per edge — E_z on a z-edge, etc.)
real(R8P), allocatable :: E_fine_sum(:,:) !< (1, nrk) — accumulator on the fine side per substage.
endtype edge_register_edge_t
type, public :: edge_register_object
type(edge_register_edge_t), allocatable :: edge(:)
integer(I4P) :: nedges = 0_I4P
contains
procedure, pass(self) :: initialize
procedure, pass(self) :: register_edge !< Called by topology pass.
procedure, pass(self) :: accumulate_fine_E !< Called by fine-side residual on seam edges.
procedure, pass(self) :: reduce_and_publish !< Called between fine-residual and induction-update.
procedure, pass(self) :: get_shared_E !< Called by induction-equation flux on seam edges.
procedure, pass(self) :: reset
procedure, pass(self) :: destroy
endtype edge_register_object
end module adam_edge_register_object
4.4 Lifecycle (structurally identical to Phase A)
- Init/regrid: topology pass registers all seam edges via
register_edge.
- Per-step reset:
edge_register%reset zeroes accumulators.
- Per-substage:
- Fine-side residual:
accumulate_fine_E is called for each seam edge as part of the flux computation.
- Between fine-residual and induction-update:
reduce_and_publish performs the MPI reduce + the 2:1 (or 4:1) averaging onto the coarse-edge index, writing the result into E_shared.
- Coarse-side induction-update:
get_shared_E retrieves the canonical value.
- End of substage: any further reflux step in Phase A sees a coarse-side B that is already algebraically divergence-free at the seam.
4.5 References for Phase B
- Evans, C. R. & Hawley, J. F. (1988). "Simulation of magnetohydrodynamic flows: a constrained transport method." ApJ 332:659-677. doi:10.1086/166684. — The foundational constrained-transport paper, the source of the algebraic ∇·B = 0 property.
- Balsara, D. S. & Spicer, D. S. (1999). "A staggered mesh algorithm using high-order Godunov fluxes to ensure solenoidal magnetic fields in MHD simulations." JCP 149:270-292. — The first practical CT scheme with arithmetic-averaging corner-E; superseded for stability by UCT (Londrillo-Del Zanna 2004) but still the simplest correct form.
- Londrillo, P. & Del Zanna, L. (2004). "On the divergence-free condition in Godunov-type schemes for ideal magnetohydrodynamics: the upwind constrained transport method." JCP 195:17-48. doi:10.1016/j.jcp.2003.09.016. — The UCT formalism. §2.1 Fig. 1 (the Yee mesh geometry) + §2.2 (the consistency demands).
- Olivares, H. et al. (2019). "Constrained transport and adaptive mesh refinement in the Black Hole Accretion Code." A&A 629:A61. — The production-code engineering blueprint. §3.2 Eq. 16 (the algebraic ∇·B = 0 statement) and §3.4 UCT2 Eq. 22 (the upwinded corner-E reconstruction formula adapted to PRISM's non-relativistic limit).
- Tóth, G. (2000). "The ∇·B = 0 constraint in shock-capturing magnetohydrodynamics codes." JCP 161:605-652. — Three-camp taxonomy. Phase B places PRISM in the CT camp at the seam, while keeping the cleaning machinery (
divergence_correction = hyperbolic|poisson) for residual interior errors. Cost: ~+4-5% of base scheme (Table I).
4.6 Phase B acceptance criteria
5. Phase C — High-order ghost-fill with hybrid smooth/shock switch
Goal. Order-matched ghost-cell values across the seam, so the WENO/centered-FD interior order is not capped to 1st by the ghost-fill. This is independent of Phases A and B (different concern: ghost-cell accuracy vs flux conservation).
Status of G2 after Phase C. Phase C does not change ∇·B. It changes the accuracy of non-magnetic variables at the seam (ρ, momentum, energy) when realms or AMR levels differ in resolution. Phase C is not needed for the current rmf-2realm regression (same resolution across realms — the existing same-resolution mirror copy is already exact). Phase C is needed for any future case with non-conforming resolution at the seam — heterogeneous-resolution forests, AMR-active multi-realm runs, the rmf-2realm-amr case that issue #13's follow-up Phase D backlog points at.
5.1 The current defect — 0th-order injection
Read from source, src/lib/common/adam_maps_object.F90:1024-1042:
else
! receiving from a block coarser than me
do k=kmin, kmax
do j=jmin, jmax
do i=imin, imax
kkk = 2 * k + kdelta
jjj = 2 * j + jdelta
iii = 2 * i + idelta
do kc=0,1 ; do jc=0,1 ; do ic=0,1
self%local_map_ghost_cell(c,1:2) = [b_send, b_recv]
self%local_map_ghost_cell(c,3:5) = [i, j, k] ! ← same coarse cell for all 8
self%local_map_ghost_cell(c,6:8) = [iii+ic, jjj+jc, kkk+kc]
self%local_map_ghost_cell(c, 9 ) = 1
c = c + 1
enddo ; enddo ; enddo
enddo
enddo
enddo
endif
All 8 fine ghost cells (iii+ic, jjj+jc, kkk+kc) are written from the same single coarse cell (i, j, k) — piecewise-constant injection, 0th-order. This is the executed kernel in src/lib/common/adam_field_object.F90:856-857:
if (one_or_eight==1) then
q(v,i_recv,j_recv,k_recv,b_recv) = q(v,i_send,j_send,k_send,b_send) ! ← the 0th-order injection
5.2 The replacement — dimension-by-dimension WENO ghost-fill
The kernel is replaced by a 1D WENO interpolation in each dimension, applied successively, using the same WENO-coefficient machinery as the interior reconstruction (adam_weno_object). For a 5th-order WENO interp filling fine-ghost cell $(iii+ic, jjj+jc, kkk+kc)$ from a coarse stencil ${i-2, i-1, i, i+1, i+2}$:
$$
q_{\text{fine ghost}}(iii+ic, jjj+jc, kkk+kc) = \sum_{p=-2}^{+2} w_p^{(\text{fractional offset})} \cdot q_{\text{coarse}}(i+p, j, k)
$$
then repeat for the $j$ and $k$ directions on the partial result. The coefficients $w_p$ depend on the fractional position of the fine-cell-center within the coarse-cell volume (e.g. ±0.25 of the coarse cell size for ic=0/1 in a 2:1 refinement) and on the WENO smoothness indicators evaluated on the coarse stencil. This is the classical structured-mesh AMR-WENO interpolation (Shen-Qiu-Christlieb 2011 §4; arXiv:2511.08335 §4.2.3).
The local_map_ghost_cell plan structure is unchanged. The 9-column-per-row tuple already encodes everything needed; the kernel inside update_ghost_local (src/lib/common/adam_field_object.F90:822-869) is what changes. A new transfer_kind column may be added to local_map_ghost_cell to dispatch among {0th-injection legacy, 8-cell average existing, WENO-interp new, hybrid-shock new} — this is the operator-family architecture the literature review (§6m.3) called out.
5.3 The hybrid smooth/shock switch
A naive 5th-order WENO interp at a shock will produce overshoots — even though ghost cells are read-only and do not directly violate conservation, they feed into the interior WENO reconstruction at the next substage, and if the ghost values are oscillatory the interior reconstruction triggers troubled-cell behavior. The fix from arXiv:2511.08335 §5.2:
-
Troubled-cell detector at the seam. A simple, robust detector adequate for PRISM:
- Roe-based: compute the jump $|\Delta \rho| / \min(\rho_L, \rho_R)$ and the pressure-curvature $\theta = |p_{i+1} - 2 p_i + p_{i-1}| / (p_{i+1} + 2 p_i + p_{i-1})$ on the coarse stencil.
- If $\theta > 10^{-2}$ or $|\Delta \rho| / \min > 0.2$, mark the cell as troubled.
-
Fallback at troubled cells. Drop from 5th-order WENO to 2nd-order conservative interpolation (a simple slope-limited reconstruction of the coarse cell value with the slope minmodded against neighbors). This is non-oscillatory by construction.
The hybrid switch lives entirely inside the new transfer_kind=WENO_HYBRID kernel in update_ghost_local. It does not affect the plan structure.
5.4 Conservation-vs-order decoupling (the critical insight)
arXiv:2511.08335 §4.2.3, verbatim:
"this process does not introduce any loss of conservation, even though a non-conservative interpolation is employed. This is because the interpolation is only applied to populate ghost cells, rather than to update the valid solution cells."
In ADAM's idiom: ghost cells are read by compute_residuals_*, which produces fluxes, which then update the valid cells. Conservation of the valid cells is enforced by the flux register (Phase A), not by the ghost-fill. Ghost-fill non-conservativity is therefore safe to absorb in exchange for high order. This dissolves the FD-conservation worry that the original literature review wrestled with.
5.5 References for Phase C
- Shen, C., Qiu, J.-M. & Christlieb, A. (2011). "Adaptive mesh refinement based on high order finite difference WENO scheme for multi-scale simulations." JCP 230(9):3780-3802. doi:10.1016/j.jcp.2011.02.008. — The foundational FD-WENO AMR ghost-fill paper. §4 (the WENO interpolation), §5 (hybrid smooth/shock switch).
- arXiv:2511.08335 (2025). "An Improved High-order AMR Framework for Shock-turbulence Interaction based on cell-centered finite difference schemes." — Modern cell-centered FD treatment. §4.2.3 (conservation-vs-order decoupling), §4.2.3 Eq. 15 (RK3 temporal interpolation for subcycled levels — reserved for future subcycled forests, not needed in current lockstep mode), §5.2 (troubled-cell detector).
- Couch, S. M. et al. (2021). Flash-X Spark §2.2 — production reference for the cell-centered AMR-WENO ghost-fill in a code structurally identical to ADAM.
5.6 Phase C acceptance criteria
6. Execution order and dependencies
| Phase |
Depends on |
Unblocks |
Closes (G1, G2, accuracy) |
| A — Reflux |
D.4a (done); existing comm primitives |
B, C |
bulk conservation across seam |
| B — Edge-E sharing |
A (the lifecycle pattern is reused) |
nothing |
algebraic ∇·B = 0 across seam |
| C — WENO ghost-fill |
nothing structural (independent of A, B) |
non-conforming-resolution forests |
seam accuracy at interior order |
A → B → C is the dependency order and the engineering-value order:
- A alone restores conservation and (per literature evidence) may close most of the
By 0.77% drift because the dominant error term in current rmf-2realm is the unrefluxed bulk-variable update, not the divergence-cleaning residual.
- A+B delivers the full G2 goal: algebraic ∇·B = 0 across the seam, cell-centered storage preserved.
- A+B+C delivers G1 (unified operator family across same-res and non-conforming seams).
C can be deferred until a non-conforming-resolution use case exists. A and B should land together: A without B leaves the seam ∇·B at truncation order, which is better than current but does not realize the full guarantee.
7. Acceptance criteria for the whole issue
8. Out of scope (deferred)
- Face-staggering of B, D (the Balsara 2001 / Li-Li 2004 / Olivares 2019 route). Examined in
sources/papers_20260522_forest_octree_halo.md; not chosen because cell-centered + interface machinery (this issue) achieves the same algebraic ∇·B = 0 guarantee without restructuring physics%nv. The face-staggered route remains a forward-looking alternative if PRISM ever grows curl-constrained vector fields (Balsara 2023, GPR fluids, GR FO-CCZ4) where the partial-staggering pattern scales further — at which point a single restructure delivers both div- and curl-preservation.
- High-order temporal interpolation between substages on subcycled forests (RK3 Taylor expansion, arXiv:2511.08335 Eq. 15). Reserved hook; not needed while
forest_active_substage lockstep is the execution model.
- Constrained-transport on E (the curl-preserving extension, Balsara-Käppeli-Boscheri-Dumbser 2023). Not needed for PRISM's current Maxwell scope.
9. Persistent context — links to the artifacts produced
Appendix H — Historical scope (the completed Phase D design work)
This appendix preserves the original Phase D design content of issue #13 (D.1 through D.4a, all completed) as design history. The forward plan is the reformulation above; the appendix is for reference only.
H.1 Phase D was the first concrete N>1 use case design
Phase D of #10 was design-blocked until a concrete N>1 use case was specified. The chosen scope: split the existing rmf regression case along x=0 into two adjacent realms. The union of the two realms reconstructs the single-realm rmf domain exactly, so the existing rmf golden serves as the bit-comparable reference solution. Front-loading the Phase D infrastructure work — per-realm config, MPI carve-out, inter-realm halo exchange via adam_maps_object, time-step coordination, AMR policy — onto a use case where every component was verifiable in isolation.
H.2 Use case specification (rmf-2realm)
- Physics: identical to
src/tests/prism/regression/rmf/. Maxwell, EM model, four AC coils, Neumann BCs, no IB, no AMR, 10 iterations, CFL=0.5, fd_centered order 6, SSP-RK 5-stage, weno-u-1.
- Topology: single-axis split at x=0. realm 1: x ∈ [-0.06, 0], ni=8. realm 2: x ∈ [0, +0.06], ni=8. dx=0.0075 in both, matches single-realm.
- Coils: 4 coils; coils 1 & 2 straddle x=0 (each realm computes its own cells); coils 3 & 4 entirely in one realm.
- MPI: replicated forest. Both ranks own both realms. Inter-realm halo is local memcpy on each rank.
- Time-step: globally synchronous single dt via
forest%compute_global_dt.
- AMR: disabled at v1.
- Configuration: manifest + per-realm INI files.
H.3 Phase D task breakdown (final status)
- D.1 Phase D design (this document). Done.
- D.2 Extend
adam_maps_object with inter_realm_neighbor_t + exchange_inter_realm_halos_forest TBP. Done 2026-05-18. Per-substage exchange granularity question reopened, resolved in D.4a via path (a) — split advance_one_step_forest into per-substage TBPs.
- D.3 Add manifest parser
adam_forest_manifest.F90 + manifest-aware driver. Done 2026-05-18 (commit dffa19be).
- C.3 closure Post-D.3 follow-up retargeting
adam_adam_global shim. Done 2026-05-19.
- D.4a Multi-realm orchestrator scaffolding (
adam_forest_global exposing forest_realm(:) + forest_active_substage; realm per-substage TBPs prepare_step_forest, assemble_substage_forest, evaluate_substage_forest, finalize_step_forest, exchange_inter_realm_halos_forest; bind_my_globals_forest); rmf-2realm case files; digest.py aggregation. Done 2026-05-19 (commit c2a77708, cleanup d73b9ca8).
- D.4b Capture rmf-2realm CPU golden + confirm bit-comparability. Superseded by the present issue body §§3-5 — empirical evidence showed the rmf-2realm seam is not bit-comparable to single-realm because the seam coupling is structurally incomplete (the present work).
- D.5 Document architecture in
src/lib/common/README.md. Deferred until Phases A/B/C land.
H.4 May 2026 multi-realm bug fixes
Two genuine multi-realm engineering bugs surfaced when rmf-2realm was first run and were fixed before this issue was reformulated:
771dbd2f — per-realm memory budget. Each realm was grabbing the full per-process budget at allocation time, causing OOM for N>1 forests. Fix: divide the per-process budget across the realms in the forest.
bf324ccd — per-realm MPI_FINALIZE. Each realm was calling MPI_FINALIZE independently, causing post-finalize abort after the first realm shut down. Fix: single forest-level MPI finalize.
After these fixes, rmf-2realm runs clean to completion on CPU. The residual By digest drift (0.77%/10 steps) is the seam coupling defect that the §§3-5 forward plan addresses.
H.5 Phase D risks (resolved)
The original risks section is preserved verbatim for design history:
Inter-realm halo exchange ordering — resolved by D.4a path (a): split advance_one_step_forest into per-substage TBPs driven by the forest, with exchange_inter_realm_halos_forest called between substages via the forest_realm module pointer.
adam_maps_object blast radius — mitigated by keeping inter-realm additions purely additive; existing intra-realm exchange code unchanged.
INI parsing complexity creep — removed by the manifest-design revision (2026-05-18); each realm uses an unchanged PRISM parser on its own complete INI.
C.3 deferred shim retargeting — discharged 2026-05-19 by the C.3 closure commit.
H.6 Phase D out of scope (still deferred; restated for completeness)
- Disjoint-rank MPI carve-out.
- AMR across realm boundaries (Phase C of this issue's forward plan addresses the technical infrastructure).
- Per-realm dt / sub-cycling.
- Per-realm physics or numerical scheme.
- Schwarz-style overlap coupling.
- Three or more realms.
- N>1 in NASTO / CHASE / PATCH.
These remain on the Phase D backlog. The Phases A/B/C work in this issue scaffolds the interface machinery for any of them.
Coarse-fine interface machinery for ADAM/PRISM — Berger-Colella reflux, edge-E uniqueness, high-order ghost-fill
1. Motivation — what the rmf-2realm regression revealed
The rmf-2realm regression case (
src/tests/prism/regression/rmf-2realm/, see Appendix H for the construction) was built as a free correctness oracle: the two adjacent realms reconstruct the single-realmrmfdomain by union, so the existingrmfgolden should be bit-comparable to the digest-aggregated 2-realm output.After fixing two genuine multi-realm engineering bugs in May 2026 — per-realm memory budget over-allocation (
771dbd2f) and per-realmMPI_FINALIZErace (bf324ccd) — the case runs clean to completion on CPU. The aggregated digest, however, does not match the single-realmrmfgolden. TheBycomponent shows a systematic drift at the inter-realm seam: 0% at t=0, 0.55% at step 5, 0.77% at step 10. The drift is monotonic, smooth, and spatially localized at the seam — it is not a sign error, an indexing bug, or a substage-staleness artifact (all three were eliminated during diagnosis).The root cause is structural: the current
exchange_inter_realm_halos_forest(src/app/prism/cpu/adam_prism_cpu_object.F90:1181-1493) is a partial implementation of the inter-realm seam coupling. Specifically:Bydrift.find_peer_blockis local-rank-only (line 1296). Off-rank peers hitcycle(line 1346) and the corresponding seam ghost is never written. For the current rmf-2realm replicated forest (both ranks own both realms) this defect is dormant; it activates immediately under disjoint-rank carve-out.copy_peer_facewrites the axis-aligned face slab only (lines 1366-1492). Edge and corner ghosts are not written. WENO stencils that read corner/edge ghosts at the seam see uninitialized or stale values. This is the same bug class AMReX fixed in their issue #568 (themulti_ghostsend-tag fix).The first defect — missing flux/edge-E reconciliation — is the dominant contributor to the
Bydrift (it produces a truncation-order error that grows linearly with step count) and is the focus of this issue. Defects 2 and 3 are addressed as part of the same redesign.2. Literature foundation — what the field has converged on
A full source-verified review is in
sources/papers_20260522_forest_octree_halo.md. The operative consensus, after reading 11 primary sources end-to-end:2.1 The cell-centered AMR divergence-preservation question, resolved
The relevant question for ADAM/PRISM is not "can a local pointwise interpolation operator preserve ∇·B = 0 on cell-centered storage?" — six independent sources (Tóth & Roe 2002 §2/§7, Balsara 2001, Londrillo–Del Zanna 2004 §2.2, Li & Li 2004 p.2, Olivares 2019 §2, Balsara 2009 §2) converge that this question is under-determined: one cell-averaged B-vector per cell cannot uniquely close the face-flux + curl constraints required for an exact local prolongation.
The relevant question is "can the discretization as a whole, with cell-centered storage everywhere, preserve ∇·B = 0 across AMR jumps by adding interface machinery (reflux, shared edge-E, unique flux reconstruction)?" — and the answer is yes, demonstrated by production codes (Flash-X Spark with cell-centered MHD + flux registers; BATSRUS in cell-centered mode + flux-CT; Athena++ FluxCorrection). The mechanism is:
2.2 The 5-author algorithmic consensus on RK-multistage refluxing
For Runge-Kutta time integration (PRISM uses SSP-RK 1/2/3/4/5), the reflux is stage-weighted, not a single end-of-step correction. The formula, in the form most directly applicable to ADAM's existing
adam_rk_object(which carriesark(s), brk(s), crk(s)weights andalph(s,ss), beta(s), gamm(s)matrices), is:where$W_s$ is the SSP-RK accumulation weight for substage $s$ (derived from
ark/brk/crkfor the SSP-RK form ADAM uses). This is the formula in Wang et al. 2018, JCP 360 (eq. 23) for GPU-SAMR, generalized in Berger & Colella 1989, JCP 82:64 for the foundational case. Couch et al. 2021 §2.2 confirms the implementation in Flash-X Spark: "a linear combination of flux calculations can also be stored in the flux register for a correction to be applied to the fluxes at the end of the update step."2.3 The high-order ghost-fill picture
When realms are conforming and at the same resolution (current rmf-2realm), ghost-fill is a same-resolution mirror copy and order is preserved trivially. When realms (or AMR levels intra-realm) differ in resolution, the existing ADAM ghost-fill drops to 0th-order piecewise-constant injection (
src/lib/common/adam_maps_object.F90:1024-1042— the "smoking gun" identified in the literature review). This caps global accuracy at the interface to 1st-order in a one-ghost-thick shell, regardless of the interior scheme order.The replacement is dimension-by-dimension high-order WENO interpolation of the coarse cell-centered data, with a hybrid smooth/shock switch (5th-order WENO in smooth regions, 2nd-order conservative interp at shocks) governed by a troubled-cell detector. References: Shen, Qiu & Christlieb 2011 JCP 230:3780; arXiv:2511.08335 (2025) for the modern cell-centered FD treatment.
Critical decoupling (the insight that makes this safe): "this process does not introduce any loss of conservation, even though a non-conservative interpolation is employed. This is because the interpolation is only applied to populate ghost cells, rather than to update the valid solution cells." (arXiv:2511.08335 §4.2.3.) Conservation is enforced separately by the flux register (Phase A); ghost-fill order is enforced by WENO interpolation (Phase C). These two mechanisms are independent.
3. Phase A — Berger-Colella reflux on the existing flux array
Goal. Strict conservation of all bulk variables (ρ, momentum, energy, ψ, φ, and the B/D components) across every coarse-fine interface (intra-realm AMR jump AND inter-realm seam). This is the prerequisite for Phase B.
Status of G2 after Phase A alone. Conservation is exact; ∇·B is preserved if and only if the underlying induction-equation discretization is naturally CT-like (telescoping curl-of-E). For PRISM's current Roe-family Riemann + cleaning numerics, Phase A alone produces conservation-correct but truncation-order ∇·B at the seam — better than current, not yet algebraic. Phase B closes that gap.
3.1 New module —
adam_flux_register_objectA new module in
src/lib/common/, structurally parallel toadam_maps_object(additive, no modification to existing types). The data structure:The register does not own field data; it owns the per-substage flux contributions on a thin "skin" of cells along every seam face. Memory cost is therefore$O(\text{seam-area} \times \text{nv} \times \text{nrk})$ — bounded by the surface-area-times-stages of the AMR boundaries, independent of the interior cell count.
3.2 Lifecycle and hook points
Initialization. During the topology pass that runs once at init/regrid, the inter-realm topology already discovered by
forest_object%populate_inter_realm_topology(src/lib/common/adam_forest_object.F90:368-406) is replayed to callflux_register%register_facefor every inter-realm face pair. For intra-realm AMR, an analogous pass walks each realm'sadam_grid_objectneighbor table at every refinement level, callingregister_facefor every fine-coarse adjacency. The register'sface(:)array is fully populated before the first time step.Per-step reset.
flux_register%resetis called fromforest_object%evolve_one_step(src/lib/common/adam_forest_object.F90:194) at the top of every step, zeroingF_coarseandF_fine_sumfor all registered faces.Flux accumulation during residual computation. Inside each realm's
compute_residuals_*routine (e.g.compute_residuals_fd_centeredatsrc/app/prism/cpu/adam_prism_cpu_object.F90:1873-2056), when the flux through a face has been computed, the routine checks whether the face is registered (via a precomputed mask) and if so calls:The mask check is a single integer compare per face, so the overhead on interior faces is negligible.
Reflux application. At the end of each substage, before the next ghost-update, the forest invokes:
apply_refluxperforms the rank-to-rank reduce ofF_fine_sum(if the fine and coarse sides of a face are on different ranks; this is a separate MPI exchange from the ghost-update, with its own communicator pattern), then applies the correction to the coarse-sideqarray:where$W_s$ is the SSP-RK accumulation weight. For SSP-RK 3 (ADAM's $W_1 = W_2 = W_3 = 1$ in the natural decomposition; for SSP-RK 5 stage-weights are read from
nrk = 3),adam_rk_object%ark(s).Verification. A full step's worth of refluxing must satisfy conservation to round-off: for any conserved quantity$\Phi = \int q, dV$ ,
with no contribution from internal coarse-fine interfaces — this is the discrete divergence theorem, and it is the unit-test acceptance criterion for Phase A.
3.3 Integration with the existing
comm_map_*machineryThe flux register's MPI exchange is a separate communication pattern from the ghost-cell exchange (different cell sets, different send/recv directions). It does not plug into
comm_map_send_ghost/comm_map_recv_ghost(src/lib/common/adam_maps_object.F90:96-99). Instead, the register builds its own (send_tags, recv_tags) tables at registration time, indexed by (fine rank, coarse rank, face) — structurally identical to AMReX'sFabArrayBase::CommMetaData. The pack/unpack uses contiguousflux_register_face_t%F_fine_sumbuffers, no transposition needed.The intra-realm AMR case typically has the fine and coarse sides on the same rank (because both belong to the same realm and ADAM's intra-realm partitioning keeps levels co-resident); the inter-realm case may cross ranks. The register handles both transparently via the (send_tags, recv_tags) abstraction.
3.4 References and citations for Phase A
FluxRegisterandYAFluxRegisterdata structure design ADAM'sflux_register_objectmirrors.3.5 Phase A acceptance criteria
adam_flux_register_objectmodule exists, compiles in all backends (CPU, FNL, NVF).forest_objectregisters all inter-realm seam faces at init time; per-step reset hooked inevolve_one_step.compute_residuals_*routines accumulate fluxes on registered faces; reflux applied between substages.Bydigest drift compared to single-realmrmfgolden: report the post-Phase-A magnitude. Expected: significantly reduced from current 0.77%/10 steps but not to zero (zero requires Phase B).3.6 Phase A implementation log — what landed, in 8 commits
Phase A is fully complete on
develop. The rmf-2realm aggregated CPU digest matches the single-realmrmfgolden across every field variable; the 9 metadata mismatches that previously cluttered the cross-case comparison are now filtered toSKIP_METADATAbydigest.py compare. The 0.77%-at-step-10Bydrift documented inHANDOFF_rmf2realm_2026-05-21.mdis closed. A regression golden for the case is committed;run.shPASSesrmf-2realm/cpuagainst its own golden. The flux-register infrastructure for the FV-path reflux is wired intoforest_objectAND the PRISM CPU consumer (compute_residuals_fv_centered) now actively callsaccumulate_*_fluxon registered seam faces; for the rmf-2realm case (which usesfd_centered) the FV hook is unreachable, the accumulators stay zero, and reflux remains a numerical no-op — by design, per the 2026-05-24 decision (PRISM is FD-focused, no FV regression case in scope).The eight commits, in order:
6b1307497862459d1533eBydrift exposed but not yet closed — see §3.6.1 for why)6987c20rmfgolden)d3e50e5run.shPASSes the case against itfa20706rmf-2realmvsrmfgolden comparison now exits 0 (filters the 9 benign metadata rows toSKIP_METADATA)fd91be6accumulate_*_fluxyet — see §3.7)108a96a6fd_centered, hook unreachable). Hook is live for FV consumers; same-resolution mirror seam givesF_coarse - F_fine_sum = round-off zeroby expectation.Merged into
developasac6edec0(commits 1–3),993f81fe(commit 4),d3e50e5direct (commit 5),fa20706direct (commit 6),c3648ef9(commit 7 fromfeature/issue13-phase-a-fv-reflux), and57df060e(commit 8 fromfeature/issue13-phase-a-fv-reflux-hook). HEAD ondevelopafter closure:57df060e.Material work landed by commits 1–3:
adam_flux_register_objectexists as a program-scope singleton (forest_flux_registeronadam_forest_global) with full TBP signatures (initialize,register_face,accumulate_coarse_flux,accumulate_fine_flux,apply_reflux,reset,destroy). Topology hooks fire at init (populate_inter_realm_topology), per-step reset (evolve_one_step), and per-substage apply (between phases of the substage loop). Accumulators (F_coarse,F_fine_sum) are allocated(nv, nface_cells, nrk)per registered face.accumulate_*_fluxandapply_refluxremain skeleton no-ops pending Phase A step 4 (PRISMcompute_residuals_*wiring); the lifecycle infrastructure is complete and inert.adam_maps_objectgainedinter_realm_ghost_cell(:,1:10)— a per-cell ghost map analogous to the intra-realmlocal_map_ghost_cellmachinery, with layout[peer_realm, b_send, b_recv, i_send, j_send, k_send, i_recv, j_recv, k_recv, one_or_eight]. Built once at init byforest_object%build_inter_realm_ghost_cell_map, which enumerates every ghost cell in every seam-block's full ghost slab (face + edge + corner — the defect-B fix) and resolves the matching peer realm + block + interior cell via a geometricfind_peer_celltest againstfield%emin/emax. Coverage for rmf-2realm: 9348 map rows per realm per rank (6144 face + 2880 edge + 324 corner).prism_cpu_object%exchange_inter_realm_halos_forestis now a flat indexed loop over the precomputed map. The previous geometricfind_peer_block+ face-slabcopy_peer_facehelpers (which missed corner / edge ghosts) were removed. Substage selection viaforest_active_substageis preserved.3.6.1 Why commits 1–3 left the digest unchanged — and what commit 4 fixed
Commit 3 wrote ~50% more ghost cells than the old face-slab exchange, with structurally-correct addressing — and yet the rmf-2realm digest stayed byte-identical to develop-HEAD's
0bf607ad.... Smoking-gun: replacing the per-cell copy withq = 999.0_R8Pleft residuals unchanged. The seam exchange's writes were being clobbered downstream.Root cause #1 (BC overwrite): each realm's INI declares physical BCs on all 6 faces (e.g.
bc_x_max = Neumann). For a realm whose+xface is glued to another realm by the manifest, that declaration is wrong — the face is a SEAM, not a physical boundary. PRISM'sset_boundary_conditions(called byupdate_ghostAFTERexchange_inter_realm_halos_forest) was extrapolating Neumann values into the seam-ghost cells at every substage, overwriting the peer-interior values the seam exchange wrote.Root cause #2 (substage-coherence race) — exposed by fixing #1: the previous 2-phase substage loop (
assemble→evaluate=compute_residuals+rk%assign_stageback-to-back) had a data race at the seam.rk%assign_stage(s, q=dq)overwritesq_rk(:, interior, :, b, s)withdqin-place (the SSP-RK trick —q_rk(s)holds the dq of stagesfor the next stage's linear combination). When realm A's evaluate finished before realm B's, B's seam exchange read A'sq_rk(s)— which by then held A's residual buffer, not A's substage state — and B's stencil fed on garbage. The race was previously masked by the BC Neumann overwrite of the corrupted ghosts; once BC was silenced on seam faces, the race surfaced. Empirically:max|q-ghost|jumped 5 orders of magnitude in a single substage on the second-evaluated realm, traced viamaxval(abs(q))instrumentation at every substage entry.Commit 4 fixes both, together (they had to land in one commit because fixing #1 destabilises the regression without #2):
BC_SEAM sentinel added to
adam_parameters(= -2_I4P, sibling ofBC_PERIODIC = -1).forest_object%populate_inter_realm_topologygainsoverride_seam_bc_in_crown, which walks each realm'slocal_map_bc_crown(already built during the realm's owninitialize_forest) and rewrites column 8 (bc_type) fromBC_NEUMANN/etc. toBC_SEAMfor entries whose (block, face_1_6) matches a manifest-declared seam. PRISM'sset_boundary_conditionsdispatch ladder has no branch forBC_SEAM→ those entries become silent no-ops at consumption time → the seam-exchange writes survive intact. The manifest acts as authoritative override over each realm's INI at the right semantic layer — post-crown construction, pre-time-loop — with no per-cell mask inside the BC inner loop and no schema extension to the realm INI.Three-phase substage loop in
forest_object%evolve_one_step:assemble_substage_forest(s)residuals_substage_forest(s)(computesself%dq; reads peerq_rk(s)via the seam exchange — safe, no peer has overwrittenq_rk(s)yet)assign_substage_forest(s)(overwritesq_rk(:, interior, :, b, s)withself%dq— race-free because no peer needs to read it any more)accumulate_*_flux+apply_refluxwiring will land later)evaluate_substage_forestis split intoresiduals_substage_forest+assign_substage_foreston bothrealm_object(defaults error_stop) and PRISM CPU + FNL (overrides). The oldevaluate_substage_forestis kept as a backward-compat wrapper that calls both in sequence — the forest no longer invokes it.3.6.2 Failed exploration recorded as anti-pattern (NOT committed)
Before the BC_SEAM commit landed, an alternative approach was tried: add
inter_realm_face_mask(b, fec_1_6)onadam_maps_objectand consult it insideset_boundary_conditions's crown loop tocyclepast seam entries. Empirically catastrophic — same residual explosion (O(1e30) at step 1, NaN by step 10) because it triggers the same substage-coherence race the BC_SEAM + 3-phase fix had to address jointly. The per-cell-mask approach is recorded here as anti-pattern: the BC_SEAM crown-override is the right level. The mask exploration is left in the implementation-session transcript only, not in the repo.3.6.3 Verification at develop HEAD
57df060eprism-cpu-gnubuilds clean (gcc/15.2.0 + openmpi/5.0.10-gnu15.2.0), release and debug.rtol=1e-6 atol=1e-3(the BC_SEAM machinery is a true no-op for N=1: the manifest is absent,populate_inter_realm_topologyreturns early, BC behaves exactly as before; the FV reflux hook is unreachable forfd_centered).run.sh cpu→digest match: 102 rows within rtol=1e-06 atol=0.001).SKIP_METADATAlines (per-block-metadata filter active). Zero field-variable mismatches. The 0.77%-at-step-10Bydrift is gone.forest_object%apply_reflux_correctionsruns every substage, walksforest_flux_register%face(:), and applies the full Berger-Colella correctionq_coarse += weight·sign·dt/dx·(F_coarse - F_fine_sum). For the rmf-2realm case (FD), the FV hook is unreachable → accumulators stay zero → correction is0. For a hypothetical FV mirror seam,F_coarse - F_fine_sumis round-off zero by expectation → correction is round-off zero. The non-zero path activates when true coarse-fine AMR adjacency exists; that case is not in the current Phase A regression scope.inter_realm_face_register_index(b, fec_1_6)lookup uses a signed-integer convention (positive = this realm is the coarse side of register entryabs(idx), negative = this realm is the fine side, zero = not a seam face). This encodes the coarse/fine role in the lookup itself, so no realm-identity field is needed on the realm object.3.7 Phase A — what's left
Phase A as originally drafted is fully closed. The remaining items below are deliberate scope-cuts or follow-on phases, not Phase A blockers.
FV-path consumer wiring— landed in commit 8 (108a96a6).compute_residuals_fv_centerednow callsaccumulate_coarse_flux/accumulate_fine_fluxon registered seam faces between reconstruction and the conservative-update loop. The dispatch coarse-vs-fine is encoded in the sign ofinter_realm_face_register_index(b, fec_1_6)(+ = coarse side, − = fine side, 0 = not a seam). Per the 2026-05-24 decision: hook implemented but not exercised by a new regression case — PRISM is FD-focused. Correctness oracle was rmf-2realm staying bit-identical (passed: hook unreachable forfd_centered).forest_realm/adam_forest_global, threadedrealm(:)/s_active/flux_registeras dummy arguments throughout) but rmf-2realm/fnl STILL FAILS at the first OpenACC kernel launch after the cross-realm seam exchange. The remaining trigger is below the architectural layer — a WSL-UCX × OpenACC × FUNDAL device-memcpy interaction. Status: not blocking Phase A closure; tracked in §3.8.3.8 FNL multi-realm investigation (2026-05-25) — architectural cleanup landed, runtime bug remains
A multi-day investigation triggered by
rmf-2realm/fnlfailing at the first inter-realm seam exchange. The investigation produced five landed commits on branchfeature/issue13-fnl-realm-dummy-threadingplus measured evidence that the residual failure is in the nvfortran-OpenACC / UCX / FUNDAL interaction, NOT in PRISM's own state management.The five landed commits (in chronological order):
8000ae7b!$acc update hostondev_alloc-managed pointers incopy_peer_face_gpu(memory-model violation: ACC present-table vs FUNDAL raw-pointer) with FUNDALdev_memcpy_from_device/dev_memcpy_to_device. Closed the original "data not found on device" crash; exposed the next bug downstream.d2fb7bc8realm(:)toinitialize_forest,assemble/residuals/assign/evaluate_substage_forest,post_step_forest,update_ghost,compute_residuals_*,save_simulation_data. The forest passesrealm=realmat every per-realm dispatch. Lets FNL's intra-realm and inter-realm halo refresh use the dummy-argument path.57350354tree_object%initializehas optionaladd_adam(default.true.);field_object%initializehad no matching dummy and silently setblocks_number = 0despite the tree being seeded with the ADAM ancestor block. Addedadd_adamto field, setblocks_number = 1ON RANK 0 ONLY.adam_object%initializeplumbs the same value to both tree and field. Independent fix; orthogonal to FNL multi-realm.90d0343eclass(realm_object), pointer :: forest_realm(:)shim fromadam_forest_global(drop fallback paths inupdate_ghost, replace withpresent(realm)); (b) hoistedhost_bufincopy_peer_face_gpufrom allocatable + per-callallocate/deallocateto an automatic stack array (eliminated heap-tcache churn that was masking the GPU bug as a UCX-malloc abort); (c) compile-time-bound private arrays incompute_residuals_fd_centered_dev_kernel(replaced 6 ×private(qsx_y(1-s1:1+s1), ...)with fixed-size + slice passed at call site — 89% reduction in 3.67 MB device-memory churn per substage).9fa34b08adam_forest_global.F90;forest_flux_registerbecomes atype(flux_register_object) :: flux_registerVALUE component onforest_object;forest_active_substagebecomes aninteger, intent(in), optional :: s_activedummy threaded throughexchange_inter_realm_halos_foresttocopy_peer_face[_gpu]. The forest passesself%flux_registerandsthroughresiduals_substage_forest. NO forest module-scope mutable state remains.Diagnostic findings (measured, not speculated):
forest_realm(the polymorphic-class-pointer hazard) made progress (the run advances further before crashing — ~3× moreset_boundary_conditions_kernellaunches than pre-fix) but does NOT eliminate the crash. Confirms the bug is downstream of polymorphic-pointer handling.recycle device memory → pgi_uacc_allocthrash; the crash persists, at the same location. Confirms device-memory thrash is a real efficiency bug but not the proximate trigger.MemcpyHtoDAsync CUDA_ERROR_INVALID_VALUEatreceive_recv_buffer_ghost_gpu_dev:139— the OpenACC kernel-argument-block h2d copy for an INTRA-realm ghost-receive kernel. The failing destination address (e.g.0xe7f729d30) appears in the trace only at the crash line — never allocated, never recycled — so the OpenACC runtime is generating a fresh arg-block destination for the kernel launch and CUDA rejects it.not found in present table, no nil pointers, no double-free signatures). All ~110 k OpenACC events report success up to the kernel-launch arg-block copy that fails.Refined hypothesis (not yet falsified): the cross-realm
acc_memcpy_from_device_f(c_loc(dst), c_loc(peer_realm%rk_fnl%q_rk_gpu(slice)), bytes)inside FUNDAL — although correct in isolation per the standalone tests — interacts with WSL-specific UCX eager-mode buffer registration on the device (the WSL workaroundUCX_RNDV_THRESH=infregisters eager buffers that the OpenACC runtime cannot reconcile with its arg-block allocator). The single-realm path doesn't exercise cross-instance device pointers, so the interaction never fires.What's next (NOT in this Phase A scope):
Architectural state at end of investigation (branch
feature/issue13-fnl-realm-dummy-threading):adam_forest_globaldeleted. Noforest_realm,forest_active_substage,forest_flux_registersymbols in the binary.forest_objectownsflux_registeras a value component.realm(:)flows exclusively asclass(realm_object), intent(inout), optional, targetdummy through the contract chain — NO module-scope shims for the realm array.s_activeflows asinteger(I4P), intent(in), optionaldummy throughexchange_inter_realm_halos_forest.flux_registerflows asclass(flux_register_object), intent(inout), optionaldummy throughcompute_residuals_interface, present only at the CPU FV-reflux hook (PRISM FNL has no FV residual path; PRISM FD/WENO accept the dummy for contract parity).4. Phase B — Shared edge-E / induction-flux uniqueness at the seam
Goal. Algebraic, round-off-level ∇·B = 0 (and ∇·D = 0 when constrained transport on D is enabled) across every coarse-fine interface. This is the specific divergence-preservation mechanism on top of Phase A's bulk-conservation guarantee.
Why this is structurally analogous to Phase A. Phase A reconciles face-fluxes for conserved variables at the seam; Phase B reconciles edge-fluxes (edge-E) for the magnetic flux. Same lifecycle, same comm pattern, same module shape. The distinction is where in the discretization the registered value enters: A enters the conservative update of the bulk variables, B enters the induction-equation update of B.
4.1 The mathematical mechanism — why shared edge-E gives algebraic ∇·B = 0
Recall the Stokes-curl form of the induction equation that every CT-equivalent discretization uses (Evans & Hawley 1988; Balsara & Spicer 1999; Londrillo–Del Zanna 2004 §2.1 Eq. 11):
The discrete divergence of B at cell$(i,j,k)$ is then:
When the induction equation is applied to each face, every edge-E contributes to two adjacent face updates with opposite sign. Summing the 6 face-divergence contributions for one cell:
This is the algebraic ∇·B = 0 property of CT — exact at the level of integer arithmetic, not just floating-point. It holds provided the same edge-E value is used in both adjacent face-updates.
At a coarse-fine seam (intra-realm AMR jump or inter-realm seam), this property breaks if the coarse-cell induction update and the fine-cell induction update use independently-computed edge-E values that disagree at truncation order. The fix is shared edge-E: at every seam edge, one canonical value is chosen and used by both sides.
4.2 Implementation in cell-centered ADAM — "edge-E" is a derived quantity
In a fully face-staggered CT code (BHAC, Athena++ CT-mode), edge-E is a stored primary variable. ADAM stores B at cell centers; for the equivalent algebraic structure, edge-E becomes a derived quantity computed during residual evaluation from the local cell-centered B and v reconstructions (or, equivalently, the induction-equation flux on each face is derived using a particular combination of corner-states from the L/R cell-centered reconstructions).
The PRISM CPU implementation path:
compute_convective_fluxes_maxwellfamily (src/app/prism/cpu/adam_prism_cpu_object.F90:298-308— currently dispatched to one ofcompute_convective_fluxes_maxwell{,_div_d,_div_b,_div_d_b}) computes the Maxwell-equation flux at face centers. The induction-equation portion of that flux already implicitly carries the curl-of-E contribution that maps to the BHAC §3.4 UCT2 form (Olivares 2019 Eq. 22):(simplified to the non-relativistic limit for PRISM;$\sqrt{\gamma}$ → 1, all velocities are 3-velocities).
adam_edge_register_object(structurally identical toadam_flux_register_objectbut indexed by edges, not faces). The routine's call site changes from:E_seam = compute_edge_E_local(B_local, v_local, ...)to:
(For 2:1 refinement in the plane of the coarse edge,$N = 2$ ; for 2:1 refinement in both perpendicular directions, $N = 4$ .) The averaged value is then broadcast to the coarse side and used in both the coarse-side induction update and the fine-side induction updates that touch this edge.
4.3 New module —
adam_edge_register_object4.4 Lifecycle (structurally identical to Phase A)
register_edge.edge_register%resetzeroes accumulators.accumulate_fine_Eis called for each seam edge as part of the flux computation.reduce_and_publishperforms the MPI reduce + the 2:1 (or 4:1) averaging onto the coarse-edge index, writing the result intoE_shared.get_shared_Eretrieves the canonical value.4.5 References for Phase B
divergence_correction = hyperbolic|poisson) for residual interior errors. Cost: ~+4-5% of base scheme (Table I).4.6 Phase B acceptance criteria
adam_edge_register_objectmodule exists, compiles in all backends.compute_convective_fluxes_maxwell*family is patched to query the edge register on seam edges and use the shared value.Bydigest drift compared to single-realmrmfgolden: drops to round-off (≤1e-13 relative across 100 steps). This is the acceptance criterion for the entire G2 goal.5. Phase C — High-order ghost-fill with hybrid smooth/shock switch
Goal. Order-matched ghost-cell values across the seam, so the WENO/centered-FD interior order is not capped to 1st by the ghost-fill. This is independent of Phases A and B (different concern: ghost-cell accuracy vs flux conservation).
Status of G2 after Phase C. Phase C does not change ∇·B. It changes the accuracy of non-magnetic variables at the seam (ρ, momentum, energy) when realms or AMR levels differ in resolution. Phase C is not needed for the current rmf-2realm regression (same resolution across realms — the existing same-resolution mirror copy is already exact). Phase C is needed for any future case with non-conforming resolution at the seam — heterogeneous-resolution forests, AMR-active multi-realm runs, the rmf-2realm-amr case that issue #13's follow-up Phase D backlog points at.
5.1 The current defect — 0th-order injection
Read from source,
src/lib/common/adam_maps_object.F90:1024-1042:All 8 fine ghost cells
(iii+ic, jjj+jc, kkk+kc)are written from the same single coarse cell(i, j, k)— piecewise-constant injection, 0th-order. This is the executed kernel insrc/lib/common/adam_field_object.F90:856-857:5.2 The replacement — dimension-by-dimension WENO ghost-fill
The kernel is replaced by a 1D WENO interpolation in each dimension, applied successively, using the same WENO-coefficient machinery as the interior reconstruction ($(iii+ic, jjj+jc, kkk+kc)$ from a coarse stencil ${i-2, i-1, i, i+1, i+2}$ :
adam_weno_object). For a 5th-order WENO interp filling fine-ghost cellthen repeat for the$j$ and $k$ directions on the partial result. The coefficients $w_p$ depend on the fractional position of the fine-cell-center within the coarse-cell volume (e.g. ±0.25 of the coarse cell size for ic=0/1 in a 2:1 refinement) and on the WENO smoothness indicators evaluated on the coarse stencil. This is the classical structured-mesh AMR-WENO interpolation (Shen-Qiu-Christlieb 2011 §4; arXiv:2511.08335 §4.2.3).
The
local_map_ghost_cellplan structure is unchanged. The 9-column-per-row tuple already encodes everything needed; the kernel insideupdate_ghost_local(src/lib/common/adam_field_object.F90:822-869) is what changes. A newtransfer_kindcolumn may be added tolocal_map_ghost_cellto dispatch among {0th-injection legacy, 8-cell average existing, WENO-interp new, hybrid-shock new} — this is the operator-family architecture the literature review (§6m.3) called out.5.3 The hybrid smooth/shock switch
A naive 5th-order WENO interp at a shock will produce overshoots — even though ghost cells are read-only and do not directly violate conservation, they feed into the interior WENO reconstruction at the next substage, and if the ghost values are oscillatory the interior reconstruction triggers troubled-cell behavior. The fix from arXiv:2511.08335 §5.2:
Troubled-cell detector at the seam. A simple, robust detector adequate for PRISM:
Fallback at troubled cells. Drop from 5th-order WENO to 2nd-order conservative interpolation (a simple slope-limited reconstruction of the coarse cell value with the slope minmodded against neighbors). This is non-oscillatory by construction.
The hybrid switch lives entirely inside the new
transfer_kind=WENO_HYBRIDkernel inupdate_ghost_local. It does not affect the plan structure.5.4 Conservation-vs-order decoupling (the critical insight)
arXiv:2511.08335 §4.2.3, verbatim:
In ADAM's idiom: ghost cells are read by
compute_residuals_*, which produces fluxes, which then update the valid cells. Conservation of the valid cells is enforced by the flux register (Phase A), not by the ghost-fill. Ghost-fill non-conservativity is therefore safe to absorb in exchange for high order. This dissolves the FD-conservation worry that the original literature review wrestled with.5.5 References for Phase C
5.6 Phase C acceptance criteria
update_ghost_local(CPU); FNL/NVF kernels updated.weno-u-1), not at 1st order.6. Execution order and dependencies
A → B → C is the dependency order and the engineering-value order:
By0.77% drift because the dominant error term in current rmf-2realm is the unrefluxed bulk-variable update, not the divergence-cleaning residual.C can be deferred until a non-conforming-resolution use case exists. A and B should land together: A without B leaves the seam ∇·B at truncation order, which is better than current but does not realize the full guarantee.
7. Acceptance criteria for the whole issue
rmfgolden to round-off (≤1e-13 relative), across all components, across 100+ steps. This is the primary regression oracle.src/lib/common/README.md(the D.5 deliverable, written against the final code).8. Out of scope (deferred)
sources/papers_20260522_forest_octree_halo.md; not chosen because cell-centered + interface machinery (this issue) achieves the same algebraic ∇·B = 0 guarantee without restructuringphysics%nv. The face-staggered route remains a forward-looking alternative if PRISM ever grows curl-constrained vector fields (Balsara 2023, GPR fluids, GR FO-CCZ4) where the partial-staggering pattern scales further — at which point a single restructure delivers both div- and curl-preservation.forest_active_substagelockstep is the execution model.9. Persistent context — links to the artifacts produced
sources/papers_20260522_forest_octree_halo.md— 13 primary PDFs read end-to-end, §6m–6n of the review consolidate the cell-centered + interface-machinery verdict.HANDOFF_rmf2realm_2026-05-21.md— the two committed bug fixes (771dbd2f,bf324ccd) and the residual seam-coupling defect this issue addresses.src/tests/prism/regression/rmf-2realm/— the free correctness oracle.Bydrift 0% → 0.55% → 0.77% over 10 steps.Appendix H — Historical scope (the completed Phase D design work)
This appendix preserves the original Phase D design content of issue #13 (D.1 through D.4a, all completed) as design history. The forward plan is the reformulation above; the appendix is for reference only.
H.1 Phase D was the first concrete N>1 use case design
Phase D of #10 was design-blocked until a concrete N>1 use case was specified. The chosen scope: split the existing
rmfregression case along x=0 into two adjacent realms. The union of the two realms reconstructs the single-realmrmfdomain exactly, so the existingrmfgolden serves as the bit-comparable reference solution. Front-loading the Phase D infrastructure work — per-realm config, MPI carve-out, inter-realm halo exchange viaadam_maps_object, time-step coordination, AMR policy — onto a use case where every component was verifiable in isolation.H.2 Use case specification (rmf-2realm)
src/tests/prism/regression/rmf/. Maxwell, EM model, four AC coils, Neumann BCs, no IB, no AMR, 10 iterations, CFL=0.5, fd_centered order 6, SSP-RK 5-stage, weno-u-1.forest%compute_global_dt.H.3 Phase D task breakdown (final status)
adam_maps_objectwithinter_realm_neighbor_t+exchange_inter_realm_halos_forestTBP. Done 2026-05-18. Per-substage exchange granularity question reopened, resolved in D.4a via path (a) — splitadvance_one_step_forestinto per-substage TBPs.adam_forest_manifest.F90+ manifest-aware driver. Done 2026-05-18 (commitdffa19be).adam_adam_globalshim. Done 2026-05-19.adam_forest_globalexposingforest_realm(:)+forest_active_substage; realm per-substage TBPsprepare_step_forest,assemble_substage_forest,evaluate_substage_forest,finalize_step_forest,exchange_inter_realm_halos_forest;bind_my_globals_forest); rmf-2realm case files; digest.py aggregation. Done 2026-05-19 (commitc2a77708, cleanupd73b9ca8).src/lib/common/README.md. Deferred until Phases A/B/C land.H.4 May 2026 multi-realm bug fixes
Two genuine multi-realm engineering bugs surfaced when rmf-2realm was first run and were fixed before this issue was reformulated:
771dbd2f— per-realm memory budget. Each realm was grabbing the full per-process budget at allocation time, causing OOM for N>1 forests. Fix: divide the per-process budget across the realms in the forest.bf324ccd— per-realm MPI_FINALIZE. Each realm was callingMPI_FINALIZEindependently, causing post-finalize abort after the first realm shut down. Fix: single forest-level MPI finalize.After these fixes, rmf-2realm runs clean to completion on CPU. The residual
Bydigest drift (0.77%/10 steps) is the seam coupling defect that the §§3-5 forward plan addresses.H.5 Phase D risks (resolved)
The original risks section is preserved verbatim for design history:
Inter-realm halo exchange ordering— resolved by D.4a path (a): splitadvance_one_step_forestinto per-substage TBPs driven by the forest, withexchange_inter_realm_halos_forestcalled between substages via theforest_realmmodule pointer.adam_maps_objectblast radius — mitigated by keeping inter-realm additions purely additive; existing intra-realm exchange code unchanged.INI parsing complexity creep— removed by the manifest-design revision (2026-05-18); each realm uses an unchanged PRISM parser on its own complete INI.C.3 deferred shim retargeting— discharged 2026-05-19 by the C.3 closure commit.H.6 Phase D out of scope (still deferred; restated for completeness)
These remain on the Phase D backlog. The Phases A/B/C work in this issue scaffolds the interface machinery for any of them.