Coarse-fine interface machinery for ADAM/PRISM — Berger-Colella reflux, edge-E uniqueness, high-order ghost-fill

# Coarse-fine interface machinery for ADAM/PRISM — Berger-Colella reflux, edge-E uniqueness, high-order ghost-fill

> **This issue body was reformulated 2026-05-23** after a multi-session literature review (`sources/papers_20260522_forest_octree_halo.md`) and design conversation that re-scoped the work. The original Phase-D-design content (tasks D.1–D.4a, all completed) is preserved as **Appendix H — Historical scope**. The active forward plan is **Phases A → B → C** described in §§3–5 below.
>
> *Status of the original Phase D scope:*
> - D.1 (design), D.2 (`adam_maps_object` extension), D.3 (manifest parser + driver), C.3 closure (post-D.3 follow-up), and D.4a (multi-realm orchestrator scaffolding + rmf-2realm case files + digest aggregation) — **all completed and merged** (commits `2d74fb6d`, `dffa19be`, `c2a77708`, `d73b9ca8`, plus the May-2026 budget/finalize fixes `771dbd2f` and `bf324ccd`).
> - D.4b (capture rmf-2realm golden + confirm bit-comparability) — **superseded** by the present work: empirical evidence (the seam `By` digest drifts 0% → 0.55% → 0.77% over 10 steps) shows the rmf-2realm seam is *not* bit-comparable to the single-realm golden, because the seam coupling is structurally incomplete. The seam machinery has to be built *before* a meaningful golden can be captured.
> - D.5 (architecture docs) — **deferred** until §§3–5 land; the docs will reflect the actual final architecture.

---

## 1. Motivation — what the rmf-2realm regression revealed

The rmf-2realm regression case (`src/tests/prism/regression/rmf-2realm/`, see Appendix H for the construction) was built as a free correctness oracle: the two adjacent realms reconstruct the single-realm `rmf` domain by union, so the existing `rmf` golden should be bit-comparable to the digest-aggregated 2-realm output.

After fixing two genuine multi-realm engineering bugs in May 2026 — per-realm memory budget over-allocation (`771dbd2f`) and per-realm `MPI_FINALIZE` race (`bf324ccd`) — the case **runs clean to completion on CPU**. The aggregated digest, however, **does not match the single-realm `rmf` golden**. The `By` component shows a systematic drift at the inter-realm seam: 0% at t=0, 0.55% at step 5, 0.77% at step 10. The drift is **monotonic, smooth, and spatially localized at the seam** — it is not a sign error, an indexing bug, or a substage-staleness artifact (all three were eliminated during diagnosis).

The root cause is **structural**: the current `exchange_inter_realm_halos_forest` (`src/app/prism/cpu/adam_prism_cpu_object.F90:1181-1493`) is a partial implementation of the inter-realm seam coupling. Specifically:

1. **It is a same-resolution mirror copy only.** No flux reconciliation, no edge-E (or equivalently, no induction-equation flux) sharing across the seam. The induction equation in each realm sees its own independently-computed curl-of-E contribution at the seam, and the two contributions disagree at truncation order. The disagreement accumulates as the `By` drift.
2. **`find_peer_block` is local-rank-only** (line 1296). Off-rank peers hit `cycle` (line 1346) and the corresponding seam ghost is never written. For the current rmf-2realm replicated forest (both ranks own both realms) this defect is dormant; it activates immediately under disjoint-rank carve-out.
3. **`copy_peer_face` writes the axis-aligned face slab only** (lines 1366-1492). Edge and corner ghosts are not written. WENO stencils that read corner/edge ghosts at the seam see uninitialized or stale values. This is the same bug class AMReX fixed in [their issue #568](https://github.com/AMReX-Codes/amrex/issues/568) (the `multi_ghost` send-tag fix).

The first defect — missing flux/edge-E reconciliation — is the **dominant** contributor to the `By` drift (it produces a truncation-order error that grows linearly with step count) and is the focus of this issue. Defects 2 and 3 are addressed as part of the same redesign.

---

## 2. Literature foundation — what the field has converged on

A full source-verified review is in `sources/papers_20260522_forest_octree_halo.md`. The **operative consensus**, after reading 11 primary sources end-to-end:

### 2.1 The cell-centered AMR divergence-preservation question, resolved

The relevant question for ADAM/PRISM is **not** *"can a local pointwise interpolation operator preserve ∇·B = 0 on cell-centered storage?"* — six independent sources (Tóth & Roe 2002 §2/§7, Balsara 2001, Londrillo–Del Zanna 2004 §2.2, Li & Li 2004 p.2, Olivares 2019 §2, Balsara 2009 §2) converge that this question is **under-determined**: one cell-averaged B-vector per cell cannot uniquely close the face-flux + curl constraints required for an exact local prolongation.

The relevant question is **"can the discretization as a whole, with cell-centered storage everywhere, preserve ∇·B = 0 across AMR jumps by adding interface machinery (reflux, shared edge-E, unique flux reconstruction)?"** — and the answer is **yes**, demonstrated by production codes (Flash-X Spark with cell-centered MHD + flux registers; BATSRUS in cell-centered mode + flux-CT; Athena++ FluxCorrection). The mechanism is:

- **Restriction (fine → coarse):** *coarse-face flux = sum of fine-face fluxes.* Algebraically conservative; if every fine cell has ∇·B = 0 numerically, the coarse cell does too, by telescoping (Olivares 2019 §4.2.1 Eq. 26-27).
- **Induction-equation flux at the seam:** *one canonical edge-E value, shared by both sides.* If the same edge-E enters the curl-of-E on both the fine and the coarse face that share the edge, the discrete divergence telescopes algebraically across the seam (Evans & Hawley 1988; Balsara & Spicer 1999; Londrillo & Del Zanna 2004 §2.1 Fig. 1).
- **Prolongation (coarse → fine) for new fine cells:** standard slope-limited reconstruction. *Not* required to be div-free pointwise; the next-substage reflux corrects any divergence error introduced. This is the Spark/AMReX pattern (Couch 2021 §2.2; Zhang 2019).

### 2.2 The 5-author algorithmic consensus on RK-multistage refluxing

For Runge-Kutta time integration (PRISM uses SSP-RK 1/2/3/4/5), the reflux is **stage-weighted**, not a single end-of-step correction. The formula, in the form most directly applicable to ADAM's existing `adam_rk_object` (which carries `ark(s), brk(s), crk(s)` weights and `alph(s,ss), beta(s), gamm(s)` matrices), is:

$$
U_{\text{coarse}}^{n+1} \mathrel{+}= \sum_{s=1}^{\text{nrk}} W_s \cdot \frac{\Delta t}{\Delta x} \left( F_{\text{coarse},s}^{\text{used}} - \sum_{\text{fine faces}} F_{\text{fine},s} \right)
$$

where $W_s$ is the SSP-RK accumulation weight for substage $s$ (derived from `ark/brk/crk` for the SSP-RK form ADAM uses). This is the formula in **Wang et al. 2018, JCP 360 (eq. 23)** for GPU-SAMR, generalized in **Berger & Colella 1989, JCP 82:64** for the foundational case. Couch et al. 2021 §2.2 confirms the implementation in Flash-X Spark: *"a linear combination of flux calculations can also be stored in the flux register for a correction to be applied to the fluxes at the end of the update step."*

### 2.3 The high-order ghost-fill picture

When realms are conforming and at the same resolution (current rmf-2realm), ghost-fill is a same-resolution mirror copy and order is preserved trivially. When realms (or AMR levels intra-realm) differ in resolution, the existing ADAM ghost-fill drops to **0th-order piecewise-constant injection** (`src/lib/common/adam_maps_object.F90:1024-1042` — the "smoking gun" identified in the literature review). This caps global accuracy at the interface to 1st-order in a one-ghost-thick shell, regardless of the interior scheme order.

The replacement is **dimension-by-dimension high-order WENO interpolation** of the coarse cell-centered data, with a **hybrid smooth/shock switch** (5th-order WENO in smooth regions, 2nd-order conservative interp at shocks) governed by a troubled-cell detector. References: Shen, Qiu & Christlieb 2011 *JCP* 230:3780; arXiv:2511.08335 (2025) for the modern cell-centered FD treatment.

**Critical decoupling (the insight that makes this safe):** *"this process does not introduce any loss of conservation, even though a non-conservative interpolation is employed. This is because the interpolation is only applied to populate ghost cells, rather than to update the valid solution cells."* (arXiv:2511.08335 §4.2.3.) Conservation is enforced *separately* by the flux register (Phase A); ghost-fill order is enforced by WENO interpolation (Phase C). **These two mechanisms are independent.**

---

## 3. Phase A — Berger-Colella reflux on the existing flux array

**Goal.** Strict conservation of all bulk variables (ρ, momentum, energy, ψ, φ, and the B/D components) across every coarse-fine interface (intra-realm AMR jump AND inter-realm seam). This is the prerequisite for Phase B.

**Status of G2 after Phase A alone.** Conservation is exact; ∇·B is preserved *if and only if* the underlying induction-equation discretization is naturally CT-like (telescoping curl-of-E). For PRISM's current Roe-family Riemann + cleaning numerics, Phase A alone produces conservation-correct but **truncation-order** ∇·B at the seam — better than current, not yet algebraic. Phase B closes that gap.

### 3.1 New module — `adam_flux_register_object`

A new module in `src/lib/common/`, structurally parallel to `adam_maps_object` (additive, no modification to existing types). The data structure:

```fortran
module adam_flux_register_object
   use :: penf, only : I4P, R8P

   integer(I4P), parameter, public :: SEAM_KIND_INTRA_REALM_AMR = 1_I4P  !< Coarse-fine AMR jump inside one realm.
   integer(I4P), parameter, public :: SEAM_KIND_INTER_REALM    = 2_I4P  !< Inter-realm seam.

   !< Per-seam-face flux accumulator.
   type, public :: flux_register_face_t
      integer(I4P) :: seam_kind         !< SEAM_KIND_*.
      integer(I4P) :: coarse_realm      !< Realm owning the coarse side (= 0 for intra-realm).
      integer(I4P) :: coarse_block      !< Block on the coarse side.
      integer(I4P) :: coarse_face       !< Which face (±x, ±y, ±z) of the coarse block.
      integer(I4P) :: fine_realm        !< Realm owning the fine side (= 0 for intra-realm).
      integer(I4P) :: fine_block(:)     !< Allocatable list of fine blocks covering this coarse face.
      integer(I4P) :: nface_cells       !< Cells covered by this face on the coarse side.
      real(R8P), allocatable :: F_coarse(:,:,:)     !< (nv, nface_cells, nrk) — coarse-side flux per substage.
      real(R8P), allocatable :: F_fine_sum(:,:,:)   !< (nv, nface_cells, nrk) — sum of fine fluxes, accumulated per substage.
   endtype flux_register_face_t

   type, public :: flux_register_object
      type(flux_register_face_t), allocatable :: face(:)     !< All registered seam faces.
      integer(I4P) :: nfaces = 0_I4P
   contains
      procedure, pass(self) :: initialize
      procedure, pass(self) :: register_face            !< Called by topology pass at init/regrid.
      procedure, pass(self) :: accumulate_coarse_flux   !< Called by coarse-side residual.
      procedure, pass(self) :: accumulate_fine_flux     !< Called by fine-side residual.
      procedure, pass(self) :: apply_reflux             !< Called at end of substage, before next ghost-update.
      procedure, pass(self) :: reset                    !< Called at start of step.
      procedure, pass(self) :: destroy
   endtype flux_register_object
end module adam_flux_register_object
```

The register **does not own** field data; it owns the per-substage flux contributions on a thin "skin" of cells along every seam face. Memory cost is therefore $O(\text{seam-area} \times \text{nv} \times \text{nrk})$ — bounded by the surface-area-times-stages of the AMR boundaries, independent of the interior cell count.

### 3.2 Lifecycle and hook points

**Initialization.** During the topology pass that runs once at init/regrid, the inter-realm topology already discovered by `forest_object%populate_inter_realm_topology` (`src/lib/common/adam_forest_object.F90:368-406`) is replayed to call `flux_register%register_face` for every inter-realm face pair. For intra-realm AMR, an analogous pass walks each realm's `adam_grid_object` neighbor table at every refinement level, calling `register_face` for every fine-coarse adjacency. The register's `face(:)` array is fully populated before the first time step.

**Per-step reset.** `flux_register%reset` is called from `forest_object%evolve_one_step` (`src/lib/common/adam_forest_object.F90:194`) at the top of every step, zeroing `F_coarse` and `F_fine_sum` for all registered faces.

**Flux accumulation during residual computation.** Inside each realm's `compute_residuals_*` routine (e.g. `compute_residuals_fd_centered` at `src/app/prism/cpu/adam_prism_cpu_object.F90:1873-2056`), when the flux through a face has been computed, the routine checks whether the face is registered (via a precomputed mask) and if so calls:

```fortran
call flux_register%accumulate_coarse_flux(coarse_block=b, coarse_face=f, &
                                          substage=s, flux_face=F_face)
! ... on the coarse side ...

call flux_register%accumulate_fine_flux(fine_block=b, fine_face=f, &
                                        substage=s, flux_face=F_face)
! ... on the fine side; accumulates into F_fine_sum ...
```

The mask check is a single integer compare per face, so the overhead on interior faces is negligible.

**Reflux application.** At the end of each substage, **before** the next ghost-update, the forest invokes:

```fortran
call flux_register%apply_reflux(realm=forest_realm, substage=s, dt=dt)
```

`apply_reflux` performs the rank-to-rank reduce of `F_fine_sum` (if the fine and coarse sides of a face are on different ranks; this is a separate MPI exchange from the ghost-update, with its own communicator pattern), then applies the correction to the coarse-side `q` array:

$$
q_{\text{coarse cell}}(\text{nv}, i, j, k, b) \mathrel{+}= W_s \cdot \frac{\Delta t}{\Delta x_{\text{coarse}}} \cdot \big( F_{\text{coarse},s} - F_{\text{fine sum},s} \big)
$$

where $W_s$ is the SSP-RK accumulation weight. For SSP-RK 3 (ADAM's `nrk = 3`), $W_1 = W_2 = W_3 = 1$ in the natural decomposition; for SSP-RK 5 stage-weights are read from `adam_rk_object%ark(s)`.

**Verification.** A full step's worth of refluxing must satisfy conservation to round-off: for any conserved quantity $\Phi = \int q\, dV$,

$$
\Phi(t^{n+1}) - \Phi(t^n) = -\Delta t \oint_{\partial\Omega} F \cdot n \, dS
$$

with no contribution from internal coarse-fine interfaces — this is the discrete divergence theorem, and it is the unit-test acceptance criterion for Phase A.

### 3.3 Integration with the existing `comm_map_*` machinery

The flux register's MPI exchange is **a separate communication pattern** from the ghost-cell exchange (different cell sets, different send/recv directions). It does **not** plug into `comm_map_send_ghost` / `comm_map_recv_ghost` (`src/lib/common/adam_maps_object.F90:96-99`). Instead, the register builds its own (send_tags, recv_tags) tables at registration time, indexed by (fine rank, coarse rank, face) — structurally identical to AMReX's `FabArrayBase::CommMetaData`. The pack/unpack uses contiguous `flux_register_face_t%F_fine_sum` buffers, no transposition needed.

The intra-realm AMR case typically has the fine and coarse sides on the same rank (because both belong to the same realm and ADAM's intra-realm partitioning keeps levels co-resident); the inter-realm case may cross ranks. The register handles both transparently via the (send_tags, recv_tags) abstraction.

### 3.4 References and citations for Phase A

- **Berger, M. J. & Colella, P. (1989).** "Local adaptive mesh refinement for shock hydrodynamics." *Journal of Computational Physics* **82**(1):64-84. doi:10.1016/0021-9991(89)90035-1. — The foundational reflux paper. The δF correction is in §4.
- **Wang, B., Xia, Y. & Shu, C.-W. (2018).** "Efficient implementation of high order inverse Lax-Wendroff boundary treatment for conservation laws." *Journal of Computational Physics* **360**:73-99. — RK-stage-weighted reflux formula (Eq. 23). The form ADAM should adopt.
- **Couch, S. M. et al. (2021).** "Towards performance portability in the Spark astrophysical magnetohydrodynamics solver in the Flash-X simulation framework." *Parallel Computing* **108**:102830. doi:10.1016/j.parco.2021.102830. — §2.2: production reflux for RK-multistage cell-centered MHD. Quoted: *"a linear combination of flux calculations can also be stored in the flux register for a correction to be applied to the fluxes at the end of the update step."*
- **Zhang, W. et al. (2019).** "AMReX: a framework for block-structured AMR." *JOSS* **4**(37):1370. — The `FluxRegister` and `YAFluxRegister` data structure design ADAM's `flux_register_object` mirrors.
- **Olivares, H. et al. (2019).** "Constrained transport and adaptive mesh refinement in the Black Hole Accretion Code." *A&A* **629**:A61. — §4.2.1 Eq. 26-27: the telescoping property that makes reflux work.

### 3.5 Phase A acceptance criteria

- [ ] `adam_flux_register_object` module exists, compiles in all backends (CPU, FNL, NVF).
- [ ] `forest_object` registers all inter-realm seam faces at init time; per-step reset hooked in `evolve_one_step`.
- [ ] PRISM CPU `compute_residuals_*` routines accumulate fluxes on registered faces; reflux applied between substages.
- [ ] Unit test: closed-domain mass-conservation check passes to round-off (≤1e-13 relative) on a 2-realm case across 100 steps.
- [ ] rmf-2realm `By` digest drift compared to single-realm `rmf` golden: report the post-Phase-A magnitude. Expected: significantly reduced from current 0.77%/10 steps but **not** to zero (zero requires Phase B).

### 3.6 Phase A implementation log — what landed, in 8 commits

**Phase A is fully complete on `develop`.** The rmf-2realm aggregated CPU digest matches the single-realm `rmf` golden across every field variable; the 9 metadata mismatches that previously cluttered the cross-case comparison are now filtered to `SKIP_METADATA` by `digest.py compare`. The 0.77%-at-step-10 `By` drift documented in `HANDOFF_rmf2realm_2026-05-21.md` is **closed**. A regression golden for the case is committed; `run.sh` PASSes `rmf-2realm/cpu` against its own golden. The flux-register infrastructure for the FV-path reflux is wired into `forest_object` AND the PRISM CPU consumer (`compute_residuals_fv_centered`) now actively calls `accumulate_*_flux` on registered seam faces; for the rmf-2realm case (which uses `fd_centered`) the FV hook is unreachable, the accumulators stay zero, and reflux remains a numerical no-op — by design, per the 2026-05-24 decision (PRISM is FD-focused, no FV regression case in scope).

**The eight commits, in order:**

| # | Commit | Subject | Numerical effect |
|---|---|---|---|
| 1 | `6b13074` | feat(lib): scaffold flux register module for Phase A reflux machinery | Numerical no-op (skeleton TBPs) |
| 2 | `9786245` | feat(lib): wire flux register topology pass for inter-realm seams | Numerical no-op (accumulators allocated and zeroed) |
| 3 | `9d1533e` | feat(lib,prism): replace geometric seam exchange with precomputed comm-map | rmf single-realm: byte-identical. rmf-2realm: byte-identical to develop-HEAD (0.77% `By` drift exposed but not yet closed — see §3.6.1 for why) |
| 4 | `6987c20` | fix(lib,prism): close inter-realm seam — BC_SEAM override + 3-phase substage loop | **rmf-2realm digest closes** (zero field-variable mismatches vs single-realm `rmf` golden) |
| 5 | `d3e50e5` | test(prism): capture rmf-2realm CPU regression golden — seam closure | rmf-2realm now has a committed golden; `run.sh` PASSes the case against it |
| 6 | `fa20706` | test(regression): downgrade per-block-metadata digest mismatches to SKIP_METADATA | Cosmetic — cross-case `rmf-2realm` vs `rmf` golden comparison now exits 0 (filters the 9 benign metadata rows to `SKIP_METADATA`) |
| 7 | `fd91be6` | feat(lib): reflux infrastructure — accumulators, register lookup, forest q-correction | Numerical no-op (FV-path infrastructure; no consumer invokes `accumulate_*_flux` yet — see §3.7) |
| 8 | `108a96a6` | feat(prism,lib): wire FV-path inter-realm flux accumulation hook (#13) | Numerical no-op for the regression case (rmf-2realm uses `fd_centered`, hook unreachable). Hook is live for FV consumers; same-resolution mirror seam gives `F_coarse - F_fine_sum = round-off zero` by expectation. |

Merged into `develop` as `ac6edec0` (commits 1–3), `993f81fe` (commit 4), `d3e50e5` direct (commit 5), `fa20706` direct (commit 6), `c3648ef9` (commit 7 from `feature/issue13-phase-a-fv-reflux`), and `57df060e` (commit 8 from `feature/issue13-phase-a-fv-reflux-hook`). HEAD on `develop` after closure: `57df060e`.

**Material work landed by commits 1–3:**

1. `adam_flux_register_object` exists as a program-scope singleton (`forest_flux_register` on `adam_forest_global`) with full TBP signatures (`initialize`, `register_face`, `accumulate_coarse_flux`, `accumulate_fine_flux`, `apply_reflux`, `reset`, `destroy`). Topology hooks fire at init (`populate_inter_realm_topology`), per-step reset (`evolve_one_step`), and per-substage apply (between phases of the substage loop). Accumulators (`F_coarse`, `F_fine_sum`) are allocated `(nv, nface_cells, nrk)` per registered face. `accumulate_*_flux` and `apply_reflux` remain skeleton no-ops pending Phase A step 4 (PRISM `compute_residuals_*` wiring); the lifecycle infrastructure is complete and inert.

2. `adam_maps_object` gained `inter_realm_ghost_cell(:,1:10)` — a per-cell ghost map analogous to the intra-realm `local_map_ghost_cell` machinery, with layout `[peer_realm, b_send, b_recv, i_send, j_send, k_send, i_recv, j_recv, k_recv, one_or_eight]`. Built once at init by `forest_object%build_inter_realm_ghost_cell_map`, which enumerates every ghost cell in every seam-block's **full** ghost slab (face + edge + corner — the defect-B fix) and resolves the matching peer realm + block + interior cell via a geometric `find_peer_cell` test against `field%emin/emax`. Coverage for rmf-2realm: 9348 map rows per realm per rank (6144 face + 2880 edge + 324 corner).

3. `prism_cpu_object%exchange_inter_realm_halos_forest` is now a flat indexed loop over the precomputed map. The previous geometric `find_peer_block` + face-slab `copy_peer_face` helpers (which missed corner / edge ghosts) were removed. Substage selection via `forest_active_substage` is preserved.

#### 3.6.1 Why commits 1–3 left the digest unchanged — and what commit 4 fixed

Commit 3 wrote ~50% more ghost cells than the old face-slab exchange, with structurally-correct addressing — and yet the rmf-2realm digest stayed byte-identical to develop-HEAD's `0bf607ad...`. Smoking-gun: replacing the per-cell copy with `q = 999.0_R8P` left residuals **unchanged**. The seam exchange's writes were being clobbered downstream.

**Root cause #1 (BC overwrite):** each realm's INI declares physical BCs on all 6 faces (e.g. `bc_x_max = Neumann`). For a realm whose `+x` face is glued to another realm by the manifest, that declaration is wrong — the face is a SEAM, not a physical boundary. PRISM's `set_boundary_conditions` (called by `update_ghost` AFTER `exchange_inter_realm_halos_forest`) was extrapolating Neumann values into the seam-ghost cells at every substage, overwriting the peer-interior values the seam exchange wrote.

**Root cause #2 (substage-coherence race) — exposed by fixing #1:** the previous 2-phase substage loop (`assemble` → `evaluate` = `compute_residuals` + `rk%assign_stage` back-to-back) had a data race at the seam. `rk%assign_stage(s, q=dq)` overwrites `q_rk(:, interior, :, b, s)` with `dq` **in-place** (the SSP-RK trick — `q_rk(s)` holds the dq of stage `s` for the next stage's linear combination). When realm A's evaluate finished before realm B's, B's seam exchange read A's `q_rk(s)` — which by then held A's residual buffer, not A's substage state — and B's stencil fed on garbage. The race was previously masked by the BC Neumann overwrite of the corrupted ghosts; once BC was silenced on seam faces, the race surfaced. Empirically: `max|q-ghost|` jumped 5 orders of magnitude in a single substage on the second-evaluated realm, traced via `maxval(abs(q))` instrumentation at every substage entry.

**Commit 4 fixes both, together (they had to land in one commit because fixing #1 destabilises the regression without #2):**

* **BC_SEAM sentinel** added to `adam_parameters` (`= -2_I4P`, sibling of `BC_PERIODIC = -1`). `forest_object%populate_inter_realm_topology` gains `override_seam_bc_in_crown`, which walks each realm's `local_map_bc_crown` (already built during the realm's own `initialize_forest`) and rewrites column 8 (`bc_type`) from `BC_NEUMANN`/etc. to `BC_SEAM` for entries whose (block, face_1_6) matches a manifest-declared seam. PRISM's `set_boundary_conditions` dispatch ladder has no branch for `BC_SEAM` → those entries become silent no-ops at consumption time → the seam-exchange writes survive intact. The manifest acts as authoritative override over each realm's INI at the right semantic layer — post-crown construction, pre-time-loop — with no per-cell mask inside the BC inner loop and no schema extension to the realm INI.

* **Three-phase substage loop** in `forest_object%evolve_one_step`:
  * Phase 1 — all realms `assemble_substage_forest(s)`
  * Phase 2a — all realms `residuals_substage_forest(s)` (computes `self%dq`; reads peer `q_rk(s)` via the seam exchange — safe, no peer has overwritten `q_rk(s)` yet)
  * Phase 2b — all realms `assign_substage_forest(s)` (overwrites `q_rk(:, interior, :, b, s)` with `self%dq` — race-free because no peer needs to read it any more)
  * Phase 3 — forest-level reflux hook (still skeleton no-op — that's where step-4-of-Phase-A `accumulate_*_flux` + `apply_reflux` wiring will land later)

  `evaluate_substage_forest` is split into `residuals_substage_forest` + `assign_substage_forest` on both `realm_object` (defaults error_stop) and PRISM CPU + FNL (overrides). The old `evaluate_substage_forest` is kept as a backward-compat wrapper that calls both in sequence — the forest no longer invokes it.

#### 3.6.2 Failed exploration recorded as anti-pattern (NOT committed)

Before the BC_SEAM commit landed, an alternative approach was tried: add `inter_realm_face_mask(b, fec_1_6)` on `adam_maps_object` and consult it inside `set_boundary_conditions`'s crown loop to `cycle` past seam entries. **Empirically catastrophic** — same residual explosion (O(1e30) at step 1, NaN by step 10) because it triggers the same substage-coherence race the BC_SEAM + 3-phase fix had to address jointly. The per-cell-mask approach is recorded here as anti-pattern: the BC_SEAM crown-override is the right level. The mask exploration is left in the implementation-session transcript only, not in the repo.

#### 3.6.3 Verification at develop HEAD `57df060e`

* `prism-cpu-gnu` builds clean (`gcc/15.2.0 + openmpi/5.0.10-gnu15.2.0`), release and debug.
* rmf single-realm digest: 102/102 within `rtol=1e-6 atol=1e-3` (the BC_SEAM machinery is a true no-op for N=1: the manifest is absent, `populate_inter_realm_topology` returns early, BC behaves exactly as before; the FV reflux hook is unreachable for `fd_centered`).
* rmf-2realm CPU regression: PASSes against its committed golden (`run.sh cpu` → `digest match: 102 rows within rtol=1e-06 atol=0.001`).
* Cross-case sanity (rmf-2realm aggregated vs rmf single-realm golden): PASSes 102/102 with 9 `SKIP_METADATA` lines (per-block-metadata filter active). **Zero field-variable mismatches.** The 0.77%-at-step-10 `By` drift is gone.
* Reflux machinery (commits 7 + 8) is fully wired: `forest_object%apply_reflux_corrections` runs every substage, walks `forest_flux_register%face(:)`, and applies the full Berger-Colella correction `q_coarse += weight·sign·dt/dx·(F_coarse - F_fine_sum)`. For the rmf-2realm case (FD), the FV hook is unreachable → accumulators stay zero → correction is `0`. For a hypothetical FV mirror seam, `F_coarse - F_fine_sum` is round-off zero by expectation → correction is round-off zero. The non-zero path activates when true coarse-fine AMR adjacency exists; that case is not in the current Phase A regression scope.
* The per-realm `inter_realm_face_register_index(b, fec_1_6)` lookup uses a signed-integer convention (positive = this realm is the coarse side of register entry `abs(idx)`, negative = this realm is the fine side, zero = not a seam face). This encodes the coarse/fine role in the lookup itself, so no realm-identity field is needed on the realm object.

### 3.7 Phase A — what's left

Phase A as originally drafted is **fully closed**. The remaining items below are deliberate scope-cuts or follow-on phases, not Phase A blockers.

* ~~**FV-path consumer wiring**~~ — **landed in commit 8 (`108a96a6`)**. `compute_residuals_fv_centered` now calls `accumulate_coarse_flux` / `accumulate_fine_flux` on registered seam faces between reconstruction and the conservative-update loop. The dispatch coarse-vs-fine is encoded in the sign of `inter_realm_face_register_index(b, fec_1_6)` (+ = coarse side, − = fine side, 0 = not a seam). Per the 2026-05-24 decision: hook implemented but **not** exercised by a new regression case — PRISM is FD-focused. Correctness oracle was rmf-2realm staying bit-identical (passed: hook unreachable for `fd_centered`).
* **FNL multi-realm path** — the per-realm FNL singletons were rebound to value components in earlier work; the 2026-05-25 investigation (§3.8 below) drove the architectural cleanup further (retired `forest_realm`/`adam_forest_global`, threaded `realm(:)` / `s_active` / `flux_register` as dummy arguments throughout) but rmf-2realm/fnl STILL FAILS at the first OpenACC kernel launch after the cross-realm seam exchange. The remaining trigger is below the architectural layer — a WSL-UCX × OpenACC × FUNDAL device-memcpy interaction. **Status**: not blocking Phase A closure; tracked in §3.8.
* **Phase B (shared edge-E for algebraic ∇·B = 0)** and **Phase C (high-order ghost-fill for non-conforming resolutions)** — the next planned phases per §§4 and 5 below; not started.

### 3.8 FNL multi-realm investigation (2026-05-25) — architectural cleanup landed, runtime bug remains

A multi-day investigation triggered by `rmf-2realm/fnl` failing at the first inter-realm seam exchange. The investigation produced **five landed commits** on branch `feature/issue13-fnl-realm-dummy-threading` plus measured evidence that the residual failure is in the nvfortran-OpenACC / UCX / FUNDAL interaction, NOT in PRISM's own state management.

**The five landed commits (in chronological order):**

| Commit | Subject | Effect |
|---|---|---|
| `8000ae7b` | fix(prism-fnl): switch inter-realm seam copy to FUNDAL memcpy | Replaced `!$acc update host` on `dev_alloc`-managed pointers in `copy_peer_face_gpu` (memory-model violation: ACC present-table vs FUNDAL raw-pointer) with FUNDAL `dev_memcpy_from_device` / `dev_memcpy_to_device`. Closed the original "data not found on device" crash; exposed the next bug downstream. |
| `d2fb7bc8` | refactor(lib,prism): thread realm(:) as dummy through forest contract | Added optional `realm(:)` to `initialize_forest`, `assemble/residuals/assign/evaluate_substage_forest`, `post_step_forest`, `update_ghost`, `compute_residuals_*`, `save_simulation_data`. The forest passes `realm=realm` at every per-realm dispatch. Lets FNL's intra-realm and inter-realm halo refresh use the dummy-argument path. |
| `57350354` | fix(lib): plumb add_adam to field initialize for tree/field consistency | Surfaced during the investigation: `tree_object%initialize` has optional `add_adam` (default `.true.`); `field_object%initialize` had no matching dummy and silently set `blocks_number = 0` despite the tree being seeded with the ADAM ancestor block. Added `add_adam` to field, set `blocks_number = 1` ON RANK 0 ONLY. `adam_object%initialize` plumbs the same value to both tree and field. Independent fix; orthogonal to FNL multi-realm. |
| `90d0343e` | refactor(lib,prism): retire forest_realm class pointer + fix kernel device-mem thrash | Three improvements in one commit: (a) retired the `class(realm_object), pointer :: forest_realm(:)` shim from `adam_forest_global` (drop fallback paths in `update_ghost`, replace with `present(realm)`); (b) hoisted `host_buf` in `copy_peer_face_gpu` from allocatable + per-call `allocate/deallocate` to an automatic stack array (eliminated heap-tcache churn that was masking the GPU bug as a UCX-malloc abort); (c) compile-time-bound private arrays in `compute_residuals_fd_centered_dev_kernel` (replaced 6 × `private(qsx_y(1-s1:1+s1), ...)` with fixed-size + slice passed at call site — 89% reduction in 3.67 MB device-memory churn per substage). |
| `9fa34b08` | refactor(lib,prism): retire adam_forest_global, full dummy-argument threading | Closes the polymorphic / module-shared-state hygiene track: deletes `adam_forest_global.F90`; `forest_flux_register` becomes a `type(flux_register_object) :: flux_register` VALUE component on `forest_object`; `forest_active_substage` becomes an `integer, intent(in), optional :: s_active` dummy threaded through `exchange_inter_realm_halos_forest` to `copy_peer_face[_gpu]`. The forest passes `self%flux_register` and `s` through `residuals_substage_forest`. NO forest module-scope mutable state remains. |

**Diagnostic findings (measured, not speculated):**

* Five standalone FUNDAL tests (A: single-instance whole-array memcpy; B: two-instance cross-copy; C: polymorphic-class-array + select type + nested component chain; D: kernel/memcpy interleave; E: D + MPI on -np 2) — **all PASS**. FUNDAL itself + the polymorphic-dispatch pattern + the OpenACC-kernel-launch-interleave-with-FUNDAL-memcpy + MPI all work cleanly in isolation.
* The retirement of `forest_realm` (the polymorphic-class-pointer hazard) made progress (the run advances further before crashing — ~3× more `set_boundary_conditions_kernel` launches than pre-fix) but **does NOT** eliminate the crash. Confirms the bug is downstream of polymorphic-pointer handling.
* The kernel-private compile-time-bound fix achieved 89% reduction in 3.67 MB device-memory `recycle device memory → pgi_uacc_alloc` thrash; the crash persists, at the same location. Confirms device-memory thrash is a real efficiency bug but not the proximate trigger.
* Final crash signature on rmf-2realm/fnl is **`MemcpyHtoDAsync CUDA_ERROR_INVALID_VALUE` at `receive_recv_buffer_ghost_gpu_dev:139`** — the OpenACC kernel-argument-block h2d copy for an INTRA-realm ghost-receive kernel. The failing destination address (e.g. `0xe7f729d30`) appears in the trace **only at the crash line** — never allocated, never recycled — so the OpenACC runtime is generating a fresh arg-block destination for the kernel launch and CUDA rejects it.
* No ACC errors anywhere in the 51 MB trace (no `not found in present table`, no nil pointers, no double-free signatures). All ~110 k OpenACC events report success up to the kernel-launch arg-block copy that fails.

**Refined hypothesis (not yet falsified):** the cross-realm `acc_memcpy_from_device_f(c_loc(dst), c_loc(peer_realm%rk_fnl%q_rk_gpu(slice)), bytes)` inside FUNDAL — although correct in isolation per the standalone tests — interacts with **WSL-specific UCX eager-mode buffer registration on the device** (the WSL workaround `UCX_RNDV_THRESH=inf` registers eager buffers that the OpenACC runtime cannot reconcile with its arg-block allocator). The single-realm path doesn't exercise cross-instance device pointers, so the interaction never fires.

**What's next (NOT in this Phase A scope):**

* **Test on a real GPU cluster** (Leonardo, IAC) — the WSL workaround envvars are explicitly documented as WSL-only perf regressions; cluster GPUDirect RDMA uses rendezvous + actual UCX device-memory support that may or may not exhibit the same interaction.
* **Alternative seam-exchange path** that host-stages everything through CPU memory (slower but architecturally insulated from device-runtime interaction).
* **File an nvfortran/UCX bug** with the WSL repro after isolating which specific operation (the FUNDAL cross-instance memcpy, the eager-mode UCX, or the OpenACC arg-block allocator) introduces the bad device address.

**Architectural state at end of investigation (branch `feature/issue13-fnl-realm-dummy-threading`):**

* `adam_forest_global` deleted. No `forest_realm`, `forest_active_substage`, `forest_flux_register` symbols in the binary.
* `forest_object` owns `flux_register` as a value component.
* `realm(:)` flows exclusively as `class(realm_object), intent(inout), optional, target` dummy through the contract chain — NO module-scope shims for the realm array.
* `s_active` flows as `integer(I4P), intent(in), optional` dummy through `exchange_inter_realm_halos_forest`.
* `flux_register` flows as `class(flux_register_object), intent(inout), optional` dummy through `compute_residuals_interface`, present only at the CPU FV-reflux hook (PRISM FNL has no FV residual path; PRISM FD/WENO accept the dummy for contract parity).
* CLAUDE.md pointer-to-derived-type rule is fully observed on the forest path.
* CPU regression preserved (rmf-2realm/cpu PASS). rmf/fnl PASS. rmf-2realm/fnl crash is NOT in the architectural hygiene track.

---

## 4. Phase B — Shared edge-E / induction-flux uniqueness at the seam

**Goal.** Algebraic, round-off-level ∇·B = 0 (and ∇·D = 0 when constrained transport on D is enabled) across every coarse-fine interface. This is the specific divergence-preservation mechanism on top of Phase A's bulk-conservation guarantee.

**Why this is structurally analogous to Phase A.** Phase A reconciles **face-fluxes for conserved variables** at the seam; Phase B reconciles **edge-fluxes (edge-E) for the magnetic flux**. Same lifecycle, same comm pattern, same module shape. The distinction is *where* in the discretization the registered value enters: A enters the conservative update of the bulk variables, B enters the induction-equation update of B.

### 4.1 The mathematical mechanism — why shared edge-E gives algebraic ∇·B = 0

Recall the Stokes-curl form of the induction equation that every CT-equivalent discretization uses (Evans & Hawley 1988; Balsara & Spicer 1999; Londrillo–Del Zanna 2004 §2.1 Eq. 11):

$$
\frac{d B_x|_{i+½,j,k}}{dt} = - \frac{1}{\Delta y \Delta z} \big[ \Delta z E_z|_{i+½,j+½,k} - \Delta z E_z|_{i+½,j-½,k} - \Delta y E_y|_{i+½,j,k+½} + \Delta y E_y|_{i+½,j,k-½} \big]
$$

The discrete divergence of B at cell $(i,j,k)$ is then:

$$
(\nabla \cdot B)_{i,j,k} = \frac{B_x|_{i+½} - B_x|_{i-½}}{\Delta x} + \frac{B_y|_{j+½} - B_y|_{j-½}}{\Delta y} + \frac{B_z|_{k+½} - B_z|_{k-½}}{\Delta z}
$$

When the induction equation is applied to each face, every edge-E contributes to **two adjacent face updates** with opposite sign. Summing the 6 face-divergence contributions for one cell:

- Each $E_z|_{i+½, j+½, k}$ appears in $dB_x/dt|_{i+½}$ with sign $-1$ and in $dB_y/dt|_{j+½}$ with sign $+1$ (after multiplication by their respective face-area weights).
- The two contributions to $(\nabla \cdot B)$ from this one edge **cancel identically** (Olivares 2019 §3.2 verbatim: *"Since each of the line integrals of the electric field is shared by two faces, but appears with opposite sign in the time update formula for each of them, the rate of change of $(\overline{\nabla \cdot B})_\text{cell}$, which can be calculated as the sum of the rate of change of the outgoing flux through all faces, vanishes."*).

This is the **algebraic ∇·B = 0** property of CT — exact at the level of integer arithmetic, not just floating-point. It holds **provided the same edge-E value is used in both adjacent face-updates**.

At a coarse-fine seam (intra-realm AMR jump or inter-realm seam), this property breaks if the coarse-cell induction update and the fine-cell induction update use **independently-computed** edge-E values that disagree at truncation order. The fix is **shared edge-E**: at every seam edge, one canonical value is chosen and used by both sides.

### 4.2 Implementation in cell-centered ADAM — "edge-E" is a derived quantity

In a fully face-staggered CT code (BHAC, Athena++ CT-mode), edge-E is a stored primary variable. ADAM stores B at cell centers; for the *equivalent* algebraic structure, edge-E becomes a **derived quantity computed during residual evaluation** from the local cell-centered B and v reconstructions (or, equivalently, the induction-equation flux on each face is derived using a particular combination of corner-states from the L/R cell-centered reconstructions).

The PRISM CPU implementation path:

1. **PRISM's `compute_convective_fluxes_maxwell` family** (`src/app/prism/cpu/adam_prism_cpu_object.F90:298-308` — currently dispatched to one of `compute_convective_fluxes_maxwell{,_div_d,_div_b,_div_d_b}`) computes the Maxwell-equation flux at face centers. The induction-equation portion of that flux already implicitly carries the curl-of-E contribution that maps to the BHAC §3.4 UCT2 form (Olivares 2019 Eq. 22):

$$
E_z = \sqrt{\gamma} \Bigg[ \frac{c_x^+ \tilde{V}_L^x B_L^y + c_x^- \tilde{V}_R^x B_R^y - c_x^- c_y^+ (B_R^y - B_L^y)}{c_x^+ + c_x^-} + \frac{c_x^+ c_x^- (B_y^R - B_y^L) - c_y^- c_x^+ (B_x^R - B_x^L)}{(c_x^+ + c_x^-)(c_y^+ + c_y^-)} \Bigg]
$$

(simplified to the non-relativistic limit for PRISM; $\sqrt{\gamma}$ → 1, all velocities are 3-velocities).

2. **At seam faces**, instead of computing edge-E independently on each side, the residual routine reads the shared canonical value from a new `adam_edge_register_object` (structurally identical to `adam_flux_register_object` but indexed by edges, not faces). The routine's call site changes from:

```fortran
E_seam = compute_edge_E_local(B_local, v_local, ...)
```

to:

```fortran
if (edge_register%is_seam_edge(i, j, k, edge_dir)) then
   E_seam = edge_register%get_shared_E(seam_edge_id, substage=s)
else
   E_seam = compute_edge_E_local(B_local, v_local, ...)
endif
```

3. **The shared E is computed on the fine side**, using the fine-resolution reconstruction (higher accuracy), then **summed to a coarse-edge value** via the same restriction-by-averaging Olivares §4.2.1 uses for flux summation:

$$
\overline{E_{\text{coarse edge}}} = \frac{1}{N_{\text{fine edges}}} \sum_{\text{fine edges on this coarse edge}} E_{\text{fine edge}}
$$

(For 2:1 refinement in the plane of the coarse edge, $N = 2$; for 2:1 refinement in both perpendicular directions, $N = 4$.) The averaged value is then broadcast to the coarse side and used in *both* the coarse-side induction update *and* the fine-side induction updates that touch this edge.

4. **The 4 fine cells adjacent to a seam edge and the 1 coarse cell on the other side now share the exact same edge-E.** Their induction updates' divergence contributions cancel pairwise across the seam, by the same algebraic property that holds for interior edges. The discrete ∇·B = 0 is preserved across the AMR jump.

### 4.3 New module — `adam_edge_register_object`

```fortran
module adam_edge_register_object
   use :: penf, only : I4P, R8P

   type, public :: edge_register_edge_t
      integer(I4P) :: seam_kind         !< SEAM_KIND_*.
      integer(I4P) :: coarse_realm
      integer(I4P) :: coarse_block
      integer(I4P) :: coarse_edge_idx   !< Linear index of the edge within the coarse block.
      integer(I4P) :: edge_direction    !< 1=x, 2=y, 3=z.
      integer(I4P), allocatable :: fine_block(:)     !< Fine blocks touching this coarse edge.
      integer(I4P), allocatable :: fine_edge_idx(:)  !< Per-fine-block linear edge indices contributing.
      real(R8P), allocatable :: E_shared(:,:)        !< (1, nrk) — shared E value per substage. (Scalar per edge — E_z on a z-edge, etc.)
      real(R8P), allocatable :: E_fine_sum(:,:)      !< (1, nrk) — accumulator on the fine side per substage.
   endtype edge_register_edge_t

   type, public :: edge_register_object
      type(edge_register_edge_t), allocatable :: edge(:)
      integer(I4P) :: nedges = 0_I4P
   contains
      procedure, pass(self) :: initialize
      procedure, pass(self) :: register_edge          !< Called by topology pass.
      procedure, pass(self) :: accumulate_fine_E      !< Called by fine-side residual on seam edges.
      procedure, pass(self) :: reduce_and_publish     !< Called between fine-residual and induction-update.
      procedure, pass(self) :: get_shared_E           !< Called by induction-equation flux on seam edges.
      procedure, pass(self) :: reset
      procedure, pass(self) :: destroy
   endtype edge_register_object
end module adam_edge_register_object
```

### 4.4 Lifecycle (structurally identical to Phase A)

1. **Init/regrid:** topology pass registers all seam edges via `register_edge`.
2. **Per-step reset:** `edge_register%reset` zeroes accumulators.
3. **Per-substage:**
   - Fine-side residual: `accumulate_fine_E` is called for each seam edge as part of the flux computation.
   - Between fine-residual and induction-update: `reduce_and_publish` performs the MPI reduce + the 2:1 (or 4:1) averaging onto the coarse-edge index, writing the result into `E_shared`.
   - Coarse-side induction-update: `get_shared_E` retrieves the canonical value.
4. **End of substage:** any further reflux step in Phase A sees a coarse-side B that is already algebraically divergence-free at the seam.

### 4.5 References for Phase B

- **Evans, C. R. & Hawley, J. F. (1988).** "Simulation of magnetohydrodynamic flows: a constrained transport method." *ApJ* **332**:659-677. doi:10.1086/166684. — The foundational constrained-transport paper, the source of the algebraic ∇·B = 0 property.
- **Balsara, D. S. & Spicer, D. S. (1999).** "A staggered mesh algorithm using high-order Godunov fluxes to ensure solenoidal magnetic fields in MHD simulations." *JCP* **149**:270-292. — The first practical CT scheme with arithmetic-averaging corner-E; superseded for stability by UCT (Londrillo-Del Zanna 2004) but still the simplest correct form.
- **Londrillo, P. & Del Zanna, L. (2004).** "On the divergence-free condition in Godunov-type schemes for ideal magnetohydrodynamics: the upwind constrained transport method." *JCP* **195**:17-48. doi:10.1016/j.jcp.2003.09.016. — The UCT formalism. §2.1 Fig. 1 (the Yee mesh geometry) + §2.2 (the consistency demands).
- **Olivares, H. et al. (2019).** "Constrained transport and adaptive mesh refinement in the Black Hole Accretion Code." *A&A* **629**:A61. — **The production-code engineering blueprint.** §3.2 Eq. 16 (the algebraic ∇·B = 0 statement) and §3.4 UCT2 Eq. 22 (the upwinded corner-E reconstruction formula adapted to PRISM's non-relativistic limit).
- **Tóth, G. (2000).** "The ∇·B = 0 constraint in shock-capturing magnetohydrodynamics codes." *JCP* **161**:605-652. — Three-camp taxonomy. Phase B places PRISM in the CT camp at the seam, while keeping the cleaning machinery (`divergence_correction = hyperbolic|poisson`) for residual interior errors. Cost: ~+4-5% of base scheme (Table I).

### 4.6 Phase B acceptance criteria

- [ ] `adam_edge_register_object` module exists, compiles in all backends.
- [ ] PRISM's `compute_convective_fluxes_maxwell*` family is patched to query the edge register on seam edges and use the shared value.
- [ ] rmf-2realm `By` digest drift compared to single-realm `rmf` golden: **drops to round-off** (≤1e-13 relative across 100 steps). This is the acceptance criterion for the entire G2 goal.
- [ ] Pointwise discrete ∇·B at every cell adjacent to a seam: ≤1e-13 across 100 steps.
- [ ] A simple 2-realm MHD test problem (e.g. an Alfvén wave crossing the seam) preserves the wave amplitude and ∇·B = 0 to round-off, demonstrating the algebraic property is real, not just luck on rmf.

---

## 5. Phase C — High-order ghost-fill with hybrid smooth/shock switch

**Goal.** Order-matched ghost-cell values across the seam, so the WENO/centered-FD interior order is not capped to 1st by the ghost-fill. This is independent of Phases A and B (different concern: ghost-cell accuracy vs flux conservation).

**Status of G2 after Phase C.** Phase C does not change ∇·B. It changes the accuracy of *non-magnetic* variables at the seam (ρ, momentum, energy) when realms or AMR levels differ in resolution. Phase C is **not needed** for the current rmf-2realm regression (same resolution across realms — the existing same-resolution mirror copy is already exact). Phase C is needed for **any future case with non-conforming resolution at the seam** — heterogeneous-resolution forests, AMR-active multi-realm runs, the rmf-2realm-amr case that issue #13's follow-up Phase D backlog points at.

### 5.1 The current defect — 0th-order injection

Read from source, `src/lib/common/adam_maps_object.F90:1024-1042`:

```fortran
else
   ! receiving from a block coarser than me
   do k=kmin, kmax
      do j=jmin, jmax
         do i=imin, imax
            kkk = 2 * k + kdelta
            jjj = 2 * j + jdelta
            iii = 2 * i + idelta
            do kc=0,1 ; do jc=0,1 ; do ic=0,1
               self%local_map_ghost_cell(c,1:2) = [b_send, b_recv]
               self%local_map_ghost_cell(c,3:5) = [i, j, k]              ! ← same coarse cell for all 8
               self%local_map_ghost_cell(c,6:8) = [iii+ic, jjj+jc, kkk+kc]
               self%local_map_ghost_cell(c, 9 ) = 1
               c = c + 1
            enddo ; enddo ; enddo
         enddo
      enddo
   enddo
endif
```

All 8 fine ghost cells `(iii+ic, jjj+jc, kkk+kc)` are written from the **same single coarse cell** `(i, j, k)` — piecewise-constant injection, 0th-order. This is the executed kernel in `src/lib/common/adam_field_object.F90:856-857`:

```fortran
if (one_or_eight==1) then
   q(v,i_recv,j_recv,k_recv,b_recv) = q(v,i_send,j_send,k_send,b_send)   ! ← the 0th-order injection
```

### 5.2 The replacement — dimension-by-dimension WENO ghost-fill

The kernel is replaced by a 1D WENO interpolation in each dimension, applied successively, using the same WENO-coefficient machinery as the interior reconstruction (`adam_weno_object`). For a 5th-order WENO interp filling fine-ghost cell $(iii+ic, jjj+jc, kkk+kc)$ from a coarse stencil $\{i-2, i-1, i, i+1, i+2\}$:

$$
q_{\text{fine ghost}}(iii+ic, jjj+jc, kkk+kc) = \sum_{p=-2}^{+2} w_p^{(\text{fractional offset})} \cdot q_{\text{coarse}}(i+p, j, k)
$$

then repeat for the $j$ and $k$ directions on the partial result. The coefficients $w_p$ depend on the fractional position of the fine-cell-center within the coarse-cell volume (e.g. ±0.25 of the coarse cell size for ic=0/1 in a 2:1 refinement) and on the WENO smoothness indicators evaluated on the coarse stencil. This is the **classical structured-mesh AMR-WENO interpolation** (Shen-Qiu-Christlieb 2011 §4; arXiv:2511.08335 §4.2.3).

The `local_map_ghost_cell` plan structure is **unchanged**. The 9-column-per-row tuple already encodes everything needed; the kernel inside `update_ghost_local` (`src/lib/common/adam_field_object.F90:822-869`) is what changes. A new `transfer_kind` column may be added to `local_map_ghost_cell` to dispatch among {0th-injection legacy, 8-cell average existing, WENO-interp new, hybrid-shock new} — this is the **operator-family architecture** the literature review (§6m.3) called out.

### 5.3 The hybrid smooth/shock switch

A naive 5th-order WENO interp at a shock will produce overshoots — even though ghost cells are read-only and do not directly violate conservation, they feed into the interior WENO reconstruction at the next substage, and if the ghost values are oscillatory the interior reconstruction triggers troubled-cell behavior. The fix from arXiv:2511.08335 §5.2:

1. **Troubled-cell detector at the seam.** A simple, robust detector adequate for PRISM:
   - Roe-based: compute the jump $|\Delta \rho| / \min(\rho_L, \rho_R)$ and the pressure-curvature $\theta = |p_{i+1} - 2 p_i + p_{i-1}| / (p_{i+1} + 2 p_i + p_{i-1})$ on the coarse stencil.
   - If $\theta > 10^{-2}$ or $|\Delta \rho| / \min > 0.2$, mark the cell as troubled.

2. **Fallback at troubled cells.** Drop from 5th-order WENO to **2nd-order conservative interpolation** (a simple slope-limited reconstruction of the coarse cell value with the slope minmodded against neighbors). This is non-oscillatory by construction.

The hybrid switch lives entirely inside the new `transfer_kind=WENO_HYBRID` kernel in `update_ghost_local`. It does not affect the plan structure.

### 5.4 Conservation-vs-order decoupling (the critical insight)

arXiv:2511.08335 §4.2.3, verbatim:

> *"this process does not introduce any loss of conservation, even though a non-conservative interpolation is employed. This is because the interpolation is only applied to populate ghost cells, rather than to update the valid solution cells."*

In ADAM's idiom: **ghost cells are read by `compute_residuals_*`, which produces fluxes, which then update the valid cells. Conservation of the valid cells is enforced by the flux register (Phase A), not by the ghost-fill.** Ghost-fill non-conservativity is therefore safe to absorb in exchange for high order. This dissolves the FD-conservation worry that the original literature review wrestled with.

### 5.5 References for Phase C

- **Shen, C., Qiu, J.-M. & Christlieb, A. (2011).** "Adaptive mesh refinement based on high order finite difference WENO scheme for multi-scale simulations." *JCP* **230**(9):3780-3802. doi:10.1016/j.jcp.2011.02.008. — The foundational FD-WENO AMR ghost-fill paper. §4 (the WENO interpolation), §5 (hybrid smooth/shock switch).
- **arXiv:2511.08335 (2025).** "An Improved High-order AMR Framework for Shock-turbulence Interaction based on cell-centered finite difference schemes." — Modern cell-centered FD treatment. §4.2.3 (conservation-vs-order decoupling), §4.2.3 Eq. 15 (RK3 temporal interpolation for subcycled levels — reserved for future subcycled forests, not needed in current lockstep mode), §5.2 (troubled-cell detector).
- **Couch, S. M. et al. (2021).** Flash-X Spark §2.2 — production reference for the cell-centered AMR-WENO ghost-fill in a code structurally identical to ADAM.

### 5.6 Phase C acceptance criteria

- [ ] WENO-interp ghost-fill kernel implemented in `update_ghost_local` (CPU); FNL/NVF kernels updated.
- [ ] Hybrid smooth/shock switch with troubled-cell detector implemented.
- [ ] Convergence test on a 2-realm case with **non-conforming resolution** (one realm at 16³, the other at 32³): global L_∞ error in a smooth test problem (e.g. sinusoidal advection) converges at the interior WENO order (5 for `weno-u-1`), not at 1st order.
- [ ] Strong-shock test (Sod tube crossing the seam) shows no spurious oscillations at the interface.

---

## 6. Execution order and dependencies

| Phase | Depends on | Unblocks | Closes (G1, G2, accuracy) |
|---|---|---|---|
| **A** — Reflux | D.4a (done); existing comm primitives | B, C | bulk conservation across seam |
| **B** — Edge-E sharing | A (the lifecycle pattern is reused) | nothing | algebraic ∇·B = 0 across seam |
| **C** — WENO ghost-fill | nothing structural (independent of A, B) | non-conforming-resolution forests | seam accuracy at interior order |

A → B → C is the dependency order *and* the engineering-value order:
- **A alone** restores conservation and (per literature evidence) **may close most of the `By` 0.77% drift** because the dominant error term in current rmf-2realm is the unrefluxed bulk-variable update, not the divergence-cleaning residual.
- **A+B** delivers the full G2 goal: algebraic ∇·B = 0 across the seam, cell-centered storage preserved.
- **A+B+C** delivers G1 (unified operator family across same-res and non-conforming seams).

C can be **deferred** until a non-conforming-resolution use case exists. A and B should land together: A without B leaves the seam ∇·B at truncation order, which is better than current but does not realize the full guarantee.

---

## 7. Acceptance criteria for the whole issue

- [ ] Phases A and B implemented, tested per §3.5 and §4.6.
- [ ] **rmf-2realm CPU digest matches single-realm `rmf` golden to round-off** (≤1e-13 relative), across all components, across 100+ steps. This is the primary regression oracle.
- [ ] FNL backend builds with the same machinery; rmf-2realm FNL digest captures committed.
- [ ] Brio-Wu rotated shock-tube on a 2-realm case crossing the seam: no spurious oscillations, conservation to round-off, ∇·B ≤ 1e-13 everywhere.
- [ ] Phase C deferred until a non-conforming use case is specified.
- [ ] Architecture documented in `src/lib/common/README.md` (the D.5 deliverable, written against the final code).

---

## 8. Out of scope (deferred)

- Face-staggering of B, D (the Balsara 2001 / Li-Li 2004 / Olivares 2019 route). Examined in `sources/papers_20260522_forest_octree_halo.md`; **not chosen** because cell-centered + interface machinery (this issue) achieves the same algebraic ∇·B = 0 guarantee without restructuring `physics%nv`. The face-staggered route remains a forward-looking alternative if PRISM ever grows curl-constrained vector fields (Balsara 2023, GPR fluids, GR FO-CCZ4) where the partial-staggering pattern scales further — at which point a single restructure delivers both div- and curl-preservation.
- High-order temporal interpolation between substages on subcycled forests (RK3 Taylor expansion, arXiv:2511.08335 Eq. 15). Reserved hook; not needed while `forest_active_substage` lockstep is the execution model.
- Constrained-transport on **E** (the curl-preserving extension, Balsara-Käppeli-Boscheri-Dumbser 2023). Not needed for PRISM's current Maxwell scope.

---

## 9. Persistent context — links to the artifacts produced

- **Literature review (source-verified):** [`sources/papers_20260522_forest_octree_halo.md`](https://github.com/szaghi/adam/blob/develop/sources/papers_20260522_forest_octree_halo.md) — 13 primary PDFs read end-to-end, §6m–6n of the review consolidate the cell-centered + interface-machinery verdict.
- **Reference handoff (the rmf-2realm seam diagnosis):** [`HANDOFF_rmf2realm_2026-05-21.md`](https://github.com/szaghi/adam/blob/develop/HANDOFF_rmf2realm_2026-05-21.md) — the two committed bug fixes (`771dbd2f`, `bf324ccd`) and the residual seam-coupling defect this issue addresses.
- **Regression case:** `src/tests/prism/regression/rmf-2realm/` — the free correctness oracle.
- **rmf-2realm CPU output for diagnosis:** the digest of the current (broken) seam coupling shows monotonic `By` drift 0% → 0.55% → 0.77% over 10 steps.
- **Predecessor migration plan:** [#10](https://github.com/szaghi/adam/issues/10) (the realm-orchestrator migration; #13 is its Phase D continuation).

---

## Appendix H — Historical scope (the completed Phase D design work)

This appendix preserves the original Phase D design content of issue #13 (D.1 through D.4a, all completed) as design history. The forward plan is the reformulation above; the appendix is for reference only.

### H.1 Phase D was the first concrete N>1 use case design

Phase D of [#10](https://github.com/szaghi/adam/issues/10) was design-blocked until a concrete N>1 use case was specified. The chosen scope: split the existing `rmf` regression case along x=0 into two adjacent realms. The union of the two realms reconstructs the single-realm `rmf` domain exactly, so the existing `rmf` golden serves as the bit-comparable reference solution. Front-loading the Phase D infrastructure work — per-realm config, MPI carve-out, inter-realm halo exchange via `adam_maps_object`, time-step coordination, AMR policy — onto a use case where every component was verifiable in isolation.

### H.2 Use case specification (rmf-2realm)

- **Physics:** identical to `src/tests/prism/regression/rmf/`. Maxwell, EM model, four AC coils, Neumann BCs, no IB, no AMR, 10 iterations, CFL=0.5, fd_centered order 6, SSP-RK 5-stage, weno-u-1.
- **Topology:** single-axis split at x=0. realm 1: x ∈ [-0.06, 0], ni=8. realm 2: x ∈ [0, +0.06], ni=8. dx=0.0075 in both, matches single-realm.
- **Coils:** 4 coils; coils 1 & 2 straddle x=0 (each realm computes its own cells); coils 3 & 4 entirely in one realm.
- **MPI:** replicated forest. Both ranks own both realms. Inter-realm halo is local memcpy on each rank.
- **Time-step:** globally synchronous single dt via `forest%compute_global_dt`.
- **AMR:** disabled at v1.
- **Configuration:** manifest + per-realm INI files.

### H.3 Phase D task breakdown (final status)

- **D.1** Phase D design (this document). **Done.**
- **D.2** Extend `adam_maps_object` with `inter_realm_neighbor_t` + `exchange_inter_realm_halos_forest` TBP. **Done 2026-05-18.** Per-substage exchange granularity question reopened, resolved in D.4a via path (a) — split `advance_one_step_forest` into per-substage TBPs.
- **D.3** Add manifest parser `adam_forest_manifest.F90` + manifest-aware driver. **Done 2026-05-18** (commit `dffa19be`).
- **C.3 closure** Post-D.3 follow-up retargeting `adam_adam_global` shim. **Done 2026-05-19**.
- **D.4a** Multi-realm orchestrator scaffolding (`adam_forest_global` exposing `forest_realm(:)` + `forest_active_substage`; realm per-substage TBPs `prepare_step_forest`, `assemble_substage_forest`, `evaluate_substage_forest`, `finalize_step_forest`, `exchange_inter_realm_halos_forest`; `bind_my_globals_forest`); rmf-2realm case files; digest.py aggregation. **Done 2026-05-19** (commit `c2a77708`, cleanup `d73b9ca8`).
- **D.4b** Capture rmf-2realm CPU golden + confirm bit-comparability. **Superseded** by the present issue body §§3-5 — empirical evidence showed the rmf-2realm seam is not bit-comparable to single-realm because the seam coupling is structurally incomplete (the present work).
- **D.5** Document architecture in `src/lib/common/README.md`. **Deferred** until Phases A/B/C land.

### H.4 May 2026 multi-realm bug fixes

Two genuine multi-realm engineering bugs surfaced when rmf-2realm was first run and were fixed before this issue was reformulated:

- **`771dbd2f`** — per-realm memory budget. Each realm was grabbing the full per-process budget at allocation time, causing OOM for N>1 forests. Fix: divide the per-process budget across the realms in the forest.
- **`bf324ccd`** — per-realm MPI_FINALIZE. Each realm was calling `MPI_FINALIZE` independently, causing post-finalize abort after the first realm shut down. Fix: single forest-level MPI finalize.

After these fixes, rmf-2realm runs clean to completion on CPU. The residual `By` digest drift (0.77%/10 steps) is the seam coupling defect that the §§3-5 forward plan addresses.

### H.5 Phase D risks (resolved)

The original risks section is preserved verbatim for design history:

- ~~**Inter-realm halo exchange ordering**~~ — resolved by D.4a path (a): split `advance_one_step_forest` into per-substage TBPs driven by the forest, with `exchange_inter_realm_halos_forest` called between substages via the `forest_realm` module pointer.
- **`adam_maps_object` blast radius** — mitigated by keeping inter-realm additions purely additive; existing intra-realm exchange code unchanged.
- ~~**INI parsing complexity creep**~~ — removed by the manifest-design revision (2026-05-18); each realm uses an unchanged PRISM parser on its own complete INI.
- ~~**C.3 deferred shim retargeting**~~ — discharged 2026-05-19 by the C.3 closure commit.

### H.6 Phase D out of scope (still deferred; restated for completeness)

- Disjoint-rank MPI carve-out.
- AMR across realm boundaries (Phase C of this issue's forward plan addresses the technical infrastructure).
- Per-realm dt / sub-cycling.
- Per-realm physics or numerical scheme.
- Schwarz-style overlap coupling.
- Three or more realms.
- N>1 in NASTO / CHASE / PATCH.

These remain on the Phase D backlog. The Phases A/B/C work in this issue scaffolds the interface machinery for any of them.







#	Commit	Subject	Numerical effect
1	`6b13074`	feat(lib): scaffold flux register module for Phase A reflux machinery	Numerical no-op (skeleton TBPs)
2	`9786245`	feat(lib): wire flux register topology pass for inter-realm seams	Numerical no-op (accumulators allocated and zeroed)
3	`9d1533e`	feat(lib,prism): replace geometric seam exchange with precomputed comm-map	rmf single-realm: byte-identical. rmf-2realm: byte-identical to develop-HEAD (0.77% `By` drift exposed but not yet closed — see §3.6.1 for why)
4	`6987c20`	fix(lib,prism): close inter-realm seam — BC_SEAM override + 3-phase substage loop	rmf-2realm digest closes (zero field-variable mismatches vs single-realm `rmf` golden)
5	`d3e50e5`	test(prism): capture rmf-2realm CPU regression golden — seam closure	rmf-2realm now has a committed golden; `run.sh` PASSes the case against it
6	`fa20706`	test(regression): downgrade per-block-metadata digest mismatches to SKIP_METADATA	Cosmetic — cross-case `rmf-2realm` vs `rmf` golden comparison now exits 0 (filters the 9 benign metadata rows to `SKIP_METADATA`)
7	`fd91be6`	feat(lib): reflux infrastructure — accumulators, register lookup, forest q-correction	Numerical no-op (FV-path infrastructure; no consumer invokes `accumulate_*_flux` yet — see §3.7)
8	`108a96a6`	feat(prism,lib): wire FV-path inter-realm flux accumulation hook (#13)	Numerical no-op for the regression case (rmf-2realm uses `fd_centered`, hook unreachable). Hook is live for FV consumers; same-resolution mirror seam gives `F_coarse - F_fine_sum = round-off zero` by expectation.

Commit	Subject	Effect
`8000ae7b`	fix(prism-fnl): switch inter-realm seam copy to FUNDAL memcpy	Replaced `!$acc update host` on `dev_alloc`-managed pointers in `copy_peer_face_gpu` (memory-model violation: ACC present-table vs FUNDAL raw-pointer) with FUNDAL `dev_memcpy_from_device` / `dev_memcpy_to_device`. Closed the original "data not found on device" crash; exposed the next bug downstream.
`d2fb7bc8`	refactor(lib,prism): thread realm(:) as dummy through forest contract	Added optional `realm(:)` to `initialize_forest`, `assemble/residuals/assign/evaluate_substage_forest`, `post_step_forest`, `update_ghost`, `compute_residuals_*`, `save_simulation_data`. The forest passes `realm=realm` at every per-realm dispatch. Lets FNL's intra-realm and inter-realm halo refresh use the dummy-argument path.
`57350354`	fix(lib): plumb add_adam to field initialize for tree/field consistency	Surfaced during the investigation: `tree_object%initialize` has optional `add_adam` (default `.true.`); `field_object%initialize` had no matching dummy and silently set `blocks_number = 0` despite the tree being seeded with the ADAM ancestor block. Added `add_adam` to field, set `blocks_number = 1` ON RANK 0 ONLY. `adam_object%initialize` plumbs the same value to both tree and field. Independent fix; orthogonal to FNL multi-realm.
`90d0343e`	refactor(lib,prism): retire forest_realm class pointer + fix kernel device-mem thrash	Three improvements in one commit: (a) retired the `class(realm_object), pointer :: forest_realm(:)` shim from `adam_forest_global` (drop fallback paths in `update_ghost`, replace with `present(realm)`); (b) hoisted `host_buf` in `copy_peer_face_gpu` from allocatable + per-call `allocate/deallocate` to an automatic stack array (eliminated heap-tcache churn that was masking the GPU bug as a UCX-malloc abort); (c) compile-time-bound private arrays in `compute_residuals_fd_centered_dev_kernel` (replaced 6 × `private(qsx_y(1-s1:1+s1), ...)` with fixed-size + slice passed at call site — 89% reduction in 3.67 MB device-memory churn per substage).
`9fa34b08`	refactor(lib,prism): retire adam_forest_global, full dummy-argument threading	Closes the polymorphic / module-shared-state hygiene track: deletes `adam_forest_global.F90`; `forest_flux_register` becomes a `type(flux_register_object) :: flux_register` VALUE component on `forest_object`; `forest_active_substage` becomes an `integer, intent(in), optional :: s_active` dummy threaded through `exchange_inter_realm_halos_forest` to `copy_peer_face[_gpu]`. The forest passes `self%flux_register` and `s` through `residuals_substage_forest`. NO forest module-scope mutable state remains.

Phase	Depends on	Unblocks	Closes (G1, G2, accuracy)
A — Reflux	D.4a (done); existing comm primitives	B, C	bulk conservation across seam
B — Edge-E sharing	A (the lifecycle pattern is reused)	nothing	algebraic ∇·B = 0 across seam
C — WENO ghost-fill	nothing structural (independent of A, B)	non-conforming-resolution forests	seam accuracy at interior order

Coarse-fine interface machinery for ADAM/PRISM — Berger-Colella reflux, edge-E uniqueness, high-order ghost-fill #13

Description

Coarse-fine interface machinery for ADAM/PRISM — Berger-Colella reflux, edge-E uniqueness, high-order ghost-fill

1. Motivation — what the rmf-2realm regression revealed

2. Literature foundation — what the field has converged on

2.1 The cell-centered AMR divergence-preservation question, resolved

2.2 The 5-author algorithmic consensus on RK-multistage refluxing

2.3 The high-order ghost-fill picture

3. Phase A — Berger-Colella reflux on the existing flux array

3.1 New module — adam_flux_register_object

3.2 Lifecycle and hook points

3.3 Integration with the existing comm_map_* machinery

3.4 References and citations for Phase A

3.5 Phase A acceptance criteria

3.6 Phase A implementation log — what landed, in 8 commits

3.6.1 Why commits 1–3 left the digest unchanged — and what commit 4 fixed

3.6.2 Failed exploration recorded as anti-pattern (NOT committed)

3.6.3 Verification at develop HEAD 57df060e

3.7 Phase A — what's left

3.8 FNL multi-realm investigation (2026-05-25) — architectural cleanup landed, runtime bug remains

4. Phase B — Shared edge-E / induction-flux uniqueness at the seam

4.1 The mathematical mechanism — why shared edge-E gives algebraic ∇·B = 0

4.2 Implementation in cell-centered ADAM — "edge-E" is a derived quantity

4.3 New module — adam_edge_register_object

4.4 Lifecycle (structurally identical to Phase A)

4.5 References for Phase B

4.6 Phase B acceptance criteria

5. Phase C — High-order ghost-fill with hybrid smooth/shock switch

5.1 The current defect — 0th-order injection

5.2 The replacement — dimension-by-dimension WENO ghost-fill

5.3 The hybrid smooth/shock switch

5.4 Conservation-vs-order decoupling (the critical insight)

5.5 References for Phase C

5.6 Phase C acceptance criteria

6. Execution order and dependencies

7. Acceptance criteria for the whole issue

8. Out of scope (deferred)

9. Persistent context — links to the artifacts produced

Appendix H — Historical scope (the completed Phase D design work)

H.1 Phase D was the first concrete N>1 use case design

H.2 Use case specification (rmf-2realm)

H.3 Phase D task breakdown (final status)

H.4 May 2026 multi-realm bug fixes

H.5 Phase D risks (resolved)

H.6 Phase D out of scope (still deferred; restated for completeness)

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions

3.1 New module — `adam_flux_register_object`

3.3 Integration with the existing `comm_map_*` machinery

3.6.3 Verification at develop HEAD `57df060e`

4.3 New module — `adam_edge_register_object`