Type: pure research RFC. No implementation timeline. Filed as a long-lived discussion / reference document. Related: #16 (forest α: end-of-step barrier — the production-path partner of this RFC).
Summary
This RFC scopes γ — continuous (dense) output for inter-realm seam coupling under same-dt, asymmetric-K. Under γ, when receiver realm A reads peer realm B mid-step at stage time t_A = c_A(s) · dt, the seam fill does NOT copy B's stored stage buffer verbatim (which is at peer time t_B = c_B(s) · dt ≠ t_A). Instead, B's continuous extension polynomial evaluates B's state at θ = t_A / dt, and that interpolated value is what crosses the seam. The result: full-order seam coupling in time, matching the integrator's discrete order on both sides.
γ is the strict-correctness alternative to α (the AMReX-aligned, end-of-step barrier with stale-by-one-step ghosts; see #16). γ is not proposed for implementation under the plasma thruster development arc — α suffices for that use case. γ exists in this RFC for future research, benchmark convergence runs, and as a reference design for ADAM contributors who eventually need full-order seam coupling.
Motivation
α drops the per-stage seam exchange and lives with O(dt) seam-coupling errors in exchange for AMReX-aligned simplicity. For plasma-thruster simulations where seams separate active and quiet regions, this is acceptable: the seam error is bounded, well-understood (Berger-Oliger 1984), and dominated by other discretization errors near the seam.
For other classes of problem, α is insufficient:
- Order-of-accuracy benchmarks: when validating ADAM against analytic solutions, the seam region must not contaminate the convergence rate. Under α, a coupling-order drop at the seam is measurable in the L² norm and limits the demonstrable convergence rate to first order in time regardless of the integrator's discrete order.
- Strong seam-active coupling: simulations where the dynamics legitimately straddle the seam (shocks crossing the seam, transient features moving through the boundary) need the seam to be invisible to the physics. Stale-by-one-step ghosts inject low-order errors precisely where they hurt most.
- Asymmetric-K with very different K: when K_A=3 and K_B=11 (a future high-order spectral-deferred-correction scenario), the per-stage skew between A and B is large. α's "treat-mid-step-as-stale" approach is the same regardless of K-gap, but the worst-case error grows with the gap. γ's interpolation is order-preserving regardless.
- Implicit-coupling regimes: where the seam is a stiff coupling and the realms need each other's current state (not last-step state) to remain stable. α may destabilize; γ keeps coupling temporally consistent.
The use case is research-grade ADAM applications rather than production plasma-thruster work. The implementation cost is substantial (see below); the value is conditional on having a research goal that exercises it.
Mathematical content
Continuous extension of a Runge-Kutta method
A continuous (dense) extension of an explicit RK method of stages S and discrete order p is a one-parameter family of polynomials b*_s(θ), θ ∈ [0, 1], such that
q(t + θ·dt) ≈ q^n + dt · Σ_{s=1}^{S} b*_s(θ) · k_s
is an order-p* approximation to the true ODE flow at any θ ∈ [0, 1], where p* ≤ p. The k_s are the standard slope vectors (residual evaluations) at the integrator's stages. The polynomials satisfy:
b*_s(0) = 0 for all s (the extension at θ = 0 returns the previous step's state)
b*_s(1) = b_s (the extension at θ = 1 returns the discrete update — recovers the standard step)
- Order conditions of degree
p* (Hairer-Nørsett-Wanner Vol. I §II.6 enumerates these)
The construction of b*_s(θ) is integrator-specific:
| Method |
Discrete order p |
Dense order p* |
b*_s(θ) degree |
Reference |
| Dormand-Prince DOPRI5 (FSAL) |
5 |
4 |
5 (with 7 stages, FSAL) |
HNW Vol. I §II.6 |
| Classical RK4 |
4 |
3 |
Hermite cubic |
Shampine 1986 |
| Verner 6(5) |
6 |
5 |
extra stages required |
Verner 1979 |
| SSP-RK-3 (Gottlieb-Shu) |
3 |
2 (natural) |
quadratic |
Ketcheson-Macdonald-Gottlieb 2010 |
| SSP-RK-5 (Spiteri-Ruuth) |
4 (3 stages strong) |
3 (natural) |
cubic |
same |
The SSP caveat
For SSP-RK schemes (PRISM's production integrator family), dense output is not a free byproduct. SSP methods are designed for nonlinear stability under TVD-bounded data; dense output of order p* ≥ 2 typically loses the SSP property at intermediate θ. Ketcheson, Macdonald & Gottlieb (2010) prove that SSP-preserving dense output of order p* ≥ 2 requires additional stages beyond the integrator's nominal S. So shipping γ for SSP-RK-3 means shipping a SSP-RK-3-with-dense variant that has S+1 or S+2 evaluation slots, of which one or two are purely for dense output — different cost profile, different stage storage.
For non-SSP RK (Dormand-Prince family), dense output is well-established and used by every adaptive ODE solver (MATLAB ode45, SciPy RK45, SUNDIALS' ARKode). If γ is implemented, the cleanest start is to add a Dormand-Prince-with-dense-output integrator alongside the existing SSP family rather than retrofitting SSP-RK with extra stages.
Storage representation in ADAM
ADAM's q_rk(:,:,:,:,:,s) stores states (Shu-Osher form), not slopes (k_s). The dense formula needs slopes. The transformation from stored states to slopes is invertible:
k_s = (q_rk(s) - linear_combination_of_prior_states) / (dt · c_s)
This is exact but introduces additional FLOPs per dense evaluation. Alternatively, the RK object could store slopes directly under γ-mode (separate from the existing q_rk(s) storage), at the cost of doubling the per-realm RK memory footprint.
Proposed contract surface
! New TBP on realm_object — the continuous-extension query
subroutine evaluate_at_stage_time_forest(self, theta, q_buffer)
!< Evaluate self's state at intermediate time t + theta*dt using dense
!< output, where theta in [0, 1]. The forest invokes this from the peer-
!< side during seam fill when the receiver's stage time differs from the
!< sender's.
!<
!< Phase invariant: invoked between open_step_forest and close_step_forest
!< of the same outer step; the stage history needed for evaluation lives
!< in the integrator-private buffers (for RK realms: self%rk%q_rk(:,...,
!< 1..s_max_so_far)). Calling outside this window or before s_max_so_far
!< exceeds the requested theta is undefined behaviour.
class(realm_object), intent(in) :: self
real(R8P), intent(in) :: theta !< Fractional time, [0,1].
real(R8P), intent(inout) :: q_buffer(:,:,:,:,:) !< Output: q at t + theta*dt.
end subroutine
The forest's per-stage seam fill changes from a direct buffer copy to a peer-side query:
For stage k:
For each receiver realm A, for each peer realm B:
theta_A = c_A(k) ! receiver's stage time, fractional within dt
call B%evaluate_at_stage_time_forest(theta=theta_A, q_buffer=ghost_buf_peer)
Copy from ghost_buf_peer into A's seam ghosts via the existing
seam_local_map_ghost_cell row range.
This is a two-step roundtrip per (receiver, peer) instead of α's zero-step (no mid-step fill at all) or the legacy contract's one-step (direct buffer copy).
Cost components
C1. Per-integrator dense-output coefficients. Each integrator family needs hard-coded b*_s(θ) polynomial coefficients. For SSP-RK-3 and SSP-RK-5 the natural-order extensions (p* = p - 1) are available; for full-order extensions, additional stages must be derived. For Yoshida / Blanes-Moan / CFM (composition methods, already wired in PRISM), dense output is non-standard and may not be published — original derivation work would be needed. For Leapfrog (component-staggered), dense output is meaningless in the standard sense.
C2. Stage-history availability. Full-formula dense output requires ALL k_s for s = 1..S. If receiver A asks B for state at θ_A while B is at its own stage 3 of 5, B has only k_1, k_2, k_3. Two responses:
- Truncated dense output: use only the available stages, accepting a lower order during partial-history queries. Order drops as a function of (stages available) / S.
- Delay the call: A only queries B after B has completed all stages. But then both are at end-of-step → identical to α.
The honest implementation is truncated dense output with documented order degradation during partial-history windows. Receiver A computes which stages B has completed by the time the seam fill fires, and queries B's truncated formula. The full-order coupling materializes only when both A and B happen to be queriable at θ-values for which both their truncated formulas reach full order — non-trivial to arrange in general.
C3. FNL OpenACC port. Today's seam fill on FNL (prism_fnl_object%fill_seam_from_peer_forest) is a single device-to-device copy under !$acc parallel loop. Under γ, the device kernel must evaluate a polynomial summing S slope vectors at runtime per cell per variable. This is FLOPs-heavy but cache-friendly; expected runtime cost is O(S × nv × nface_cells) per stage per (receiver, peer), versus α's O(nv × nface_cells) once per step. For SSP-RK-5 this is 5× the per-stage cost, times K_max stages, versus α's once. Order-of-magnitude factor for the seam fill — though the seam fill is typically dwarfed by interior residuals, so total runtime impact may be modest.
C4. Reflux interaction. Under γ the per-stage flux register can be kept (third axis = K), with the convention that F_coarse(:,:,s) and F_fine_sum(:,:,s) store the flux at stage s of the coarse realm. The fine realm's flux accumulation at a non-coincident stage time t_fine = c_fine(s_fine)·dt must be PROJECTED onto coarse-stage bins via the dense-output formula (essentially an ∫ F(τ)dτ quadrature). This is a real mathematical decision (which quadrature? trapezoidal? Gauss?); the implementation cost is non-trivial.
C5. Validation against analytic. γ's claimed payoff is "full-order seam coupling". Demonstrating this requires regression cases with analytic solutions where convergence rate at the seam can be measured. Existing rmf-2realm is not such a case (it's a regression against numerical golden, not against analytic). A new test infrastructure for order-of-convergence studies would need to be built — itself a non-trivial effort.
Estimated timeline (informational, no commitment)
A serious γ implementation is 6–18 months of focused work:
- 2–3 months: derive / catalog dense-output coefficients for ADAM's existing RK families; SSP-RK extensions per Ketcheson et al.
- 1–2 months: design and implement the
evaluate_at_stage_time_forest TBP on realm_object; PRISM-CPU override.
- 2–3 months: FNL OpenACC port of the dense-output kernel; performance tuning.
- 2–3 months: reflux quadrature on flux register; PRISM-CPU + FNL.
- 2–3 months: validation infrastructure (analytic-solution convergence cases); demonstrating measured full-order at the seam.
- 1–2 months: integration with α (γ as an opt-in mode), documentation, regression rebaseline.
This is research-grade work. Funding model: PhD thesis chapter, JCP-aspirant publication, or HPC center deliverable. Not appropriate for the production plasma-thruster development arc.
Recommendation
Do not implement γ. Implement α (per issue #16). Reconsider γ if and when:
- A measurable benchmark requirement emerges (analytic solution convergence at the seam),
- A plasma-thruster simulation case is demonstrated where α's stale-by-one-step ghosts visibly contaminate the result, OR
- A research collaboration / publication opportunity makes the 6–18-month effort scoped and funded.
This RFC stays open as a long-lived reference. If any of (1)–(3) materializes, the design surface is here ready for staffing.
References
Foundational (Runge-Kutta dense output)
- Hairer, E., Nørsett, S. P., Wanner, G. (1993), Solving Ordinary Differential Equations I: Nonstiff Problems, 2nd ed., Springer. Vol. I §II.6 "Dense Output, Discontinuities, Derivatives", pp. 188–199. The canonical treatment.
- Shampine, L. F. (1986), "Some practical Runge-Kutta formulas", Math. Comp. 46 (173), 135–150. Hermite cubic dense output for RK4.
- Verner, J. H. (1979), "Families of imbedded Runge-Kutta methods", SIAM J. Numer. Anal. 16 (5), 857–875.
SSP-specific (the relevant body for ADAM's production integrators)
- Ketcheson, D. I., Macdonald, C. B., Gottlieb, S. (2010), "Optimal implicit strong stability preserving Runge–Kutta methods", Appl. Numer. Math. 59 (2), 373–392. SSP-preserving continuous extensions.
- Higueras, I., Roldán, T. (2018), "New third order low-storage SSP explicit Runge-Kutta methods", J. Sci. Comput. 79 (3), 1882–1906. Continuous-output variants for low-storage SSP.
- Gottlieb, S., Ketcheson, D. I., Shu, C.-W. (2011), Strong Stability Preserving Runge-Kutta and Multistep Time Discretizations, World Scientific. Chapter 5 discusses temporal accuracy at boundaries.
- Spiteri, R. J., Ruuth, S. J. (2002), "A new class of optimal high-order strong-stability-preserving time discretization methods", SIAM J. Numer. Anal. 40 (2), 469–491. The source for the SSP-RK-5 coefficients ADAM uses.
Adaptive-mesh / multi-region temporal coupling (where dense output meets the seam problem)
- Berger, M. J., Oliger, J. (1984), "Adaptive mesh refinement for hyperbolic partial differential equations", J. Comput. Phys. 53 (3), 484–512. The original AMR paper. Linear-in-time interpolation at coarse-fine boundaries (dense output of order 1, equivalent to AMReX's
FillCoarsePatch). This is α's literature anchor; γ would extend to higher dense-output order.
- Almgren, A. S., et al. (2010), "CASTRO: A new compressible astrophysical solver", Astrophys. J. 715 (2), 1221–1238. Linear-in-time interpolation between coarse-step values for fine-step ghost fills in production AMR astrophysics.
- Dumbser, M., Zanotti, O., Loubère, R., Diot, S. (2014), "A posteriori subcell limiting of the discontinuous Galerkin finite element method for hyperbolic conservation laws", J. Comput. Phys. 278, 47–75. Higher-order temporal reconstruction at interfaces in ADER schemes.
- van Leer, B., Nomura, S. (2005), "Discontinuous Galerkin for diffusion", AIAA Paper 2005-5108, 17th AIAA CFD Conference. Temporal reconstruction at interfaces — the most direct precedent for the γ seam-coupling formulation.
AMReX correspondence (for ADAM contributors familiar with that ecosystem)
- AMReX source (
Src/AmrCore/AMReX_FillPatchUtil.cpp, Src/Amr/AMReX_Amr.cpp): FillCoarsePatch uses linear-in-time interp of coarse t^n and t^{n+1} for subcycled fine ghost fill. γ for ADAM is the order-of-accuracy upgrade from AMReX's linear interpolation to full RK dense output. AMReX itself does not use full RK dense output at interfaces.
Summary
This RFC scopes γ — continuous (dense) output for inter-realm seam coupling under same-
dt, asymmetric-K. Under γ, when receiver realm A reads peer realm B mid-step at stage timet_A = c_A(s) · dt, the seam fill does NOT copy B's stored stage buffer verbatim (which is at peer timet_B = c_B(s) · dt ≠ t_A). Instead, B's continuous extension polynomial evaluates B's state atθ = t_A / dt, and that interpolated value is what crosses the seam. The result: full-order seam coupling in time, matching the integrator's discrete order on both sides.γ is the strict-correctness alternative to α (the AMReX-aligned, end-of-step barrier with stale-by-one-step ghosts; see #16). γ is not proposed for implementation under the plasma thruster development arc — α suffices for that use case. γ exists in this RFC for future research, benchmark convergence runs, and as a reference design for ADAM contributors who eventually need full-order seam coupling.
Motivation
α drops the per-stage seam exchange and lives with
O(dt)seam-coupling errors in exchange for AMReX-aligned simplicity. For plasma-thruster simulations where seams separate active and quiet regions, this is acceptable: the seam error is bounded, well-understood (Berger-Oliger 1984), and dominated by other discretization errors near the seam.For other classes of problem, α is insufficient:
The use case is research-grade ADAM applications rather than production plasma-thruster work. The implementation cost is substantial (see below); the value is conditional on having a research goal that exercises it.
Mathematical content
Continuous extension of a Runge-Kutta method
A continuous (dense) extension of an explicit RK method of stages
Sand discrete orderpis a one-parameter family of polynomialsb*_s(θ),θ ∈ [0, 1], such thatis an order-
p*approximation to the true ODE flow at anyθ ∈ [0, 1], wherep* ≤ p. Thek_sare the standard slope vectors (residual evaluations) at the integrator's stages. The polynomials satisfy:b*_s(0) = 0for alls(the extension atθ = 0returns the previous step's state)b*_s(1) = b_s(the extension atθ = 1returns the discrete update — recovers the standard step)p*(Hairer-Nørsett-Wanner Vol. I §II.6 enumerates these)The construction of
b*_s(θ)is integrator-specific:pp*b*_s(θ)degreeThe SSP caveat
For SSP-RK schemes (PRISM's production integrator family), dense output is not a free byproduct. SSP methods are designed for nonlinear stability under TVD-bounded data; dense output of order
p* ≥ 2typically loses the SSP property at intermediateθ. Ketcheson, Macdonald & Gottlieb (2010) prove that SSP-preserving dense output of orderp* ≥ 2requires additional stages beyond the integrator's nominalS. So shipping γ for SSP-RK-3 means shipping a SSP-RK-3-with-dense variant that hasS+1orS+2evaluation slots, of which one or two are purely for dense output — different cost profile, different stage storage.For non-SSP RK (Dormand-Prince family), dense output is well-established and used by every adaptive ODE solver (MATLAB
ode45, SciPyRK45, SUNDIALS' ARKode). If γ is implemented, the cleanest start is to add a Dormand-Prince-with-dense-output integrator alongside the existing SSP family rather than retrofitting SSP-RK with extra stages.Storage representation in ADAM
ADAM's
q_rk(:,:,:,:,:,s)stores states (Shu-Osher form), not slopes (k_s). The dense formula needs slopes. The transformation from stored states to slopes is invertible:This is exact but introduces additional FLOPs per dense evaluation. Alternatively, the RK object could store slopes directly under γ-mode (separate from the existing
q_rk(s)storage), at the cost of doubling the per-realm RK memory footprint.Proposed contract surface
The forest's per-stage seam fill changes from a direct buffer copy to a peer-side query:
This is a two-step roundtrip per (receiver, peer) instead of α's zero-step (no mid-step fill at all) or the legacy contract's one-step (direct buffer copy).
Cost components
C1. Per-integrator dense-output coefficients. Each integrator family needs hard-coded
b*_s(θ)polynomial coefficients. For SSP-RK-3 and SSP-RK-5 the natural-order extensions (p* = p - 1) are available; for full-order extensions, additional stages must be derived. For Yoshida / Blanes-Moan / CFM (composition methods, already wired in PRISM), dense output is non-standard and may not be published — original derivation work would be needed. For Leapfrog (component-staggered), dense output is meaningless in the standard sense.C2. Stage-history availability. Full-formula dense output requires ALL
k_sfors = 1..S. If receiver A asks B for state atθ_Awhile B is at its own stage 3 of 5, B has onlyk_1, k_2, k_3. Two responses:The honest implementation is truncated dense output with documented order degradation during partial-history windows. Receiver A computes which stages B has completed by the time the seam fill fires, and queries B's truncated formula. The full-order coupling materializes only when both A and B happen to be queriable at θ-values for which both their truncated formulas reach full order — non-trivial to arrange in general.
C3. FNL OpenACC port. Today's seam fill on FNL (
prism_fnl_object%fill_seam_from_peer_forest) is a single device-to-device copy under!$acc parallel loop. Under γ, the device kernel must evaluate a polynomial summingSslope vectors at runtime per cell per variable. This is FLOPs-heavy but cache-friendly; expected runtime cost isO(S × nv × nface_cells)per stage per (receiver, peer), versus α'sO(nv × nface_cells)once per step. For SSP-RK-5 this is5×the per-stage cost, timesK_maxstages, versus α's once. Order-of-magnitude factor for the seam fill — though the seam fill is typically dwarfed by interior residuals, so total runtime impact may be modest.C4. Reflux interaction. Under γ the per-stage flux register can be kept (third axis = K), with the convention that
F_coarse(:,:,s)andF_fine_sum(:,:,s)store the flux at stagesof the coarse realm. The fine realm's flux accumulation at a non-coincident stage timet_fine = c_fine(s_fine)·dtmust be PROJECTED onto coarse-stage bins via the dense-output formula (essentially an∫ F(τ)dτquadrature). This is a real mathematical decision (which quadrature? trapezoidal? Gauss?); the implementation cost is non-trivial.C5. Validation against analytic. γ's claimed payoff is "full-order seam coupling". Demonstrating this requires regression cases with analytic solutions where convergence rate at the seam can be measured. Existing rmf-2realm is not such a case (it's a regression against numerical golden, not against analytic). A new test infrastructure for order-of-convergence studies would need to be built — itself a non-trivial effort.
Estimated timeline (informational, no commitment)
A serious γ implementation is 6–18 months of focused work:
evaluate_at_stage_time_forestTBP onrealm_object; PRISM-CPU override.This is research-grade work. Funding model: PhD thesis chapter, JCP-aspirant publication, or HPC center deliverable. Not appropriate for the production plasma-thruster development arc.
Recommendation
Do not implement γ. Implement α (per issue #16). Reconsider γ if and when:
This RFC stays open as a long-lived reference. If any of (1)–(3) materializes, the design surface is here ready for staffing.
References
Foundational (Runge-Kutta dense output)
SSP-specific (the relevant body for ADAM's production integrators)
Adaptive-mesh / multi-region temporal coupling (where dense output meets the seam problem)
FillCoarsePatch). This is α's literature anchor; γ would extend to higher dense-output order.AMReX correspondence (for ADAM contributors familiar with that ecosystem)
Src/AmrCore/AMReX_FillPatchUtil.cpp,Src/Amr/AMReX_Amr.cpp):FillCoarsePatchuses linear-in-time interp of coarset^nandt^{n+1}for subcycled fine ghost fill. γ for ADAM is the order-of-accuracy upgrade from AMReX's linear interpolation to full RK dense output. AMReX itself does not use full RK dense output at interfaces.