Skip to content

Forest γ (research RFC): continuous (dense) output for full-order seam coupling under asymmetric K #17

@szaghi

Description

@szaghi

Type: pure research RFC. No implementation timeline. Filed as a long-lived discussion / reference document. Related: #16 (forest α: end-of-step barrier — the production-path partner of this RFC).

Summary

This RFC scopes γ — continuous (dense) output for inter-realm seam coupling under same-dt, asymmetric-K. Under γ, when receiver realm A reads peer realm B mid-step at stage time t_A = c_A(s) · dt, the seam fill does NOT copy B's stored stage buffer verbatim (which is at peer time t_B = c_B(s) · dt ≠ t_A). Instead, B's continuous extension polynomial evaluates B's state at θ = t_A / dt, and that interpolated value is what crosses the seam. The result: full-order seam coupling in time, matching the integrator's discrete order on both sides.

γ is the strict-correctness alternative to α (the AMReX-aligned, end-of-step barrier with stale-by-one-step ghosts; see #16). γ is not proposed for implementation under the plasma thruster development arc — α suffices for that use case. γ exists in this RFC for future research, benchmark convergence runs, and as a reference design for ADAM contributors who eventually need full-order seam coupling.

Motivation

α drops the per-stage seam exchange and lives with O(dt) seam-coupling errors in exchange for AMReX-aligned simplicity. For plasma-thruster simulations where seams separate active and quiet regions, this is acceptable: the seam error is bounded, well-understood (Berger-Oliger 1984), and dominated by other discretization errors near the seam.

For other classes of problem, α is insufficient:

  • Order-of-accuracy benchmarks: when validating ADAM against analytic solutions, the seam region must not contaminate the convergence rate. Under α, a coupling-order drop at the seam is measurable in the L² norm and limits the demonstrable convergence rate to first order in time regardless of the integrator's discrete order.
  • Strong seam-active coupling: simulations where the dynamics legitimately straddle the seam (shocks crossing the seam, transient features moving through the boundary) need the seam to be invisible to the physics. Stale-by-one-step ghosts inject low-order errors precisely where they hurt most.
  • Asymmetric-K with very different K: when K_A=3 and K_B=11 (a future high-order spectral-deferred-correction scenario), the per-stage skew between A and B is large. α's "treat-mid-step-as-stale" approach is the same regardless of K-gap, but the worst-case error grows with the gap. γ's interpolation is order-preserving regardless.
  • Implicit-coupling regimes: where the seam is a stiff coupling and the realms need each other's current state (not last-step state) to remain stable. α may destabilize; γ keeps coupling temporally consistent.

The use case is research-grade ADAM applications rather than production plasma-thruster work. The implementation cost is substantial (see below); the value is conditional on having a research goal that exercises it.

Mathematical content

Continuous extension of a Runge-Kutta method

A continuous (dense) extension of an explicit RK method of stages S and discrete order p is a one-parameter family of polynomials b*_s(θ), θ ∈ [0, 1], such that

q(t + θ·dt) ≈ q^n + dt · Σ_{s=1}^{S} b*_s(θ) · k_s

is an order-p* approximation to the true ODE flow at any θ ∈ [0, 1], where p* ≤ p. The k_s are the standard slope vectors (residual evaluations) at the integrator's stages. The polynomials satisfy:

  • b*_s(0) = 0 for all s (the extension at θ = 0 returns the previous step's state)
  • b*_s(1) = b_s (the extension at θ = 1 returns the discrete update — recovers the standard step)
  • Order conditions of degree p* (Hairer-Nørsett-Wanner Vol. I §II.6 enumerates these)

The construction of b*_s(θ) is integrator-specific:

Method Discrete order p Dense order p* b*_s(θ) degree Reference
Dormand-Prince DOPRI5 (FSAL) 5 4 5 (with 7 stages, FSAL) HNW Vol. I §II.6
Classical RK4 4 3 Hermite cubic Shampine 1986
Verner 6(5) 6 5 extra stages required Verner 1979
SSP-RK-3 (Gottlieb-Shu) 3 2 (natural) quadratic Ketcheson-Macdonald-Gottlieb 2010
SSP-RK-5 (Spiteri-Ruuth) 4 (3 stages strong) 3 (natural) cubic same

The SSP caveat

For SSP-RK schemes (PRISM's production integrator family), dense output is not a free byproduct. SSP methods are designed for nonlinear stability under TVD-bounded data; dense output of order p* ≥ 2 typically loses the SSP property at intermediate θ. Ketcheson, Macdonald & Gottlieb (2010) prove that SSP-preserving dense output of order p* ≥ 2 requires additional stages beyond the integrator's nominal S. So shipping γ for SSP-RK-3 means shipping a SSP-RK-3-with-dense variant that has S+1 or S+2 evaluation slots, of which one or two are purely for dense output — different cost profile, different stage storage.

For non-SSP RK (Dormand-Prince family), dense output is well-established and used by every adaptive ODE solver (MATLAB ode45, SciPy RK45, SUNDIALS' ARKode). If γ is implemented, the cleanest start is to add a Dormand-Prince-with-dense-output integrator alongside the existing SSP family rather than retrofitting SSP-RK with extra stages.

Storage representation in ADAM

ADAM's q_rk(:,:,:,:,:,s) stores states (Shu-Osher form), not slopes (k_s). The dense formula needs slopes. The transformation from stored states to slopes is invertible:

k_s = (q_rk(s) - linear_combination_of_prior_states) / (dt · c_s)

This is exact but introduces additional FLOPs per dense evaluation. Alternatively, the RK object could store slopes directly under γ-mode (separate from the existing q_rk(s) storage), at the cost of doubling the per-realm RK memory footprint.

Proposed contract surface

! New TBP on realm_object — the continuous-extension query
subroutine evaluate_at_stage_time_forest(self, theta, q_buffer)
   !< Evaluate self's state at intermediate time t + theta*dt using dense
   !< output, where theta in [0, 1]. The forest invokes this from the peer-
   !< side during seam fill when the receiver's stage time differs from the
   !< sender's.
   !<
   !< Phase invariant: invoked between open_step_forest and close_step_forest
   !< of the same outer step; the stage history needed for evaluation lives
   !< in the integrator-private buffers (for RK realms: self%rk%q_rk(:,...,
   !< 1..s_max_so_far)). Calling outside this window or before s_max_so_far
   !< exceeds the requested theta is undefined behaviour.
   class(realm_object), intent(in)    :: self
   real(R8P),           intent(in)    :: theta            !< Fractional time, [0,1].
   real(R8P),           intent(inout) :: q_buffer(:,:,:,:,:)  !< Output: q at t + theta*dt.
end subroutine

The forest's per-stage seam fill changes from a direct buffer copy to a peer-side query:

For stage k:
   For each receiver realm A, for each peer realm B:
      theta_A = c_A(k)   ! receiver's stage time, fractional within dt
      call B%evaluate_at_stage_time_forest(theta=theta_A, q_buffer=ghost_buf_peer)
      Copy from ghost_buf_peer into A's seam ghosts via the existing
        seam_local_map_ghost_cell row range.

This is a two-step roundtrip per (receiver, peer) instead of α's zero-step (no mid-step fill at all) or the legacy contract's one-step (direct buffer copy).

Cost components

C1. Per-integrator dense-output coefficients. Each integrator family needs hard-coded b*_s(θ) polynomial coefficients. For SSP-RK-3 and SSP-RK-5 the natural-order extensions (p* = p - 1) are available; for full-order extensions, additional stages must be derived. For Yoshida / Blanes-Moan / CFM (composition methods, already wired in PRISM), dense output is non-standard and may not be published — original derivation work would be needed. For Leapfrog (component-staggered), dense output is meaningless in the standard sense.

C2. Stage-history availability. Full-formula dense output requires ALL k_s for s = 1..S. If receiver A asks B for state at θ_A while B is at its own stage 3 of 5, B has only k_1, k_2, k_3. Two responses:

  • Truncated dense output: use only the available stages, accepting a lower order during partial-history queries. Order drops as a function of (stages available) / S.
  • Delay the call: A only queries B after B has completed all stages. But then both are at end-of-step → identical to α.

The honest implementation is truncated dense output with documented order degradation during partial-history windows. Receiver A computes which stages B has completed by the time the seam fill fires, and queries B's truncated formula. The full-order coupling materializes only when both A and B happen to be queriable at θ-values for which both their truncated formulas reach full order — non-trivial to arrange in general.

C3. FNL OpenACC port. Today's seam fill on FNL (prism_fnl_object%fill_seam_from_peer_forest) is a single device-to-device copy under !$acc parallel loop. Under γ, the device kernel must evaluate a polynomial summing S slope vectors at runtime per cell per variable. This is FLOPs-heavy but cache-friendly; expected runtime cost is O(S × nv × nface_cells) per stage per (receiver, peer), versus α's O(nv × nface_cells) once per step. For SSP-RK-5 this is the per-stage cost, times K_max stages, versus α's once. Order-of-magnitude factor for the seam fill — though the seam fill is typically dwarfed by interior residuals, so total runtime impact may be modest.

C4. Reflux interaction. Under γ the per-stage flux register can be kept (third axis = K), with the convention that F_coarse(:,:,s) and F_fine_sum(:,:,s) store the flux at stage s of the coarse realm. The fine realm's flux accumulation at a non-coincident stage time t_fine = c_fine(s_fine)·dt must be PROJECTED onto coarse-stage bins via the dense-output formula (essentially an ∫ F(τ)dτ quadrature). This is a real mathematical decision (which quadrature? trapezoidal? Gauss?); the implementation cost is non-trivial.

C5. Validation against analytic. γ's claimed payoff is "full-order seam coupling". Demonstrating this requires regression cases with analytic solutions where convergence rate at the seam can be measured. Existing rmf-2realm is not such a case (it's a regression against numerical golden, not against analytic). A new test infrastructure for order-of-convergence studies would need to be built — itself a non-trivial effort.

Estimated timeline (informational, no commitment)

A serious γ implementation is 6–18 months of focused work:

  • 2–3 months: derive / catalog dense-output coefficients for ADAM's existing RK families; SSP-RK extensions per Ketcheson et al.
  • 1–2 months: design and implement the evaluate_at_stage_time_forest TBP on realm_object; PRISM-CPU override.
  • 2–3 months: FNL OpenACC port of the dense-output kernel; performance tuning.
  • 2–3 months: reflux quadrature on flux register; PRISM-CPU + FNL.
  • 2–3 months: validation infrastructure (analytic-solution convergence cases); demonstrating measured full-order at the seam.
  • 1–2 months: integration with α (γ as an opt-in mode), documentation, regression rebaseline.

This is research-grade work. Funding model: PhD thesis chapter, JCP-aspirant publication, or HPC center deliverable. Not appropriate for the production plasma-thruster development arc.

Recommendation

Do not implement γ. Implement α (per issue #16). Reconsider γ if and when:

  1. A measurable benchmark requirement emerges (analytic solution convergence at the seam),
  2. A plasma-thruster simulation case is demonstrated where α's stale-by-one-step ghosts visibly contaminate the result, OR
  3. A research collaboration / publication opportunity makes the 6–18-month effort scoped and funded.

This RFC stays open as a long-lived reference. If any of (1)–(3) materializes, the design surface is here ready for staffing.

References

Foundational (Runge-Kutta dense output)

  • Hairer, E., Nørsett, S. P., Wanner, G. (1993), Solving Ordinary Differential Equations I: Nonstiff Problems, 2nd ed., Springer. Vol. I §II.6 "Dense Output, Discontinuities, Derivatives", pp. 188–199. The canonical treatment.
  • Shampine, L. F. (1986), "Some practical Runge-Kutta formulas", Math. Comp. 46 (173), 135–150. Hermite cubic dense output for RK4.
  • Verner, J. H. (1979), "Families of imbedded Runge-Kutta methods", SIAM J. Numer. Anal. 16 (5), 857–875.

SSP-specific (the relevant body for ADAM's production integrators)

  • Ketcheson, D. I., Macdonald, C. B., Gottlieb, S. (2010), "Optimal implicit strong stability preserving Runge–Kutta methods", Appl. Numer. Math. 59 (2), 373–392. SSP-preserving continuous extensions.
  • Higueras, I., Roldán, T. (2018), "New third order low-storage SSP explicit Runge-Kutta methods", J. Sci. Comput. 79 (3), 1882–1906. Continuous-output variants for low-storage SSP.
  • Gottlieb, S., Ketcheson, D. I., Shu, C.-W. (2011), Strong Stability Preserving Runge-Kutta and Multistep Time Discretizations, World Scientific. Chapter 5 discusses temporal accuracy at boundaries.
  • Spiteri, R. J., Ruuth, S. J. (2002), "A new class of optimal high-order strong-stability-preserving time discretization methods", SIAM J. Numer. Anal. 40 (2), 469–491. The source for the SSP-RK-5 coefficients ADAM uses.

Adaptive-mesh / multi-region temporal coupling (where dense output meets the seam problem)

  • Berger, M. J., Oliger, J. (1984), "Adaptive mesh refinement for hyperbolic partial differential equations", J. Comput. Phys. 53 (3), 484–512. The original AMR paper. Linear-in-time interpolation at coarse-fine boundaries (dense output of order 1, equivalent to AMReX's FillCoarsePatch). This is α's literature anchor; γ would extend to higher dense-output order.
  • Almgren, A. S., et al. (2010), "CASTRO: A new compressible astrophysical solver", Astrophys. J. 715 (2), 1221–1238. Linear-in-time interpolation between coarse-step values for fine-step ghost fills in production AMR astrophysics.
  • Dumbser, M., Zanotti, O., Loubère, R., Diot, S. (2014), "A posteriori subcell limiting of the discontinuous Galerkin finite element method for hyperbolic conservation laws", J. Comput. Phys. 278, 47–75. Higher-order temporal reconstruction at interfaces in ADER schemes.
  • van Leer, B., Nomura, S. (2005), "Discontinuous Galerkin for diffusion", AIAA Paper 2005-5108, 17th AIAA CFD Conference. Temporal reconstruction at interfaces — the most direct precedent for the γ seam-coupling formulation.

AMReX correspondence (for ADAM contributors familiar with that ecosystem)

  • AMReX source (Src/AmrCore/AMReX_FillPatchUtil.cpp, Src/Amr/AMReX_Amr.cpp): FillCoarsePatch uses linear-in-time interp of coarse t^n and t^{n+1} for subcycled fine ghost fill. γ for ADAM is the order-of-accuracy upgrade from AMReX's linear interpolation to full RK dense output. AMReX itself does not use full RK dense output at interfaces.

Metadata

Metadata

Assignees

Labels

enhancementNew feature or requestquestionFurther information is requested

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions