PRISM segfaults when constrained_transport != no AND divergence_correction = poisson (heap corruption in q_name)

# PRISM segfaults when `constrained_transport != no` AND `divergence_correction = poisson`

## Summary

Running PRISM-CPU with the two settings

```ini
[numerics]
constrained_transport    = DB        ; or D, B, BD
divergence_correction    = poisson
```

both active produces a SIGSEGV during initialization (release build, `prism-cpu-gnu`, GCC 15.2.0 + OpenMPI 5.0.10, 2 MPI ranks). Either toggle alone runs to completion. The release-mode crash surfaces inside `__GI___libc_realloc` (heap corruption), so the Fortran frames are absent without debug info. A debug build (`prism-cpu-gnu-debug`) catches **a different and separate latent bug** — see "Secondary finding" below.

This bug blocks adding a regression test for the FNL `impose_ct_correction` kernel refactor done in issue #10 Step 4 — the only kernel that the CT-correction code path exercises is currently unreached by any committed input.

## Acceptance criteria

- [ ] A PRISM-CPU run with `constrained_transport = DB` AND `divergence_correction = poisson` (otherwise identical to `src/tests/prism/regression/rmf/input.ini`) completes 10 iterations without crash.
- [ ] The fix does not regress `rmf` (CPU + FNL digests + residuals).
- [ ] A new regression case under `src/tests/prism/regression/rmf-ct/` exercises the code path; golden captured per the standard CI-authority procedure.
- [ ] The pre-existing WENO debug-mode crash (secondary finding below) is filed separately or fixed in the same PR with a note.

## Reproduction

From repo root, with `module load openmpi/5.0.10-gnu15.2.0` (or any GNU+OpenMPI toolchain matching the regression harness):

```bash
fobis build --mode prism-cpu-gnu

mkdir -p /tmp/ct-repro && cd /tmp/ct-repro
cp /path/to/adam/src/tests/prism/regression/rmf/input.ini .
sed -i 's/constrained_transport    = no/constrained_transport    = DB/' input.ini
sed -i 's/divergence_correction    = no/divergence_correction    = poisson/' input.ini
sed -i 's/output_basename        = rmf_regression/output_basename        = ct_repro/' input.ini

mpirun -np 2 /path/to/adam/exe/adam_prism_cpu
```

Observed:

```
[mpi-00000]  nodes number:   +1
[mpi-00001]  nodes number:   +1
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
#0  0x...      at sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
#1  0x...      in musable
#2  0x...      in __GI___libc_realloc
#3  ...        (no Fortran symbols in release build)
prterun ... rank 1 ... exited on signal 11 (Segmentation fault)
```

The single-toggle variants run cleanly:

| `constrained_transport` | `divergence_correction` | Result |
|-------------------------|-------------------------|--------|
| `no`                    | `no` (= base `rmf`)     | OK — 10 iters |
| `DB`                    | `no`                    | OK — 10 iters |
| `no`                    | `poisson`               | OK — 10 iters |
| `DB`                    | `poisson`               | **SIGSEGV in init** |

## Root cause (primary hypothesis, needs confirmation)

The most likely culprit is an array-bound overrun in `prism_common_object%initialize`'s contained `io_initialize`, at `src/app/prism/common/adam_prism_common_object.F90:232-280`.

The variable-name arrays `self%q_name`, `self%dq_name`, `self%div_name`, `self%curl_name` are allocated with size `physics%nv` at `adam_prism_common_object.F90:125-128` (inside `allocate_common`):

```fortran
allocate(self%q_name(   self%nv))
allocate(self%dq_name(  self%nv))
allocate(self%div_name( self%nv))
allocate(self%curl_name(self%nv))
```

`physics%nv` is computed in `prism_physics_object%initialize` at `adam_prism_physics_object.F90:296-389`. With **either** `constrained_transport = DB` alone OR `divergence_correction = poisson` alone, `nv_cl = 0` and `nv = 9` (verified by stdout: `number of variables in q (nv): +9` in both single-toggle runs).

Then `io_initialize` at line 239-280 builds the names array using `numerics%constrained_transport_D/B` directly (NOT `physics%nv_cl`):

```fortran
associate(add_phi=>numerics%constrained_transport_D, &
          add_psi=>numerics%constrained_transport_B)
             q_name = ['Dx ','Dy ','Dz ','Bx ','By ','Bz ']
if (add_phi) q_name = [q_name, 'phi']                ! +1 if CT_D
if (add_psi) q_name = [q_name, 'psi']                ! +1 if CT_B
             q_name = [q_name, 'Jx ','Jy ','Jz ']
if (add_rho) q_name = [q_name, 'rho']
do v=1, size(q_name,dim=1)
   self%q_name(v) = trim(q_name(v))                  ! writes v=10, v=11 → OOB
enddo
```

With `constrained_transport = DB`, `q_name` has 11 entries; the loop writes into `self%q_name(10)` and `self%q_name(11)`, which are **outside the allocated 9-entry array**. Same pattern repeats for `dq_name`, `div_name`, `curl_name`.

This is silent heap corruption. Why it appears benign with `CT=DB` alone and lethal in combination with `divergence_correction=poisson` is timing-dependent — the corrupted memory happens to be touched (allocated/realloc'd) later only in the combined path. The bug is the same in both; one variant just gets lucky.

The structural fix is to **derive `add_phi`/`add_psi` from `physics%nv_cl`** so the names array size matches the allocated state-vector size — or symmetrically, **make `physics%nv` track `constrained_transport_*`** so `nv_cl` adds the right number of slots when CT is active.

Inspecting `adam_prism_physics_object.F90:296-389` shows that `nv_cl` IS bumped under `case(DIV_CORR_VAR_HYPER)` (hyperbolic divergence correction) — see lines 315-369 where `nv_cl=1` for one of CT_D/CT_B set and `nv_cl=2` for both — but **NOT** under `case(DIV_CORR_VAR_POISS)` (line 297-314, `nv_cl=0` unconditionally) and NOT under `case default` (line 371-389, also `nv_cl=0`). So CT only contributes to `nv` when hyperbolic correction is selected. With Poisson correction or no correction, `q` has no slots for `phi/psi` even though CT writes them into the names array.

This is a longstanding latent inconsistency: it suggests CT correction with Poisson divergence cleaning was never actually validated end-to-end, OR `phi/psi` are not supposed to be persistent state variables under Poisson and the names array is the only place they leak.

## Fix plan

### Step 1 — Confirm the root cause

- [ ] Reproduce locally per "Reproduction" above and add a Fortran-mode debug trap: build with `prism-cpu-gnu-debug` after **also** addressing the secondary WENO debug bug below (or set `-fcheck=bounds` selectively on `adam_prism_common_object`). Inspect the backtrace at the OOB write in `io_initialize`.
- [ ] Verify the hypothesis by adding `error stop` if `size(q_name) > self%nv` immediately before the loop — this should fire on `CT=DB` regardless of divergence-correction choice.
- [ ] If the hypothesis confirms, the bug class is "PRISM has two ground-truth sources for `nv` (physics vs numerics) that disagree under Poisson + CT".

### Step 2 — Decide the fix shape

Three options, in increasing scope:

1. **Minimal: gate `add_phi`/`add_psi` on `physics%nv_cl`** rather than `numerics%constrained_transport_*`. Two-line change in `io_initialize`. Documents the relationship "phi/psi names exist only when phi/psi slots exist in q". Does NOT add `phi/psi` to q under Poisson — i.e. CT under Poisson silently runs as CT under "no correction" for the purposes of state storage. May break correctness if `impose_ct_correction` reads/writes from `q(7,...)` / `q(8,...)` slots that under Poisson hold `Jx/Jy/Jz`. **Audit `impose_ct_correction` against `var_Jx/Jy/Jz` for slot collisions before choosing this.**

2. **Structural: extend `nv_cl` bookkeeping to Poisson**. In `adam_prism_physics_object.F90:297-314` (`case(DIV_CORR_VAR_POISS)`), apply the same `nv_cl = 0/1/2` logic that `DIV_CORR_VAR_HYPER` uses at lines 315-369, with `var_Jx/Jy/Jz` shifted accordingly. This makes `q` carry `phi/psi` slots under Poisson + CT, matching the names array shape. Requires verifying that downstream consumers (Poisson solver, RK update, IO) handle the wider state vector. Higher blast radius but architecturally cleaner.

3. **Comprehensive: refactor `nv` derivation to a single source**. Have `numerics` own the `add_phi/add_psi` flags AND `physics%nv_cl` derive from them with no `select case` on correction type. Eliminates the divergence at the root. Largest blast radius.

Recommended starting point: **option 1** with a `select case` guard or `error stop` if `impose_ct_correction` ever runs under Poisson, to make the silent gap explicit. Escalate to option 2 only when a use case for CT+Poisson is committed.

### Step 3 — Test coverage

- [ ] Add `src/tests/prism/regression/rmf-ct/` with the smaller of the working CT combinations (CT=DB + divergence_correction=hyperbolic, if that one runs — needs verification, not yet tested in this investigation).
- [ ] Capture goldens per the standard CI-authority procedure documented in `src/tests/prism/regression/README.md`.
- [ ] Confirm the new case exercises `impose_ct_correction` by adding a runtime print or counter in the kernel for the first run.
- [ ] Wire the new case into the existing `run.sh` discovery loop (no change needed — it auto-discovers any `<case>/input.ini`).

## Out of scope

- The hyperbolic-correction path (`divergence_correction = hyperbolic`) is not investigated here. It probably works because the existing `nv_cl` bookkeeping covers it.
- This issue does NOT address whether CT correction is numerically correct, only that the setup path doesn't crash.

## Secondary finding (file separately if not bundled)

A **pre-existing** bug surfaces under any debug-mode build (`prism-cpu-gnu-debug`, gfortran 15.2.0): even the base `rmf` regression input segfaults at `src/lib/common/adam_weno_object.F90:140` inside the `pure function description` when called from `weno%initialize` at line 272. The backtrace:

```
#1  __adam_weno_object_MOD_description     at src/lib/common/adam_weno_object.F90:140
#2  __adam_weno_object_MOD_initialize      at src/lib/common/adam_weno_object.F90:272
#3  __adam_realm_object_MOD_initialize     at src/lib/common/adam_realm_object.F90:447
```

Line 140 builds a string via `desc//mpih%myrankstr//'...'`. The most likely root cause is that `description` is declared `pure` but reads the module-scope variable `mpih%myrankstr` (`adam_mpih_global`'s singleton), which is illegal under strict Fortran 2008 purity rules and may trigger gfortran's bounds/sanity checks to mishandle the access. **This blocks the use of debug-mode builds for ANY PRISM investigation**, including the primary bug above — which is why this issue's reproduction sticks to release mode despite the resulting opacity in the backtrace.

Fix: either drop the `pure` annotation on `description` (the function clearly has side-effecting dependencies on module state), or refactor `description` to take `myrankstr` as an explicit dummy argument. Two-line change either way.

## Estimated effort

- Confirmation + minimal fix (option 1): **half a session**.
- Structural fix (option 2): **1-2 sessions** plus golden recapture for any existing tests that change shape.
- Regression case + golden capture: **half a session** once option 1/2 is chosen.
- Secondary WENO debug-mode fix: **30 minutes**.

## References

- Forest-of-trees migration plan: #10 (this bug blocks the "CT-correction regression case" follow-up noted in #10's Step 4 status).
- Last known-working CT use: none committed. `grep -rinE "constrained_transport\s*=\s*(D|B|DB|BD)" src/tests/` returns zero hits across the entire `src/tests/` tree.


`constrained_transport`	`divergence_correction`	Result
`no`	`no` (= base `rmf`)	OK — 10 iters
`DB`	`no`	OK — 10 iters
`no`	`poisson`	OK — 10 iters
`DB`	`poisson`	SIGSEGV in init

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PRISM segfaults when constrained_transport != no AND divergence_correction = poisson (heap corruption in q_name) #11

PRISM segfaults when `constrained_transport != no` AND `divergence_correction = poisson`

Summary

Acceptance criteria

Reproduction

Root cause (primary hypothesis, needs confirmation)

Fix plan

Step 1 — Confirm the root cause

Step 2 — Decide the fix shape

Step 3 — Test coverage

Out of scope

Secondary finding (file separately if not bundled)

Estimated effort

References

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

PRISM segfaults when constrained_transport != no AND divergence_correction = poisson (heap corruption in q_name) #11

Description

PRISM segfaults when constrained_transport != no AND divergence_correction = poisson

Summary

Acceptance criteria

Reproduction

Root cause (primary hypothesis, needs confirmation)

Fix plan

Step 1 — Confirm the root cause

Step 2 — Decide the fix shape

Step 3 — Test coverage

Out of scope

Secondary finding (file separately if not bundled)

Estimated effort

References

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions

PRISM segfaults when `constrained_transport != no` AND `divergence_correction = poisson`