Skip to content

NonlinearSolveBaseEnzymeExt: preserve prob.p / prob.u0 aliasing in return-value shadow#937

Closed
ChrisRackauckas-Claude wants to merge 1 commit into
SciML:masterfrom
ChrisRackauckas-Claude:enzyme-shadow-alias-fix
Closed

NonlinearSolveBaseEnzymeExt: preserve prob.p / prob.u0 aliasing in return-value shadow#937
ChrisRackauckas-Claude wants to merge 1 commit into
SciML:masterfrom
ChrisRackauckas-Claude:enzyme-shadow-alias-fix

Conversation

@ChrisRackauckas-Claude
Copy link
Copy Markdown
Contributor

Note

Draft — please ignore until reviewed by @ChrisRackauckas.

Summary

Enzyme.make_zero(sol) in the augmented_primal return path recursively allocates fresh zero buffers for every mutable field of the returned NonlinearSolution, including sol.prob.p and sol.prob.u0. Those fields alias the outer caller's active p / u0 shadows. Severing that aliasing means any cotangent a downstream consumer writes into sol.prob.p (or .u0) lands in a dangling buffer instead of the buffer the outer Enzyme tape is tracking, silently dropping that gradient contribution.

The fix introduces a tiny _make_solution_zero helper that pre-seeds the make_zero IdDict so prob.p and prob.u0 map to themselves and the recursion short-circuits. The original buffers are reused verbatim in the shadow; sol.u (the actual derivative-carrying field) still gets a fresh zero buffer. Guards nothing parameters and non-mutable values.

Independent of #936 (caches accumulation work) — they touch the same file but address unrelated bugs in the rule.

Verification

Confirmed with a tiny MWE that naive Enzyme.make_zero(sol) returns a shadow whose prob.p is a fresh buffer:

sol = SciMLBase.build_solution(prob, nothing, [1.5, 2.0], zeros(2))
dsol = Enzyme.make_zero(sol)
@assert objectid(dsol.prob.p) != objectid(sol.prob.p)   # aliasing severed

With _make_solution_zero:

dsol = EXT._make_solution_zero(sol)
@assert dsol.prob.p === sol.prob.p                       # alias preserved
@assert all(iszero, dsol.u)                              # u is still fresh-zero

Both checks are encoded in lib/NonlinearSolveBase/test/enzyme_make_solution_zero.jl.

Discovery context

Surfaced while investigating a separate MethodError: no method matching MixedDuplicated(::NonlinearSolution, ::NonlinearSolution) on the polyalg-selected algorithm path. That MethodError turned out to be upstream in Enzyme's create_activity_wrapper (Enzyme/.../rules/jitrules.jl:14, invoked from runtime_generic_augfwd on the downstream SciMLBase.wrap_sol(::NonlinearSolution, …) call — the wrapper emits MixedDuplicated(primarg, shadowarg) with shadowarg::NonlinearSolution instead of Base.RefValue{NonlinearSolution} on the type-unstable generic dispatch path triggered by AutoSpecializeCallable{FunctionWrappersWrapper{…}} in the solution's type parameters). Pinning a concrete algorithm sidesteps that bug; the alias issue here is independent and stands on its own.

Test plan

  • Pkg.test("NonlinearSolveBase") passes — 32/32 on Julia 1.12.4 (was 28; +4 from the new asserts).
  • CI green.

Bumps

NonlinearSolveBase 2.26.02.26.1 (patch — bugfix only, no API change).

`Enzyme.make_zero(sol)` in the augmented_primal return path recursively
allocates fresh zero buffers for every mutable field of the
`NonlinearSolution`, including `sol.prob.p` and `sol.prob.u0`. Those
fields alias the outer caller's active `p` / `u0` shadows, so severing
the aliasing means any cotangent a downstream consumer writes back into
`sol.prob.p` (or `.u0`) lands in a dangling buffer instead of the
buffer the outer Enzyme tape is tracking, silently dropping that
contribution.

Replace the call with `_make_solution_zero(sol)`, which pre-seeds the
`make_zero` IdDict so `prob.p` and `prob.u0` map to themselves and the
recursion short-circuits — the original buffers are reused verbatim
while `sol.u` (the actual derivative-carrying field) still gets a
fresh zero buffer. Guards `nothing` parameters and non-mutable values.

Unit test asserts (a) naive `Enzyme.make_zero` breaks aliasing on a
`NonlinearProblem`-backed solution, (b) `_make_solution_zero`
preserves it (`===` and `objectid`), (c) `sol.u` is a fresh zero
buffer, (d) `nothing` `p` doesn't crash the pre-seed helper.

Independent of the `_accum_tangent!` caches-walk work in this branch.
Does not by itself fix the unrelated polyalg `MixedDuplicated`
MethodError, which has been traced to Enzyme's `create_activity_wrapper`
emitting `MixedDuplicated(::T, ::T)` for `wrap_sol(::NonlinearSolution)`
on the type-unstable generic dispatch path, an upstream Enzyme issue.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
@ChrisRackauckas-Claude
Copy link
Copy Markdown
Contributor Author

Closing as not required. Ablation study with the desauty SCC init test rewritten to use the proper Enzyme API (Enzyme.autodiff + Duplicated(iprob), SciML/SciMLSensitivity.jl#1454) confirms this fix does not change the outcome — desauty still passes with this PR's _make_solution_zero change reverted to plain Enzyme.make_zero. The alias-loss-on-make_zero pattern is real but harmless when the downstream user follows the documented Enzyme contract (don't Const closures with mutable captures).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants