NonlinearSolveBaseEnzymeExt: preserve prob.p / prob.u0 aliasing in return-value shadow #937
Closed
ChrisRackauckas-Claude wants to merge 1 commit into
`Enzyme.make_zero(sol)` in the augmented_primal return path recursively allocates fresh zero buffers for every mutable field of the `NonlinearSolution`, including `sol.prob.p` and `sol.prob.u0`. Those fields alias the outer caller's active `p` / `u0` shadows, so severing the aliasing means any cotangent a downstream consumer writes back into `sol.prob.p` (or `.u0`) lands in a dangling buffer instead of the buffer the outer Enzyme tape is tracking, silently dropping that contribution.

Replace the call with `_make_solution_zero(sol)`, which pre-seeds the `make_zero` IdDict so `prob.p` and `prob.u0` map to themselves and the recursion short-circuits: the original buffers are reused verbatim while `sol.u` (the actual derivative-carrying field) still gets a fresh zero buffer. Guards `nothing` parameters and non-mutable values.

Unit test asserts (a) naive `Enzyme.make_zero` breaks aliasing on a `NonlinearProblem`-backed solution, (b) `_make_solution_zero` preserves it (`===` and `objectid`), (c) `sol.u` is a fresh zero buffer, (d) `nothing` `p` doesn't crash the pre-seed helper.

Independent of the `_accum_tangent!` caches-walk work in this branch. Does not by itself fix the unrelated polyalg `MixedDuplicated` MethodError, which has been traced to Enzyme's `create_activity_wrapper` emitting `MixedDuplicated(::T, ::T)` for `wrap_sol(::NonlinearSolution)` on the type-unstable generic dispatch path, an upstream Enzyme issue.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
3642ed2 to
d813e91
Closing as not required. Ablation study with the desauty SCC init test rewritten to use the proper Enzyme API ( |
Note
Draft — please ignore until reviewed by @ChrisRackauckas.
Summary
`Enzyme.make_zero(sol)` in the augmented_primal return path recursively allocates fresh zero buffers for every mutable field of the returned `NonlinearSolution`, including `sol.prob.p` and `sol.prob.u0`. Those fields alias the outer caller's active `p` / `u0` shadows. Severing that aliasing means any cotangent a downstream consumer writes into `sol.prob.p` (or `.u0`) lands in a dangling buffer instead of the buffer the outer Enzyme tape is tracking, silently dropping that gradient contribution.

The fix introduces a tiny `_make_solution_zero` helper that pre-seeds the `make_zero` IdDict so `prob.p` and `prob.u0` map to themselves and the recursion short-circuits. The original buffers are reused verbatim in the shadow; `sol.u` (the actual derivative-carrying field) still gets a fresh zero buffer. Guards `nothing` parameters and non-mutable values.

Independent of #936 (caches accumulation work): they touch the same file but address unrelated bugs in the rule.
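The pre-seeding idea can be illustrated with a minimal, Base-only sketch. Everything here is hypothetical and for illustration only: `ToyProblem`, `ToySolution`, `toy_make_zero`, and `toy_make_solution_zero` are stand-ins, not the SciMLBase types, Enzyme's `make_zero`, or the helper in this PR. The sketch assumes a recursive zeroing walk that consults an identity-keyed memo before allocating, which is the behaviour the description above relies on.

```julia
# Toy stand-ins for the problem/solution structs (hypothetical, not SciMLBase).
struct ToyProblem
    u0::Vector{Float64}
    p::Union{Nothing, Vector{Float64}}
end

struct ToySolution
    u::Vector{Float64}
    prob::ToyProblem
end

# Recursive zeroing with an identity-keyed memo. A buffer already present in
# `seen` is returned as mapped; otherwise a fresh zero buffer is allocated.
function toy_make_zero(x::Vector{Float64}, seen::IdDict)
    haskey(seen, x) && return seen[x]
    z = zero(x)
    seen[x] = z
    return z
end
toy_make_zero(::Nothing, ::IdDict) = nothing
toy_make_zero(prob::ToyProblem, seen::IdDict) =
    ToyProblem(toy_make_zero(prob.u0, seen), toy_make_zero(prob.p, seen))
toy_make_zero(sol::ToySolution, seen::IdDict) =
    ToySolution(toy_make_zero(sol.u, seen), toy_make_zero(sol.prob, seen))

# Pre-seed the memo so `prob.p` and `prob.u0` map to themselves; the walk then
# short-circuits on them and reuses the original buffers.
function toy_make_solution_zero(sol::ToySolution)
    seen = IdDict{Any, Any}()
    for buf in (sol.prob.p, sol.prob.u0)
        # Guard `nothing` parameters and non-mutable values.
        if buf !== nothing && ismutable(buf)
            seen[buf] = buf
        end
    end
    return toy_make_zero(sol, seen)
end

p   = [1.0, 2.0]
u0  = [0.5]
sol = ToySolution([3.0], ToyProblem(u0, p))

naive  = toy_make_zero(sol, IdDict{Any, Any}())  # empty memo: aliasing is severed
shadow = toy_make_solution_zero(sol)             # pre-seeded memo: aliasing is kept

println(naive.prob.p === p)    # false
println(shadow.prob.p === p)   # true
println(shadow.u === sol.u)    # false
```

In this toy, writes into `shadow.prob.p` land in the caller's `p` buffer, while `shadow.u` is an independent zero buffer, mirroring the four assertions listed for the unit test.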
Verification
Confirmed with a tiny MWE that naive `Enzyme.make_zero(sol)` returns a shadow whose `prob.p` is a fresh buffer, whereas `_make_solution_zero` returns a shadow whose `prob.p` is the original buffer. Both checks are encoded in `lib/NonlinearSolveBase/test/enzyme_make_solution_zero.jl`.
Discovery context
Surfaced while investigating a separate `MethodError: no method matching MixedDuplicated(::NonlinearSolution, ::NonlinearSolution)` on the polyalg-selected algorithm path. That MethodError turned out to be upstream in Enzyme's `create_activity_wrapper` (`Enzyme/.../rules/jitrules.jl:14`, invoked from `runtime_generic_augfwd` on the downstream `SciMLBase.wrap_sol(::NonlinearSolution, …)` call). The wrapper emits `MixedDuplicated(primarg, shadowarg)` with `shadowarg::NonlinearSolution` instead of `Base.RefValue{NonlinearSolution}` on the type-unstable generic dispatch path triggered by `AutoSpecializeCallable{FunctionWrappersWrapper{…}}` in the solution's type parameters. Pinning a concrete algorithm sidesteps that bug; the alias issue here is independent and stands on its own.
Test plan
`Pkg.test("NonlinearSolveBase")` passes: 32/32 on Julia 1.12.4 (was 28; +4 from the new asserts).
Bumps `NonlinearSolveBase` `2.26.0` → `2.26.1` (patch: bugfix only, no API change).