Reverse mode returns zero gradient on closure-captured aliased mutable buffer (workaround: copy) #3124

@ChrisRackauckas-Claude

Summary

Enzyme.gradient(set_runtime_activity(Reverse), Const(loss), x) returns an all-zero gradient when:

  1. A mutable struct instance holding a Vector{Float64} is captured by a closure.
  2. The loss constructs a new instance of the same struct whose Vector field aliases the captured buffer (i.e. it stores the same Vector object, no copy).
  3. The loss writes values computed from the active input into that buffer through the new struct.
  4. The loss reads sum(buffer).

Manually copying the buffer before mutation (so the alias is broken) gives the correct gradient, which matches FiniteDiff to more than 8 significant figures. Plain Reverse (no runtime activity) raises EnzymeRuntimeActivityError on the aliasing version; set_runtime_activity(Reverse) runs to completion and silently returns zeros.

This was reduced from a SciML / ModelingToolkit MTKParameters case where the closure captures iprob.p, a repack callback returns a new MTKParameters whose caches::Tuple{Vector{Float64}} field aliases iprob.p.caches, and solve! then mutates p_new.caches[1]. The standalone version below has no SciML dependencies.

MWE

using Enzyme, FiniteDiff

mutable struct Holder
    v::Vector{Float64}
end

const captured = Holder([0.0, 0.0, 0.0])

function loss_alias(t::Vector{Float64})
    h = Holder(captured.v)           # NEW struct, .v aliases captured.v
    for i in eachindex(h.v)
        h.v[i] = t[i]^2
    end
    return sum(h.v)
end

function loss_copy(t::Vector{Float64})
    h = Holder(copy(captured.v))     # break the alias
    for i in eachindex(h.v)
        h.v[i] = t[i]^2
    end
    return sum(h.v)
end

t0 = [1.0, 2.0, 3.0]
mode = set_runtime_activity(Reverse)

@show loss_alias(t0)
@show loss_copy(t0)
@show FiniteDiff.finite_difference_gradient(loss_alias, t0)
@show FiniteDiff.finite_difference_gradient(loss_copy,  t0)
@show Enzyme.gradient(mode, Const(loss_alias), t0)
@show Enzyme.gradient(mode, Const(loss_copy),  t0)

Output

loss_alias(t0)  = 14.0
loss_copy(t0)   = 14.0
FD  grad (alias) = [2.0000000000471077, 3.9999999999475415, 5.999999999994649]
FD  grad (copy)  = [2.0000000000471077, 3.9999999999475415, 5.999999999994649]
Enz grad (alias) = ([0.0, 0.0, 0.0],)        # WRONG
Enz grad (copy)  = ([2.0, 4.0, 6.0],)        # correct, matches FD

With plain Reverse (no runtime activity), the loss_alias call instead raises:

EnzymeRuntimeActivityError: Detected potential need for runtime activity.
... Failure within method: getproperty(::Holder, ::Symbol) ...

so the issue manifests as either a hard error (plain Reverse) or a silently wrong zero gradient (set_runtime_activity(Reverse)).

Expected

Enzyme.gradient(mode, Const(loss_alias), t0) == ([2.0, 4.0, 6.0],) (same as the loss_copy version, and the single tuple element matches FiniteDiff). The captured Holder is Const from Enzyme's perspective, but the buffer inside it is used as active storage for the gradient computation in this call, so Enzyme should follow the alias and accumulate into it.
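
For reference, the expected value follows directly from the loss in the MWE, which overwrites each buffer entry with t[i]^2 and then sums the buffer. Written out (the symbol L for the loss is introduced here only for illustration):

```math
L(t) = \sum_{i=1}^{3} t_i^2,
\qquad
\frac{\partial L}{\partial t_i} = 2\,t_i,
\qquad
L(t_0) = 1 + 4 + 9 = 14,
\qquad
\nabla L(t_0) = (2,\ 4,\ 6) \quad \text{for } t_0 = (1, 2, 3).
```

The initial contents of the captured buffer do not enter the result, because every entry is overwritten before it is read.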

Workaround

Manually copy the buffer before storing it in the new struct, so the new struct does not alias any closure-captured storage.
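
A quick way to check whether a given construction still aliases the captured buffer is to compare object identity with `===`. The following is a minimal sketch in plain Base Julia (no Enzyme involved). It redefines the MWE's Holder so that it runs standalone, and the names `captured_demo`, `aliased`, and `copied` are illustrative only.

```julia
# Same struct as in the MWE, redefined so this snippet runs on its own.
mutable struct Holder
    v::Vector{Float64}
end

captured_demo = Holder([0.0, 0.0, 0.0])

aliased = Holder(captured_demo.v)        # stores the same Vector object
copied  = Holder(copy(captured_demo.v))  # stores a fresh Vector

@assert aliased.v === captured_demo.v    # alias: identical object
@assert copied.v !== captured_demo.v     # workaround: distinct object

# Writes through the alias are visible in the captured buffer;
# writes through the copy are not.
aliased.v[1] = 1.0
copied.v[2] = 2.0
@assert captured_demo.v == [1.0, 0.0, 0.0]
```

If the first assertion holds for the struct built inside the loss, the construction matches the failing pattern described in this issue, and inserting a copy as in loss_copy avoids it.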

Versions

  • Julia: 1.12.6
  • Enzyme: v0.13.150
  • FiniteDiff: v2.31.0
  • OS: Linux x86_64
