Share same cache among similar operators in `AddedOperator` by albertomercurio · Pull Request #370 · SciML/SciMLOperators.jl

albertomercurio · 2026-05-06T12:34:53Z

Checklist

Appropriate tests were added
Any code changes were done in a way that does not break public API
All documentation related to code changes were updated
The new code follows the
contributor guidelines, in particular the SciML Style Guide and
COLPRAC.
Any new documentation only uses public API

Additional context

When caching an AddedOperator, we currently cache all the suboperators iteratively. However, this is suboptimal, as many of them can in principle share the same cache when possible.

In this PR I implement this cache sharing in the AddedOperator. Let's imagine I have the sum of Composed + Tensor + Tensor + Composed + Matrix + Tensor, many of them require some cache, but they could share the same cache. We can be sure they can share the same cache if

They are the same constructor (e.g., ComposedOperator)
The size of their cache is the same

the first one can be checked even at compile time, while the second requires to define _get_cache_shapes(op, v), which returns the shape of the cache.

Benchmarks

This method allows to save a lot of memory, as also shown in this example

using SparseArrays
using SciMLOperators
using CairoMakie

function generate_op(N)
    M = 10
    sparsity = 1 / N

    A_list = ntuple(i -> MatrixOperator(sprand(N, N, sparsity)) * MatrixOperator(sprand(N, N, sparsity)), Val(M))

    op = SciMLOperators.AddedOperator(A_list)

    u = rand(N, N)
    return cache_operator(op, u)
    # return op
end

# %%

N_list = 4 .^ (2:7)
sizes_main = [Base.summarysize(generate_op(N)) for N in N_list]
sizes_pr = [Base.summarysize(generate_op(N)) for N in N_list]

# %%

fig = Figure()
ax = Axis(fig[1, 1], xlabel="N", ylabel="Size (MB)", xscale=log10, yscale=log10)

scatterlines!(ax, N_list, sizes_main ./ 1e6, label="main")
scatterlines!(ax, N_list, sizes_pr ./ 1e6, label="PR")

axislegend(ax; position = :lt)

fig

Co-authored-by: Copilot <copilot@github.com>

albertomercurio · 2026-05-17T12:18:19Z

I think this PR is ready. It improves the cache handling for AddedOperator

ChrisRackauckas-Claude

Thanks for the work on this — the memory wins are real for the benchmark case, but I think there are correctness issues that need to be fixed before merging. I dug into the diff, ran reproducers locally against this branch on Julia 1.11.9, and found the following.

1. `InexactError` on mixed-eltype `ComposedOperator`s in the same `AddedOperator` (correctness)

A real-world QM-style pattern fails:

using SciMLOperators, LinearAlgebra
Ar = MatrixOperator(rand(Float64, 4, 4));    Br = MatrixOperator(rand(Float64, 4, 4))
Ac = MatrixOperator(rand(ComplexF64, 4, 4)); Bc = MatrixOperator(rand(ComplexF64, 4, 4))
L  = (Ar*Br) + (Ac*Bc)
Ls = cache_operator(L, rand(Float64, 4))
mul!(similar(rand(Float64,4), ComplexF64), Ls, rand(Float64,4))
# ERROR: InexactError: Float64(... + 0.121im)

Both sub-ops are ComposedOperator, so the asker Ac*Bc is forced to share Ar*Br's cache. The donor's cache is Float64 (because v is real and cache_self uses promote_type(eltype(op), eltype(zero(v)))). The asker would normally allocate ComplexF64 slots, since its inner Bc is ComplexF64. But _cache_compatible at src/interface.jl:210 only checks size(hint) == shape — no eltype check — so sharing proceeds. Then mul!(L.cache[1], Bc, v) writes a complex result into a Float64 buffer.

The reverse order (Ac*Bc) + (Ar*Br) also breaks, in the 5-arg path: copy!(cache, w) with w::ComplexF64 and cache::Float64.

Worth noting that Ls * v happens to return correct results because Base.:*(L::AddedOperator, v) (src/basic.jl:676) does an out-of-place sum(op -> op*v, L.ops) that bypasses the cache entirely. So the new testset asserts L * u ≈ expected, but that doesn't exercise the in-place path that the cache is built for. The mul!(w, L, v) path is what users on hot loops will hit.

2. `_get_cache_shapes(::ComposedOperator)` is off-by-one (logic bug)

# A is 5×3, B is 3×4, C = A*B, v: 4-vec
SciMLOperators._get_cache_shapes(cache_operator(A*B, v), v)
# ((5,), (3,))         ← claimed
map(size, cache_operator(A*B, v).cache)
# ((3,), (4,))         ← actual

_get_cache_shapes at src/basic.jl:959 claims slot i has shape (size(L.ops[i],1), K). But cache_self at src/basic.jl:969 actually allocates slot i (for i<N) with the next operator's row dim — (size(L.ops[i+1],1), K) — and slot N with size(v), not (size(L.ops[N],1), K).

Effect:

For non-square ComposedOperators, the spec-vs-actual mismatch makes _cache_compatible reject the donor's cache → sharing never triggers for any non-square inner ops. The benchmark wins disappear in this regime.
For square inner ops, the bug is masked because size(op,1) == size(op,2). But that's also exactly where the eltype-mismatch bug bites (#1).

Better to have a single source of truth here — either compute shapes inside cache_self and stash them, or factor out a helper that both functions use, so they can't drift.

3. `FunctionOperator._get_cache_shapes` ignores `size(v,2)` and eltype

src/func.jl:593:

function _get_cache_shapes(L::FunctionOperator, v::AbstractVecOrMat)
    return (L.traits.sizes[1], L.traits.sizes[2])
end

For matrix input it returns single-column shapes. That'll always reject sharing for a matrix v, but could also falsely match against a same-vector-shaped donor and graft the wrong buffer. Should incorporate size(v,2) and output_eltype.

4. No device check in `_cache_compatible`

A CPU donor and GPU asker (or vice versa) with the same size(...) would silently graft a wrong-device cache. Unlikely to occur in normal usage but the API permits it.

5. Aliasing is brittle — worth a comment

After sharing, several L.ops[i].cache are ===. This works only because AddedOperator's mul! is strictly serial. Any future change that parallelizes L.ops evaluation (Threads, separate CUDA streams, async) creates data races on the shared buffers. A comment near the donor selection or near the cache field would help future maintainers not break this invariant.

6. Test coverage gap

The new testset at test/basic.jl:411 only uses square Float64 MatrixOperator(rand(N,N)) matrices and asserts L * u (the path that bypasses the cache). None of the failure modes above are covered. Suggest adding:

Mixed-eltype ComposedOperators (e.g. real + complex inner ops).
Non-square inner ops in ComposedOperator.
FunctionOperator inside an AddedOperator with both vector and matrix v.
mul!(w, L, v) and mul!(w, L, v, α, β) against the cached AddedOperator (not just *).
The benchmark in the PR description also only measures Base.summarysize; an in-place mul! benchmark would catch correctness regressions of this kind.

Suggested minimal fixes

Extend _cache_compatible (or _get_cache_shapes) to carry and check eltype.
Fix _get_cache_shapes(::ComposedOperator) so it matches what cache_self allocates, ideally by sharing a common helper.
Fix _get_cache_shapes(::FunctionOperator) to include size(v,2) and output_eltype.
Add the missing tests above so the eltype/non-square/in-place paths are covered.

Happy to take a closer look at a follow-up commit if useful. Reproducers I used:

repro scripts

# repro3.jl — confirms #1
using SciMLOperators, LinearAlgebra
Ar = MatrixOperator(rand(Float64, 4, 4));    Br = MatrixOperator(rand(Float64, 4, 4))
Ac = MatrixOperator(rand(ComplexF64, 4, 4)); Bc = MatrixOperator(rand(ComplexF64, 4, 4))
CR = Ar * Br; CC = Ac * Bc
L = CR + CC
v = rand(Float64, 4)
Ls = cache_operator(L, v)
@show map(eltype, Ls.ops[1].cache) map(eltype, Ls.ops[2].cache) Ls.ops[2].cache === Ls.ops[1].cache
w = similar(v, ComplexF64); fill!(w, 0)
mul!(w, Ls, v)   # InexactError

# repro.jl — confirms #2
using SciMLOperators
A = MatrixOperator(rand(5, 3)); B = MatrixOperator(rand(3, 4))
C = A * B; v = rand(4)
Cc = cache_operator(C, v)
@show SciMLOperators._get_cache_shapes(Cc, v)   # ((5,), (3,))
@show map(size, Cc.cache)                       # ((3,), (4,))

albertomercurio · 2026-05-19T02:27:17Z

Ok I should have fixed all the issues. I'm not sure about the FunctionOperator. But the others should be fixed

This was referenced May 6, 2026

Introduce SuperOperatorMatrixForm and add matrix_form argument qutip/QuantumToolbox.jl#707

Open

Suboptimal and bad performance in mesolve qutip/QuantumToolbox.jl#617

Open

albertomercurio and others added 4 commits May 17, 2026 14:16

Share same cache among similar operators in AddedOperator

a753731

Co-authored-by: Copilot <copilot@github.com>

Remove ishermitian method

3c8a124

Reuse _get_cache_shapes in cache_self

a86a6b3

Co-authored-by: Copilot <copilot@github.com>

Format code

d803c2f

albertomercurio force-pushed the master branch from c8dc18e to d803c2f Compare May 17, 2026 12:17

ChrisRackauckas-Claude suggested changes May 18, 2026

View reviewed changes

albertomercurio added 2 commits May 19, 2026 04:11

Fix issues

33017d5

Format code

39a6b9a

albertomercurio requested a review from ChrisRackauckas-Claude May 19, 2026 02:13

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Share same cache among similar operators in `AddedOperator`#370

Share same cache among similar operators in `AddedOperator`#370
albertomercurio wants to merge 6 commits into
SciML:masterfrom
albertomercurio:master

albertomercurio commented May 6, 2026

Uh oh!

albertomercurio commented May 17, 2026

Uh oh!

ChrisRackauckas-Claude left a comment

Uh oh!

albertomercurio commented May 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

albertomercurio commented May 6, 2026

Checklist

Additional context

Benchmarks

Uh oh!

albertomercurio commented May 17, 2026

Uh oh!

ChrisRackauckas-Claude left a comment

Choose a reason for hiding this comment

1. InexactError on mixed-eltype ComposedOperators in the same AddedOperator (correctness)

2. _get_cache_shapes(::ComposedOperator) is off-by-one (logic bug)

3. FunctionOperator._get_cache_shapes ignores size(v,2) and eltype

4. No device check in _cache_compatible

5. Aliasing is brittle — worth a comment

6. Test coverage gap

Suggested minimal fixes

Uh oh!

albertomercurio commented May 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

1. `InexactError` on mixed-eltype `ComposedOperator`s in the same `AddedOperator` (correctness)

2. `_get_cache_shapes(::ComposedOperator)` is off-by-one (logic bug)

3. `FunctionOperator._get_cache_shapes` ignores `size(v,2)` and eltype

4. No device check in `_cache_compatible`