ITensorMPS and CUDA.jl Duplication / Erroneous Memory Allocations in DMRG

Hello, 

I’m running a standard `dmrg` computation through `ITensorMPS` with the CUDA.jl backend, but the total GPU memory allocated is more than 2x what I see when running identical code on the CPU only. It seems like there are duplicate or erroneous allocations, but I haven’t been able to identify where they come from.

During the `dmrg` computation, I get an Out of Memory Error: 

```
ERROR: LoadError: Out of GPU memory trying to allocate 1000.814 MiB
Effective GPU memory usage: 99.97% (39.481 GiB/39.494 GiB)
Memory pool usage: 35.356 GiB (38.969 GiB reserved)
```

I’m running the computation with a 121-site MPS (physical dimension 2), and I incrementally increase the maximum bond dimension at each sweep. At the point of failure, the MPS has a memory footprint of approximately 2 GB and the MPO one of around 11 MB. Looking through the ITensorMPS code, I found this note:

https://github.com/ITensor/ITensorMPS.jl/blob/509e11efc232885262f5ae199cb4b64c6e56d498/src/dmrg.jl#L252

Note that the same code runs on the CPU without issue, with a peak memory footprint of 21.6 GB.
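For context, footprints like the ones quoted above can be estimated with a back-of-the-envelope calculation. The sketch below is only that: it assumes dense `Float64` tensors, a uniform MPS bond dimension `chi`, and a uniform MPO bond dimension `w`. The helper names and the numeric values are hypothetical and are not taken from the actual run. The rank-5 intermediate corresponds to the 5-index tensor that appears in the `ProjMPO` contraction in the stacktrace.

```julia
# Rough dense-tensor memory estimates for a DMRG run.
# Assumptions (illustrative, not measured): uniform MPS bond dimension `chi`,
# uniform MPO bond dimension `w`, physical dimension `d`, dense storage.

# Bytes for an MPS of `nsites` tensors, each of shape (chi, d, chi).
mps_bytes(nsites, d, chi; T=Float64) = nsites * chi * d * chi * sizeof(T)

# Bytes for one projected-MPO environment tensor of shape (chi, w, chi).
env_bytes(chi, w; T=Float64) = chi * w * chi * sizeof(T)

# Bytes for the rank-5 intermediate of shape (chi, w, d, d, chi) formed when an
# environment tensor is contracted with a two-site wavefunction.
intermediate_bytes(chi, w, d; T=Float64) = chi * w * d * d * chi * sizeof(T)

# Convert a byte count to GiB for printing.
to_gib(bytes) = bytes / 2^30

# Hypothetical parameters chosen only to exercise the formulas.
nsites, d, chi, w = 121, 2, 1000, 30

println("MPS:              ", round(to_gib(mps_bytes(nsites, d, chi)); digits=3), " GiB")
println("One environment:  ", round(to_gib(env_bytes(chi, w)); digits=3), " GiB")
println("One intermediate: ", round(to_gib(intermediate_bytes(chi, w, d)); digits=3), " GiB")
```

Comparing such estimates against the reported pool usage may help separate memory that the algorithm is expected to need (MPS, environments, contraction intermediates) from allocations that are genuinely unexpected.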

I’m running this on a Perlmutter GPU node (see https://docs.nersc.gov/systems/perlmutter/architecture/) with a single NVIDIA A100-SXM4-40GB. 

Here is the stacktrace for reference: 

```
Stacktrace:
  [1] _pool_alloc
    @ ~/.julia/packages/CUDA/724Sm/src/memory.jl:666 [inlined]
  [2] macro expansion
    @ ~/.julia/packages/CUDA/724Sm/src/memory.jl:623 [inlined]
  [3] macro expansion
    @ ./timing.jl:461 [inlined]
  [4] pool_alloc
    @ ~/.julia/packages/CUDA/724Sm/src/memory.jl:622 [inlined]
  [5] (::CUDA.var"#650#651"{CUDA.DeviceMemory, Int64})()
    @ CUDA ~/.julia/packages/CUDA/724Sm/src/array.jl:92
  [6] cached_alloc(f::CUDA.var"#650#651"{CUDA.DeviceMemory, Int64}, key::Tuple{UnionAll, CuDevice, DataType, Int64})
    @ GPUArrays ~/.julia/packages/GPUArrays/3a5jB/src/host/alloc_cache.jl:36
  [7] CuArray{Float64, 5, CUDA.DeviceMemory}(::UndefInitializer, dims::NTuple{5, Int64})
    @ CUDA ~/.julia/packages/CUDA/724Sm/src/array.jl:91
  [8] similar
    @ ~/.julia/packages/CUDA/724Sm/src/array.jl:186 [inlined]
  [9] permutedims(B::CuArray{Float64, 5, CUDA.DeviceMemory}, perm::NTuple{5, Int64})
    @ Base ./multidimensional.jl:1674
 [10] permutedims
    @ ~/.julia/packages/NDTensors/WbqtM/src/lib/Expose/src/functions/permutedims.jl:2 [inlined]
 [11] _contract!(CT::CuArray{Float64, 5, CUDA.DeviceMemory}, AT::CuArray{Float64, 5, CUDA.DeviceMemory}, BT::CuArray{Float64, 4, CUDA.DeviceMemory}, props::NDTensors.ContractionProperties{5, 4, 5}, α::Bool, β::Bool)
    @ NDTensors ~/.julia/packages/NDTensors/WbqtM/src/abstractarray/tensoralgebra/contract.jl:126
 [12] _contract!(CT::NDTensors.DenseTensor{Float64, 5, NTuple{5, Index{Int64}}, NDTensors.Dense{Float64, CuArray{Float64, 1, CUDA.DeviceMemory}}}, AT::NDTensors.DenseTensor{Float64, 5, NTuple{5, Index{Int64}}, NDTensors.Dense{Float64, CuArray{Float64, 1, CUDA.DeviceMemory}}}, BT::NDTensors.DenseTensor{Float64, 4, NTuple{4, Index{Int64}}, NDTensors.Dense{Float64, CuArray{Float64, 1, CUDA.DeviceMemory}}}, props::NDTensors.ContractionProperties{5, 4, 5}, α::Bool, β::Bool)
    @ NDTensors ~/.julia/packages/NDTensors/WbqtM/src/dense/tensoralgebra/contract.jl:230
 [13] contract!
    @ ~/.julia/packages/NDTensors/WbqtM/src/dense/tensoralgebra/contract.jl:213 [inlined]
 [14] contract! (repeats 2 times)
    @ ~/.julia/packages/NDTensors/WbqtM/src/tensoroperations/generic_tensor_operations.jl:166 [inlined]
 [15] _contract!!(output_tensor::NDTensors.DenseTensor{Float64, 5, NTuple{5, Index{Int64}}, NDTensors.Dense{Float64, CuArray{Float64, 1, CUDA.DeviceMemory}}}, labelsoutput_tensor::NTuple{5, Int64}, tensor1::NDTensors.DenseTensor{Float64, 5, NTuple{5, Index{Int64}}, NDTensors.Dense{Float64, CuArray{Float64, 1, CUDA.DeviceMemory}}}, labelstensor1::NTuple{5, Int64}, tensor2::NDTensors.DenseTensor{Float64, 4, NTuple{4, Index{Int64}}, NDTensors.Dense{Float64, CuArray{Float64, 1, CUDA.DeviceMemory}}}, labelstensor2::NTuple{4, Int64}, α::Int64, β::Int64)
    @ NDTensors ~/.julia/packages/NDTensors/WbqtM/src/tensoroperations/generic_tensor_operations.jl:144
 [16] _contract!!
    @ ~/.julia/packages/NDTensors/WbqtM/src/tensoroperations/generic_tensor_operations.jl:132 [inlined]
 [17] contract!!
    @ ~/.julia/packages/NDTensors/WbqtM/src/tensoroperations/generic_tensor_operations.jl:220 [inlined]
 [18] contract!!
    @ ~/.julia/packages/NDTensors/WbqtM/src/tensoroperations/generic_tensor_operations.jl:189 [inlined]
 [19] contract(tensor1::NDTensors.DenseTensor{Float64, 5, NTuple{5, Index{Int64}}, NDTensors.Dense{Float64, CuArray{Float64, 1, CUDA.DeviceMemory}}}, labelstensor1::NTuple{5, Int64}, tensor2::NDTensors.DenseTensor{Float64, 4, NTuple{4, Index{Int64}}, NDTensors.Dense{Float64, CuArray{Float64, 1, CUDA.DeviceMemory}}}, labelstensor2::NTuple{4, Int64}, labelsoutput_tensor::NTuple{5, Int64})
    @ NDTensors ~/.julia/packages/NDTensors/WbqtM/src/tensoroperations/generic_tensor_operations.jl:113
 [20] contract(::Type{NDTensors.CanContract{NDTensors.DenseTensor{Float64, 5, NTuple{5, Index{Int64}}, NDTensors.Dense{Float64, CuArray{Float64, 1, CUDA.DeviceMemory}}}, NDTensors.DenseTensor{Float64, 4, NTuple{4, Index{Int64}}, NDTensors.Dense{Float64, CuArray{Float64, 1, CUDA.DeviceMemory}}}}}, tensor1::NDTensors.DenseTensor{Float64, 5, NTuple{5, Index{Int64}}, NDTensors.Dense{Float64, CuArray{Float64, 1, CUDA.DeviceMemory}}}, labels_tensor1::NTuple{5, Int64}, tensor2::NDTensors.DenseTensor{Float64, 4, NTuple{4, Index{Int64}}, NDTensors.Dense{Float64, CuArray{Float64, 1, CUDA.DeviceMemory}}}, labels_tensor2::NTuple{4, Int64})
    @ NDTensors ~/.julia/packages/NDTensors/WbqtM/src/tensoroperations/generic_tensor_operations.jl:91
 [21] contract
    @ ~/.julia/packages/SimpleTraits/7VJph/src/SimpleTraits.jl:332 [inlined]
 [22] _contract(A::NDTensors.DenseTensor{Float64, 5, NTuple{5, Index{Int64}}, NDTensors.Dense{Float64, CuArray{Float64, 1, CUDA.DeviceMemory}}}, B::NDTensors.DenseTensor{Float64, 4, NTuple{4, Index{Int64}}, NDTensors.Dense{Float64, CuArray{Float64, 1, CUDA.DeviceMemory}}})
    @ ITensors ~/.julia/packages/ITensors/VuQ3D/src/tensor_operations/tensor_algebra.jl:3
 [23] _contract(A::ITensor, B::ITensor)
    @ ITensors ~/.julia/packages/ITensors/VuQ3D/src/tensor_operations/tensor_algebra.jl:9
 [24] contract(A::ITensor, B::ITensor)
    @ ITensors ~/.julia/packages/ITensors/VuQ3D/src/tensor_operations/tensor_algebra.jl:76
 [25] *
    @ ~/.julia/packages/ITensors/VuQ3D/src/tensor_operations/tensor_algebra.jl:63 [inlined]
 [26] contract(P::ProjMPO, v::ITensor)
    @ ITensorMPS ~/.julia/packages/ITensorMPS/tnopq/src/abstractprojmpo/abstractprojmpo.jl:51
 [27] product(P::ProjMPO, v::ITensor)
    @ ITensorMPS ~/.julia/packages/ITensorMPS/tnopq/src/abstractprojmpo/abstractprojmpo.jl:71
 [28] AbstractProjMPO
    @ ~/.julia/packages/ITensorMPS/tnopq/src/abstractprojmpo/abstractprojmpo.jl:87 [inlined]
 [29] apply
    @ ~/.julia/packages/KrylovKit/ZcdRg/src/apply.jl:2 [inlined]
 [30] apply_scalartype
    @ ~/.julia/packages/KrylovKit/ZcdRg/src/apply.jl:35 [inlined]
 [31] #eigsolve#51
    @ ~/.julia/packages/KrylovKit/ZcdRg/src/eigsolve/eigsolve.jl:210 [inlined]
 [32] eigsolve
    @ ~/.julia/packages/KrylovKit/ZcdRg/src/eigsolve/eigsolve.jl:209 [inlined]
 [33] macro expansion
    @ ~/.julia/packages/ITensorMPS/tnopq/src/dmrg.jl:238 [inlined]
 [34] macro expansion
    @ ./timing.jl:461 [inlined]
 [35] dmrg(PH::ProjMPO, psi0::MPS, sweeps::Sweeps; which_decomp::Nothing, svd_alg::Nothing, observer::DemoObserver, outputlevel::Int64, write_when_maxdim_exceeds::Nothing, write_path::String, eigsolve_tol::Float64, eigsolve_krylovdim::Int64, eigsolve_maxiter::Int64, eigsolve_verbosity::Int64, eigsolve_which_eigenvalue::Symbol, ishermitian::Bool)
    @ ITensorMPS ~/.julia/packages/ITensorMPS/tnopq/src/dmrg.jl:205
 [36] dmrg
    @ ~/.julia/packages/ITensorMPS/tnopq/src/dmrg.jl:157 [inlined]
 [37] #dmrg#606
    @ ~/.julia/packages/ITensorMPS/tnopq/src/dmrg.jl:28 [inlined]
 [38] dmrg
    @ ~/.julia/packages/ITensorMPS/tnopq/src/dmrg.jl:21 [inlined]
 [39] #dmrg#612
    @ ~/.julia/packages/ITensorMPS/tnopq/src/dmrg.jl:396 [inlined]
 [40] dmrg
    @ ~/.julia/packages/ITensorMPS/tnopq/src/dmrg.jl:386 [inlined]
 [41] dmrg_gpu(H::MPO, psi0::MPS; nsweeps::Int64, cutoff::Float64, observer::DemoObserver, outputlevel::Int64, maxbonddim::Int64)
    @ Main /scratch/username/project//src/utils.jl:113
 [42] dmrg_wrapper(Nx::Int64, Ny::Int64, sites::Vector{Index{Int64}}, H::MPO, maxbonddim::Int64; init_state::@NamedTuple{type::Symbol, bonddim::Int64}, nsweeps::Int64, cutoff::Float64, outputlevel::Int64, checkpoint_itr::Int64, load_checkpoint::Int64, GPU::Bool)
    @ Main /scratch/username/project//src/utils.jl:171
 [43] sim_wrapper(expt::Dict{Symbol, Any})
    @ Main /scratch/username/project/scripts/savemps.jl:26
 [44] (::var"#main##2#main##3")(k::Dict{Symbol, Any})
    @ Main ~/.julia/packages/DrWatson/2QF5p/src/saving_files.jl:177
 [45] produce_or_load(f::var"#main##2#main##3", config::Dict{Symbol, Any}, path::String; suffix::String, prefix::String, tag::Bool, gitpath::String, loadfile::Bool, storepatch::Bool, force::Bool, verbose::Bool, wsave_kwargs::@NamedTuple{}, wload_kwargs::@NamedTuple{}, filename::Nothing, kwargs::@Kwargs{})
    @ DrWatson ~/.julia/packages/DrWatson/2QF5p/src/saving_files.jl:108
 [46] macro expansion
    @ ~/.julia/packages/DrWatson/2QF5p/src/saving_files.jl:176 [inlined]
 [47] macro expansion
    @ /scratch/username/project/scripts/savemps.jl:88 [inlined]
 [48] macro expansion
    @ ./timing.jl:689 [inlined]
 [49] main()
    @ Main /scratch/username/project/scripts/savemps.jl:86
 [50] top-level scope
    @ /scratch/username/project/scripts/savemps.jl:97
 [51] include(mod::Module, _path::String)
    @ Base ./Base.jl:306
 [52] exec_options(opts::Base.JLOptions)
    @ Base ./client.jl:317
in expression starting at /scratch/username/project/scripts/savemps.jl:97
srun: error: nid001236: task 0: Exited with exit code 1
srun: Terminating StepId=50150823.1
```

ITensorMPS and CUDA.jl Duplication / Erroneous Memory Allocations in DMRG #3115

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

ITensorMPS and CUDA.jl Duplication / Erroneous Memory Allocations in DMRG #3115

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions