Stacktrace:
[1] _pool_alloc
@ ~/.julia/packages/CUDA/724Sm/src/memory.jl:666 [inlined]
[2] macro expansion
@ ~/.julia/packages/CUDA/724Sm/src/memory.jl:623 [inlined]
[3] macro expansion
@ ./timing.jl:461 [inlined]
[4] pool_alloc
@ ~/.julia/packages/CUDA/724Sm/src/memory.jl:622 [inlined]
[5] (::CUDA.var"#650#651"{CUDA.DeviceMemory, Int64})()
@ CUDA ~/.julia/packages/CUDA/724Sm/src/array.jl:92
[6] cached_alloc(f::CUDA.var"#650#651"{CUDA.DeviceMemory, Int64}, key::Tuple{UnionAll, CuDevice, DataType, Int64})
@ GPUArrays ~/.julia/packages/GPUArrays/3a5jB/src/host/alloc_cache.jl:36
[7] CuArray{Float64, 5, CUDA.DeviceMemory}(::UndefInitializer, dims::NTuple{5, Int64})
@ CUDA ~/.julia/packages/CUDA/724Sm/src/array.jl:91
[8] similar
@ ~/.julia/packages/CUDA/724Sm/src/array.jl:186 [inlined]
[9] permutedims(B::CuArray{Float64, 5, CUDA.DeviceMemory}, perm::NTuple{5, Int64})
@ Base ./multidimensional.jl:1674
[10] permutedims
@ ~/.julia/packages/NDTensors/WbqtM/src/lib/Expose/src/functions/permutedims.jl:2 [inlined]
[11] _contract!(CT::CuArray{Float64, 5, CUDA.DeviceMemory}, AT::CuArray{Float64, 5, CUDA.DeviceMemory}, BT::CuArray{Float64, 4, CUDA.DeviceMemory}, props::NDTensors.ContractionProperties{5, 4, 5}, α::Bool, β::Bool)
@ NDTensors ~/.julia/packages/NDTensors/WbqtM/src/abstractarray/tensoralgebra/contract.jl:126
[12] _contract!(CT::NDTensors.DenseTensor{Float64, 5, NTuple{5, Index{Int64}}, NDTensors.Dense{Float64, CuArray{Float64, 1, CUDA.DeviceMemory}}}, AT::NDTensors.DenseTensor{Float64, 5, NTuple{5, Index{Int64}}, NDTensors.Dense{Float64, CuArray{Float64, 1, CUDA.DeviceMemory}}}, BT::NDTensors.DenseTensor{Float64, 4, NTuple{4, Index{Int64}}, NDTensors.Dense{Float64, CuArray{Float64, 1, CUDA.DeviceMemory}}}, props::NDTensors.ContractionProperties{5, 4, 5}, α::Bool, β::Bool)
@ NDTensors ~/.julia/packages/NDTensors/WbqtM/src/dense/tensoralgebra/contract.jl:230
[13] contract!
@ ~/.julia/packages/NDTensors/WbqtM/src/dense/tensoralgebra/contract.jl:213 [inlined]
[14] contract! (repeats 2 times)
@ ~/.julia/packages/NDTensors/WbqtM/src/tensoroperations/generic_tensor_operations.jl:166 [inlined]
[15] _contract!!(output_tensor::NDTensors.DenseTensor{Float64, 5, NTuple{5, Index{Int64}}, NDTensors.Dense{Float64, CuArray{Float64, 1, CUDA.DeviceMemory}}}, labelsoutput_tensor::NTuple{5, Int64}, tensor1::NDTensors.DenseTensor{Float64, 5, NTuple{5, Index{Int64}}, NDTensors.Dense{Float64, CuArray{Float64, 1, CUDA.DeviceMemory}}}, labelstensor1::NTuple{5, Int64}, tensor2::NDTensors.DenseTensor{Float64, 4, NTuple{4, Index{Int64}}, NDTensors.Dense{Float64, CuArray{Float64, 1, CUDA.DeviceMemory}}}, labelstensor2::NTuple{4, Int64}, α::Int64, β::Int64)
@ NDTensors ~/.julia/packages/NDTensors/WbqtM/src/tensoroperations/generic_tensor_operations.jl:144
[16] _contract!!
@ ~/.julia/packages/NDTensors/WbqtM/src/tensoroperations/generic_tensor_operations.jl:132 [inlined]
[17] contract!!
@ ~/.julia/packages/NDTensors/WbqtM/src/tensoroperations/generic_tensor_operations.jl:220 [inlined]
[18] contract!!
@ ~/.julia/packages/NDTensors/WbqtM/src/tensoroperations/generic_tensor_operations.jl:189 [inlined]
[19] contract(tensor1::NDTensors.DenseTensor{Float64, 5, NTuple{5, Index{Int64}}, NDTensors.Dense{Float64, CuArray{Float64, 1, CUDA.DeviceMemory}}}, labelstensor1::NTuple{5, Int64}, tensor2::NDTensors.DenseTensor{Float64, 4, NTuple{4, Index{Int64}}, NDTensors.Dense{Float64, CuArray{Float64, 1, CUDA.DeviceMemory}}}, labelstensor2::NTuple{4, Int64}, labelsoutput_tensor::NTuple{5, Int64})
@ NDTensors ~/.julia/packages/NDTensors/WbqtM/src/tensoroperations/generic_tensor_operations.jl:113
[20] contract(::Type{NDTensors.CanContract{NDTensors.DenseTensor{Float64, 5, NTuple{5, Index{Int64}}, NDTensors.Dense{Float64, CuArray{Float64, 1, CUDA.DeviceMemory}}}, NDTensors.DenseTensor{Float64, 4, NTuple{4, Index{Int64}}, NDTensors.Dense{Float64, CuArray{Float64, 1, CUDA.DeviceMemory}}}}}, tensor1::NDTensors.DenseTensor{Float64, 5, NTuple{5, Index{Int64}}, NDTensors.Dense{Float64, CuArray{Float64, 1, CUDA.DeviceMemory}}}, labels_tensor1::NTuple{5, Int64}, tensor2::NDTensors.DenseTensor{Float64, 4, NTuple{4, Index{Int64}}, NDTensors.Dense{Float64, CuArray{Float64, 1, CUDA.DeviceMemory}}}, labels_tensor2::NTuple{4, Int64})
@ NDTensors ~/.julia/packages/NDTensors/WbqtM/src/tensoroperations/generic_tensor_operations.jl:91
[21] contract
@ ~/.julia/packages/SimpleTraits/7VJph/src/SimpleTraits.jl:332 [inlined]
[22] _contract(A::NDTensors.DenseTensor{Float64, 5, NTuple{5, Index{Int64}}, NDTensors.Dense{Float64, CuArray{Float64, 1, CUDA.DeviceMemory}}}, B::NDTensors.DenseTensor{Float64, 4, NTuple{4, Index{Int64}}, NDTensors.Dense{Float64, CuArray{Float64, 1, CUDA.DeviceMemory}}})
@ ITensors ~/.julia/packages/ITensors/VuQ3D/src/tensor_operations/tensor_algebra.jl:3
[23] _contract(A::ITensor, B::ITensor)
@ ITensors ~/.julia/packages/ITensors/VuQ3D/src/tensor_operations/tensor_algebra.jl:9
[24] contract(A::ITensor, B::ITensor)
@ ITensors ~/.julia/packages/ITensors/VuQ3D/src/tensor_operations/tensor_algebra.jl:76
[25] *
@ ~/.julia/packages/ITensors/VuQ3D/src/tensor_operations/tensor_algebra.jl:63 [inlined]
[26] contract(P::ProjMPO, v::ITensor)
@ ITensorMPS ~/.julia/packages/ITensorMPS/tnopq/src/abstractprojmpo/abstractprojmpo.jl:51
[27] product(P::ProjMPO, v::ITensor)
@ ITensorMPS ~/.julia/packages/ITensorMPS/tnopq/src/abstractprojmpo/abstractprojmpo.jl:71
[28] AbstractProjMPO
@ ~/.julia/packages/ITensorMPS/tnopq/src/abstractprojmpo/abstractprojmpo.jl:87 [inlined]
[29] apply
@ ~/.julia/packages/KrylovKit/ZcdRg/src/apply.jl:2 [inlined]
[30] apply_scalartype
@ ~/.julia/packages/KrylovKit/ZcdRg/src/apply.jl:35 [inlined]
[31] #eigsolve#51
@ ~/.julia/packages/KrylovKit/ZcdRg/src/eigsolve/eigsolve.jl:210 [inlined]
[32] eigsolve
@ ~/.julia/packages/KrylovKit/ZcdRg/src/eigsolve/eigsolve.jl:209 [inlined]
[33] macro expansion
@ ~/.julia/packages/ITensorMPS/tnopq/src/dmrg.jl:238 [inlined]
[34] macro expansion
@ ./timing.jl:461 [inlined]
[35] dmrg(PH::ProjMPO, psi0::MPS, sweeps::Sweeps; which_decomp::Nothing, svd_alg::Nothing, observer::DemoObserver, outputlevel::Int64, write_when_maxdim_exceeds::Nothing, write_path::String, eigsolve_tol::Float64, eigsolve_krylovdim::Int64, eigsolve_maxiter::Int64, eigsolve_verbosity::Int64, eigsolve_which_eigenvalue::Symbol, ishermitian::Bool)
@ ITensorMPS ~/.julia/packages/ITensorMPS/tnopq/src/dmrg.jl:205
[36] dmrg
@ ~/.julia/packages/ITensorMPS/tnopq/src/dmrg.jl:157 [inlined]
[37] #dmrg#606
@ ~/.julia/packages/ITensorMPS/tnopq/src/dmrg.jl:28 [inlined]
[38] dmrg
@ ~/.julia/packages/ITensorMPS/tnopq/src/dmrg.jl:21 [inlined]
[39] #dmrg#612
@ ~/.julia/packages/ITensorMPS/tnopq/src/dmrg.jl:396 [inlined]
[40] dmrg
@ ~/.julia/packages/ITensorMPS/tnopq/src/dmrg.jl:386 [inlined]
[41] dmrg_gpu(H::MPO, psi0::MPS; nsweeps::Int64, cutoff::Float64, observer::DemoObserver, outputlevel::Int64, maxbonddim::Int64)
@ Main /scratch/username/project//src/utils.jl:113
[42] dmrg_wrapper(Nx::Int64, Ny::Int64, sites::Vector{Index{Int64}}, H::MPO, maxbonddim::Int64; init_state::@NamedTuple{type::Symbol, bonddim::Int64}, nsweeps::Int64, cutoff::Float64, outputlevel::Int64, checkpoint_itr::Int64, load_checkpoint::Int64, GPU::Bool)
@ Main /scratch/username/project//src/utils.jl:171
[43] sim_wrapper(expt::Dict{Symbol, Any})
@ Main /scratch/username/project/scripts/savemps.jl:26
[44] (::var"#main##2#main##3")(k::Dict{Symbol, Any})
@ Main ~/.julia/packages/DrWatson/2QF5p/src/saving_files.jl:177
[45] produce_or_load(f::var"#main##2#main##3", config::Dict{Symbol, Any}, path::String; suffix::String, prefix::String, tag::Bool, gitpath::String, loadfile::Bool, storepatch::Bool, force::Bool, verbose::Bool, wsave_kwargs::@NamedTuple{}, wload_kwargs::@NamedTuple{}, filename::Nothing, kwargs::@Kwargs{})
@ DrWatson ~/.julia/packages/DrWatson/2QF5p/src/saving_files.jl:108
[46] macro expansion
@ ~/.julia/packages/DrWatson/2QF5p/src/saving_files.jl:176 [inlined]
[47] macro expansion
@ /scratch/username/project/scripts/savemps.jl:88 [inlined]
[48] macro expansion
@ ./timing.jl:689 [inlined]
[49] main()
@ Main /scratch/username/project/scripts/savemps.jl:86
[50] top-level scope
@ /scratch/username/project/scripts/savemps.jl:97
[51] include(mod::Module, _path::String)
@ Base ./Base.jl:306
[52] exec_options(opts::Base.JLOptions)
@ Base ./client.jl:317
in expression starting at /scratch/username/project/scripts/savemps.jl:97
srun: error: nid001236: task 0: Exited with exit code 1
srun: Terminating StepId=50150823.1
Hello,
I’m running a standard dmrg computation through ITensorMPS with the CUDA.jl backend, but the total memory allocation is more than 2x what I see when running identical code on the CPU only. It seems like there are duplicate or erroneous allocations, but I haven’t been able to identify where.
During the dmrg computation, I get an Out of Memory Error:
I’m running the computation with a 121-site MPS (physical dimension of 2), and I incrementally increase the maximum bond dimension at each sweep. At the point of failure, the MPS has an approximate memory footprint of 2 GB and the MPO around 11 MB. Looking through the ITensorMPS code, I found this note:
https://github.com/ITensor/ITensorMPS.jl/blob/509e11efc232885262f5ae199cb4b64c6e56d498/src/dmrg.jl#L252
Note that the same code runs on the CPU without issue, with a peak memory footprint of 21.6 GB.
I’m running this on a Perlmutter GPU node (see https://docs.nersc.gov/systems/perlmutter/architecture/) with a single NVIDIA A100-SXM4-40GB.
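For a rough sense of scale, here is a minimal back-of-the-envelope sketch in plain Julia (no GPU needed) of how large the dense Float64 tensors in this kind of contraction can get. The site count and physical dimension come from the numbers above; the MPS bond dimension `chi` and MPO bond dimension `w` are placeholder values assumed purely for illustration, not measurements from this run. The factor of 3 assumes the order-5 input, its `permutedims` copy, and the order-5 output are all live at once and roughly the same size, which is my reading of the `_contract!` frames in the stacktrace rather than something I have confirmed.

```julia
# Bytes needed for a dense Float64 tensor with the given dimensions.
dense_bytes(dims::Integer...) = prod(dims) * sizeof(Float64)

N   = 121    # number of sites (from the post)
d   = 2      # physical dimension (from the post)
chi = 1000   # hypothetical MPS bond dimension, for illustration only
w   = 50     # hypothetical MPO bond dimension, for illustration only

# Upper bound on the MPS size: every site tensor is chi x d x chi.
mps_bytes = N * dense_bytes(chi, d, chi)

# One environment tensor (chi x w x chi), as held by the projected MPO.
env_bytes = dense_bytes(chi, w, chi)

# Order-5 intermediate in the two-site contraction: chi x w x d x d x chi.
work_bytes = dense_bytes(chi, w, d, d, chi)

# Assumption: input, permuted copy, and output coexist during the contraction.
peak_contract_bytes = 3 * work_bytes

gib(x) = round(x / 2^30; digits = 2)
println("MPS (upper bound):        ", gib(mps_bytes), " GiB")
println("One environment tensor:   ", gib(env_bytes), " GiB")
println("Order-5 intermediate:     ", gib(work_bytes), " GiB")
println("Contraction working set:  ", gib(peak_contract_bytes), " GiB")
```

With these placeholder values a single order-5 intermediate is already comparable to the whole MPS, so a few temporaries of that size plus the stored environments could plausibly account for a large share of the 40 GB, though the real bond dimensions vary along the chain.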
Here is the stacktrace for reference: