Better interface for runcircuit on GPUs #274

Open · ghost opened this issue Apr 2, 2022 · 19 comments

@ghost commented Apr 2, 2022

I would like to experiment with the speedup gained from running PastaQ on a GPU. However, the GPU example provided seems to be out of date and does not run. Could it be updated at some point to demonstrate the basic usage?

@mtfishman (Collaborator)

It would be great if you could try running it on GPU; I'm curious whether circuit evolution gets any speedups.

I just updated the example (https://github.com/GTorlai/PastaQ.jl/blob/master/examples/gpu/2_randomcircuit.jl). Please try running it, let me know if there are any issues, and post any benchmark results that you get (it would be great to know whether or not it is faster on GPU).

@ghost (Author) commented Apr 3, 2022

Thanks for doing this. I will leave the issue open and update you later with what I find.

@ghost (Author) commented Apr 6, 2022

Starting with the GPU example provided, I fixed the depth of the random circuit to equal the number of qubits (n qubits and n layers), and then measured the execution time of the standard code on a CPU (Intel Xeon Silver 4114) versus the ITensorGPU version on a GPU (Nvidia Tesla V100). It seems that for this circuit structure the GPU gains an advantage at around 15 qubits, which is roughly what I would have expected.

[Plot "pastaq_gpu": execution time vs. number of qubits on CPU (Xeon Silver 4114) and GPU (Tesla V100); the GPU curve crosses below the CPU curve at around 15 qubits]
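
Roughly, the timing comparison had this form (a simplified sketch, not the exact script I used; it assumes PastaQ's randomcircuit and the state/gate conversion pattern from the updated GPU example):

using ITensors
using ITensorGPU  # provides `cu` for moving tensor data to GPU
using PastaQ

for n in 4:2:24
    circuit = randomcircuit(n; depth=n)   # n qubits, n layers
    t_cpu = @elapsed runcircuit(circuit)  # standard CPU run
    # GPU run: build the initial state, then move state and gate tensors to the device
    ψ0 = cu(productstate(qubits(n)))
    gates = cu.(buildcircuit(ψ0, circuit))
    t_gpu = @elapsed runcircuit(ψ0, gates)
    # Rough timings only: a careful benchmark would warm up first (JIT compilation)
    # and synchronize the GPU before reading the clock.
    println("n = $n: CPU $(t_cpu) s, GPU $(t_gpu) s")
end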

@mtfishman (Collaborator)

Cool, thanks! Great to see that there is a speedup.

@ghost (Author) commented Apr 6, 2022

I also have some Hilbert-Schmidt inner products between unitaries on reasonably large systems that I would like to try speeding up with the GPU; however, I haven't managed to convert that code in the same manner.

A small toy example (hopefully correct) would be:

using ITensors
using PastaQ

function U_gates(p)
    gates = Tuple[]
    gates = vcat(gates, ("Rxx", (1, 2), (φ = p,)))
    return gates
end

function V_gates()
    gates = Tuple[]
    gates = vcat(gates, ("H", 1))
    gates = vcat(gates, ("CX", (1, 2)))
    return gates
end

N = 2
hilbert = qubits(N)
U = runcircuit(hilbert, U_gates(0.1); process=true)
V = runcircuit(hilbert, V_gates(); process=true)
H = 1 - (2^(-2.0^N))*abs2(inner(U, V))

println(H)
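
(For context: assuming inner(U, V) returns the Hilbert-Schmidt overlap Tr(U†V) of the two MPOs, H is the normalized test 1 - |Tr(U†V)|^2 / 4^N; for N = 2 the prefactor 2^(-2.0^N) equals 1/16 = 4^(-N).)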

In the GPU example provided, productstate(N) is moved onto the GPU. I tried doing the same with qubits(N), but it didn't seem to match the convert_eltype function.

@mtfishman (Collaborator)

Yeah, unfortunately for now you will have to define the product state and move it to GPU yourself, since PastaQ isn't really "GPU aware". We will have to think of an interface in PastaQ for specifying that the circuit evolution should be performed on GPU.

@mtfishman (Collaborator)

Another issue in the example you show is that you would need to move the gates to GPU yourself as well... There are definitely some improvements we could make on the PastaQ side to make running circuits on GPU more convenient. Does it work for that case if you build the product state and gates and move them to GPU yourself? Giacomo and I will have to discuss a good interface for running circuits on GPU.

@ghost (Author) commented Apr 7, 2022

I tried to do that like this, but I didn't get it working. Do you have an alternative suggestion?

using ITensors
using PastaQ

using CUDA: allowscalar
allowscalar(false)  # error out instead of silently falling back to slow element-wise GPU access

using ITensors.NDTensors: tensor

# Convert an ITensor's element type (e.g. to the precision used on the GPU device).
function convert_eltype(ElType::Type, T::ITensor)
    if eltype(T) == ElType
        return T
    end
    return itensor(ElType.(tensor(T)))
end

device = cu
device_eltype = ComplexF64

function U_gates(p)
    gates = Tuple[]
    gates = vcat(gates, ("Rxx", (1, 2), (φ = p,)))
    return gates
end

function V_gates()
    gates = Tuple[]
    gates = vcat(gates, ("H", 1))
    gates = vcat(gates, ("CX", (1, 2)))
    return gates
end

N = 2
hilbert_cpu = qubits(N)
hilbert = device(convert_eltype.(device_eltype, hilbert_cpu))

U_tensors = map(gate_layer -> device.(convert_eltype.(device_eltype, gate_layer)), buildcircuit(hilbert, U_gates(0.1)))
V_tensors = map(gate_layer -> device.(convert_eltype.(device_eltype, gate_layer)), buildcircuit(hilbert, V_gates()))

U = runcircuit(hilbert, U_tensors; process=true)
V = runcircuit(hilbert, V_tensors; process=true)
H = 1 - (2^(-2.0^N))*abs2(inner(U, V))

println(H)

This gave the error:

[Screenshot: error message]

@mtfishman (Collaborator)

Try:

using ITensors
using ITensorGPU
using PastaQ

using CUDA: allowscalar
allowscalar(false)

using ITensors.NDTensors: tensor

function convert_eltype(ElType::Type, T::ITensor)
    if eltype(T) == ElType
        return T
    end
    return itensor(ElType.(tensor(T)))
end

device = cu
device_eltype = ComplexF64

function U_gates(p)
    gates = Tuple[]
    gates = vcat(gates, ("Rxx", (1, 2), (ϕ = p,)))
    return gates
end

function V_gates()
    gates = Tuple[]
    gates = vcat(gates, ("H", 1))
    gates = vcat(gates, ("CX", (1, 2)))
    return gates
end

N = 2
state_cpu = productstate(qubits(N))
state = device(convert_eltype.(device_eltype, state_cpu))

U_tensors = map(gate -> device(convert_eltype(device_eltype, gate)), buildcircuit(state, U_gates(0.1)))
V_tensors = map(gate -> device(convert_eltype(device_eltype, gate)), buildcircuit(state, V_gates()))

U = runcircuit(state, U_tensors; process=true)
V = runcircuit(state, V_tensors; process=true)
H = 1 - (2^(-2.0^N))*abs2(inner(U, V))

println(H)

The reason for your error is that qubits(N) just defines the Hilbert space (the site indices), not the actual tensors, which you were then trying to transfer to GPU. Only the tensor data itself gets sent to the GPU, so you first need to make the state with productstate and then transfer the state to GPU. There were also a few other issues I fixed in that code.
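
Schematically, the distinction is:

using ITensors
using PastaQ

s = qubits(2)        # just a vector of site Index objects: no tensor data to transfer
ψ = productstate(s)  # an MPS with actual tensor data, which can be converted and moved to GPU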

Ultimately I think we should have a function like cu_runcircuit so that you could just run your original script and it would initialize the data on GPU for you.

@mtfishman changed the title from "GPU Example" to "Better interface for runcircuit on GPUs" on Apr 7, 2022
@ghost (Author) commented Apr 9, 2022

With this code it does run successfully on the GPU and completes without error. However, I think that passing qubits(N) into productstate() might compute the wrong quantity: with the standard CPU code, the printed output changes value when switching from qubits(N) to productstate(qubits(N)).
Doesn't using productstate() mean we are computing an inner product with the state/MPS U|0> rather than with the unitary/MPO U?

@mtfishman (Collaborator)

Ah, I didn't appreciate that you were trying to evolve an operator. Probably you want to use productoperator(qubits(N)), which makes an identity MPO.
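
For example (an untested sketch, mirroring the productstate conversion above):

U0 = productoperator(qubits(N))               # identity MPO on N qubits
U0_gpu = cu(convert_eltype.(ComplexF64, U0))  # move the MPO tensors to GPU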

@ghost (Author) commented Apr 11, 2022

Using productoperator(qubits(N)) gives an incorrect answer: it outputs roughly 1e-15 rather than the correct value of around 0.8.
I will keep trying other methods, but thank you for your help with this problem.

@mtfishman (Collaborator)

Yeah, I think I would need more details of the actual computation you are trying to do in order to help more. You may need to dig into the runcircuit function to try to reproduce whatever starting state/process it is creating for you and make it yourself so you can transfer it to GPU.

@mtfishman (Collaborator)

@JoeGibbs88 please take a look at #282. I added an interface where you can specify device=cu in runcircuit to run on GPU.

Additionally, you can specify eltype=Float32 or eltype=ComplexF32, since single-precision calculations are often faster on GPU.
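
Usage would look something like this (a small sketch):

using PastaQ
using ITensorGPU  # provides `cu`

N = 2
gates = [("H", 1), ("CX", (1, 2))]
# Single-precision GPU run; drop `eltype` to keep the default precision.
V = runcircuit(qubits(N), gates; process=true, device=cu, eltype=ComplexF32)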

Could you test out the branch runcircuit_gpu for your example and see if it works?

@mtfishman (Collaborator)

@JoeGibbs88 #282 is merged and will be available soon in PastaQ 0.0.22. Please update to the latest version of PastaQ and try out the new device and eltype keyword arguments in runcircuit.

@ghost (Author) commented Apr 27, 2022

@mtfishman thanks for updating the interface, and apologies for the late reply. I tried using the new syntax to run the code I posted previously; however, it errored out when running on the GPU.

Code:

using ITensorGPU
using PastaQ

#using CUDA: allowscalar
#allowscalar(false)

function U_gates(p)
    gates = Tuple[]
    gates = vcat(gates, ("Rxx", (1, 2), (φ = p,)))
    return gates
end

function V_gates()
    gates = Tuple[]
    gates = vcat(gates, ("H", 1))
    gates = vcat(gates, ("CX", (1, 2)))
    return gates
end

N = 2
hilbert = qubits(N)
U = runcircuit(hilbert, U_gates(0.1); device=cu, process=true)
V = runcircuit(hilbert, V_gates(); device=cu, process=true)
H = 1 - (2^(-2.0^N))*abs2(inner(U, V))

println(H)

(I wasn't sure whether the allowscalar lines were needed, but including them didn't change the error.)

Error:

ERROR: LoadError: Scalar indexing is disallowed.
Invocation of getindex resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations *do not* execute on the GPU, but very slowly on the CPU,
and therefore are only permitted from the REPL for prototyping purposes.
If you did intend to index this array, annotate the caller with @allowscalar.
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:33
 [2] assertscalar(op::String)
   @ GPUArrays ~/.julia/packages/GPUArrays/Zecv7/src/host/indexing.jl:53
 [3] getindex(xs::CuArray{ComplexF64, 4, CUDA.Mem.DeviceBuffer}, I::Int64)
   @ GPUArrays ~/.julia/packages/GPUArrays/Zecv7/src/host/indexing.jl:86
 [4] getindex
   @ ~/.julia/packages/Strided/7AzbR/src/stridedview.jl:86 [inlined]
 [5] macro expansion
   @ ~/.julia/packages/Strided/7AzbR/src/mapreduce.jl:304 [inlined]
 [6] macro expansion
   @ ./simdloop.jl:77 [inlined]
 [7] macro expansion
   @ ~/.julia/packages/Strided/7AzbR/src/mapreduce.jl:303 [inlined]
 [8] _mapreduce_kernel!(f::Strided.CaptureArgs{typeof(identity), Tuple{Strided.Arg}}, op::Nothing, initop::Nothing, dims::NTuple{4, Int64}, blocks::NTuple{4, Int64}, arrays::Tuple{Strided.StridedView{ComplexF64, 4, CuArray{ComplexF64, 4, CUDA.Mem.DeviceBuffer}, typeof(identity)}, Strided.StridedView{ComplexF64, 4, CuArray{ComplexF64, 4, CUDA.Mem.DeviceBuffer}, typeof(identity)}}, strides::Tuple{NTuple{4, Int64}, NTuple{4, Int64}}, offsets::Tuple{Int64, Int64})
   @ Strided ~/.julia/packages/Strided/7AzbR/src/mapreduce.jl:205
 [9] _mapreduce_block!(f::Any, op::Any, initop::Any, dims::NTuple{4, Int64}, strides::Tuple{NTuple{4, Int64}, NTuple{4, Int64}}, offsets::Tuple{Int64, Int64}, costs::NTuple{4, Int64}, arrays::Tuple{Strided.StridedView{ComplexF64, 4, CuArray{ComplexF64, 4, CUDA.Mem.DeviceBuffer}, typeof(identity)}, Strided.StridedView{ComplexF64, 4, CuArray{ComplexF64, 4, CUDA.Mem.DeviceBuffer}, typeof(identity)}})
   @ Strided ~/.julia/packages/Strided/7AzbR/src/mapreduce.jl:135
[10] _mapreduce_order!(f::Any, op::Any, initop::Any, dims::NTuple{4, Int64}, strides::Tuple{NTuple{4, Int64}, NTuple{4, Int64}}, arrays::Tuple{Strided.StridedView{ComplexF64, 4, CuArray{ComplexF64, 4, CUDA.Mem.DeviceBuffer}, typeof(identity)}, Strided.StridedView{ComplexF64, 4, CuArray{ComplexF64, 4, CUDA.Mem.DeviceBuffer}, typeof(identity)}})
   @ Strided ~/.julia/packages/Strided/7AzbR/src/mapreduce.jl:120
[11] _mapreduce_fuse!(f::Any, op::Any, initop::Any, dims::NTuple{4, Int64}, arrays::Tuple{Strided.StridedView{ComplexF64, 4, CuArray{ComplexF64, 4, CUDA.Mem.DeviceBuffer}, typeof(identity)}, Strided.StridedView{ComplexF64, 4, CuArray{ComplexF64, 4, CUDA.Mem.DeviceBuffer}, typeof(identity)}})
   @ Strided ~/.julia/packages/Strided/7AzbR/src/mapreduce.jl:98
[12] copyto!
   @ ~/.julia/packages/Strided/7AzbR/src/broadcast.jl:34 [inlined]
[13] materialize!
   @ ./broadcast.jl:894 [inlined]
[14] materialize!
   @ ./broadcast.jl:891 [inlined]
[15] permutedims!(R::NDTensors.DenseTensor{ComplexF64, 4, NTuple{4, ITensors.Index{Int64}}, NDTensors.Dense{ComplexF64, CuArray{ComplexF64, 1, CUDA.Mem.DeviceBuffer}}}, T::NDTensors.DenseTensor{ComplexF64, 4, NTuple{4, ITensors.Index{Int64}}, NDTensors.Dense{ComplexF64, CuArray{ComplexF64, 1, CUDA.Mem.DeviceBuffer}}}, perm::NTuple{4, Int64})
   @ NDTensors ~/.julia/packages/NDTensors/hSwie/src/dense.jl:360
[16] contract!!(R::NDTensors.DenseTensor{ComplexF64, 3, Tuple{ITensors.Index{Int64}, ITensors.Index{Int64}, ITensors.Index{Int64}}, NDTensors.Dense{ComplexF64, CuArray{ComplexF64, 1, CUDA.Mem.DeviceBuffer}}}, labelsR::Tuple{Int64, Int64, Int64}, T1::NDTensors.Tensor{Number, 3, NDTensors.Combiner, Tuple{ITensors.Index{Int64}, ITensors.Index{Int64}, ITensors.Index{Int64}}}, labelsT1::Tuple{Int64, Int64, Int64}, T2::NDTensors.DenseTensor{ComplexF64, 4, NTuple{4, ITensors.Index{Int64}}, NDTensors.Dense{ComplexF64, CuArray{ComplexF64, 1, CUDA.Mem.DeviceBuffer}}}, labelsT2::NTuple{4, Int64})
   @ NDTensors ~/.julia/packages/NDTensors/hSwie/src/combiner.jl:114
[17] contract!!(R::NDTensors.DenseTensor{ComplexF64, 3, Tuple{ITensors.Index{Int64}, ITensors.Index{Int64}, ITensors.Index{Int64}}, NDTensors.Dense{ComplexF64, CuArray{ComplexF64, 1, CUDA.Mem.DeviceBuffer}}}, labelsR::Tuple{Int64, Int64, Int64}, T1::NDTensors.DenseTensor{ComplexF64, 4, NTuple{4, ITensors.Index{Int64}}, NDTensors.Dense{ComplexF64, CuArray{ComplexF64, 1, CUDA.Mem.DeviceBuffer}}}, labelsT1::NTuple{4, Int64}, T2::NDTensors.Tensor{Number, 3, NDTensors.Combiner, Tuple{ITensors.Index{Int64}, ITensors.Index{Int64}, ITensors.Index{Int64}}}, labelsT2::Tuple{Int64, Int64, Int64})
   @ NDTensors ~/.julia/packages/NDTensors/hSwie/src/combiner.jl:123
[18] contract(T1::NDTensors.DenseTensor{ComplexF64, 4, NTuple{4, ITensors.Index{Int64}}, NDTensors.Dense{ComplexF64, CuArray{ComplexF64, 1, CUDA.Mem.DeviceBuffer}}}, labelsT1::NTuple{4, Int64}, T2::NDTensors.Tensor{Number, 3, NDTensors.Combiner, Tuple{ITensors.Index{Int64}, ITensors.Index{Int64}, ITensors.Index{Int64}}}, labelsT2::Tuple{Int64, Int64, Int64}, labelsR::Tuple{Int64, Int64, Int64})
   @ NDTensors ~/.julia/packages/NDTensors/hSwie/src/dense.jl:576
[19] contract(T1::NDTensors.DenseTensor{ComplexF64, 4, NTuple{4, ITensors.Index{Int64}}, NDTensors.Dense{ComplexF64, CuArray{ComplexF64, 1, CUDA.Mem.DeviceBuffer}}}, labelsT1::NTuple{4, Int64}, T2::NDTensors.Tensor{Number, 3, NDTensors.Combiner, Tuple{ITensors.Index{Int64}, ITensors.Index{Int64}, ITensors.Index{Int64}}}, labelsT2::Tuple{Int64, Int64, Int64})
   @ NDTensors ~/.julia/packages/NDTensors/hSwie/src/dense.jl:564
[20] _contract(A::NDTensors.DenseTensor{ComplexF64, 4, NTuple{4, ITensors.Index{Int64}}, NDTensors.Dense{ComplexF64, CuArray{ComplexF64, 1, CUDA.Mem.DeviceBuffer}}}, B::NDTensors.Tensor{Number, 3, NDTensors.Combiner, Tuple{ITensors.Index{Int64}, ITensors.Index{Int64}, ITensors.Index{Int64}}})
   @ ITensors ~/.julia/packages/ITensors/uvZHM/src/itensor.jl:1764
[21] _contract(A::ITensors.ITensor, B::ITensors.ITensor)
   @ ITensors ~/.julia/packages/ITensors/uvZHM/src/itensor.jl:1770
[22] contract(A::ITensors.ITensor, B::ITensors.ITensor)
   @ ITensors ~/.julia/packages/ITensors/uvZHM/src/itensor.jl:1872
[23] #225
   @ ~/.julia/packages/ITensors/uvZHM/src/itensor.jl:1958 [inlined]
[24] BottomRF
   @ ./reduce.jl:81 [inlined]
[25] afoldl
   @ ./operators.jl:534 [inlined]
[26] _foldl_impl
   @ ./tuple.jl:263 [inlined]
[27] foldl_impl
   @ ./reduce.jl:48 [inlined]
[28] mapfoldl_impl
   @ ./reduce.jl:44 [inlined]
[29] #mapfoldl#214
   @ ./reduce.jl:160 [inlined]
[30] mapfoldl
   @ ./reduce.jl:160 [inlined]
[31] #foldl#215
   @ ./reduce.jl:178 [inlined]
[32] foldl
   @ ./reduce.jl:178 [inlined]
[33] contract(As::Tuple{ITensors.ITensor, ITensors.ITensor, ITensors.ITensor}; sequence::String, kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
   @ ITensors ~/.julia/packages/ITensors/uvZHM/src/itensor.jl:1958
[34] contract
   @ ~/.julia/packages/ITensors/uvZHM/src/itensor.jl:1957 [inlined]
[35] #contract#229
   @ ~/.julia/packages/ITensors/uvZHM/src/itensor.jl:1968 [inlined]
[36] contract
   @ ~/.julia/packages/ITensors/uvZHM/src/itensor.jl:1968 [inlined]
[37] #*#231
   @ ~/.julia/packages/ITensors/uvZHM/src/itensor.jl:1978 [inlined]
[38] *
   @ ~/.julia/packages/ITensors/uvZHM/src/itensor.jl:1978 [inlined]
[39] svd(A::ITensors.ITensor, Linds::Vector{ITensors.Index{Int64}}; kwargs::Base.Iterators.Pairs{Symbol, Any, NTuple{6, Symbol}, NamedTuple{(:cutoff, :maxdim, :svd_alg, :tags, :ortho, :alg), Tuple{Float64, Int64, String, String, String, String}}})
   @ ITensors ~/.julia/packages/ITensors/uvZHM/src/decomp.jl:101
[40] factorize_svd(A::ITensors.ITensor, Linds::Vector{ITensors.Index{Int64}}; kwargs::Base.Iterators.Pairs{Symbol, Any, NTuple{5, Symbol}, NamedTuple{(:cutoff, :maxdim, :svd_alg, :tags, :ortho), Tuple{Float64, Int64, String, String, String}}})
   @ ITensors ~/.julia/packages/ITensors/uvZHM/src/decomp.jl:378
[41] factorize(A::ITensors.ITensor, Linds::Vector{ITensors.Index{Int64}}; kwargs::Base.Iterators.Pairs{Symbol, Any, NTuple{5, Symbol}, NamedTuple{(:cutoff, :maxdim, :svd_alg, :tags, :ortho), Tuple{Float64, Int64, String, String, String}}})
   @ ITensors ~/.julia/packages/ITensors/uvZHM/src/decomp.jl:498
[42] ITensors.MPO(A::ITensors.ITensor, sites::Vector{Vector{ITensors.Index{Int64}}}; leftinds::Nothing, orthocenter::Int64, kwargs::Base.Iterators.Pairs{Symbol, Any, Tuple{Symbol, Symbol, Symbol}, NamedTuple{(:cutoff, :maxdim, :svd_alg), Tuple{Float64, Int64, String}}})
   @ ITensors ~/.julia/packages/ITensors/uvZHM/src/mps/abstractmps.jl:1826
[43] setindex!(ψ::ITensors.MPO, A::ITensors.ITensor, r::UnitRange{Int64}; orthocenter::Int64, perm::Nothing, kwargs::Base.Iterators.Pairs{Symbol, Any, Tuple{Symbol, Symbol, Symbol}, NamedTuple{(:cutoff, :maxdim, :svd_alg), Tuple{Float64, Int64, String}}})
   @ ITensors ~/.julia/packages/ITensors/uvZHM/src/mps/abstractmps.jl:1768
[44] setindex!(::ITensors.MPO, ::ITensors.ITensor, ::UnitRange{Int64}, ::Pair{Symbol, Any}, ::Vararg{Pair{Symbol, Any}, N} where N; kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
   @ ITensors ~/.julia/packages/ITensors/uvZHM/src/mps/abstractmps.jl:1779
[45] setindex!(::ITensors.MPO, ::ITensors.ITensor, ::UnitRange{Int64}, ::Pair{Symbol, Any}, ::Pair{Symbol, Any}, ::Vararg{Pair{Symbol, Any}, N} where N)
   @ ITensors ~/.julia/packages/ITensors/uvZHM/src/mps/abstractmps.jl:1779
[46] product(o::ITensors.ITensor, ψ::ITensors.MPO, ns::Vector{Int64}; move_sites_back::Bool, apply_dag::Bool, kwargs::Base.Iterators.Pairs{Symbol, Any, Tuple{Symbol, Symbol, Symbol}, NamedTuple{(:cutoff, :maxdim, :svd_alg), Tuple{Float64, Int64, String}}})
   @ ITensors ~/.julia/packages/ITensors/uvZHM/src/mps/abstractmps.jl:2003
[47] product(As::Vector{ITensors.ITensor}, ψ::ITensors.MPO; move_sites_back_between_gates::Bool, move_sites_back::Bool, kwargs::Base.Iterators.Pairs{Symbol, Any, NTuple{4, Symbol}, NamedTuple{(:apply_dag, :cutoff, :maxdim, :svd_alg), Tuple{Bool, Float64, Int64, String}}})
   @ ITensors ~/.julia/packages/ITensors/uvZHM/src/mps/abstractmps.jl:2110
[48] runcircuit(M::ITensors.MPO, circuit_tensors::Vector{ITensors.ITensor}; apply_dag::Bool, cutoff::Float64, maxdim::Int64, svd_alg::String, move_sites_back::Bool, eltype::Nothing, device::typeof(identity), kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
   @ PastaQ ~/.julia/packages/PastaQ/9DfPk/src/circuits/runcircuit.jl:360
[49] runcircuit(hilbert::Vector{ITensors.Index{Int64}}, circuit::Vector{Tuple}; full_representation::Bool, process::Bool, noise::Nothing, eltype::Nothing, device::Function, kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
   @ PastaQ ~/.julia/packages/PastaQ/9DfPk/src/circuits/runcircuit.jl:139
[50] top-level scope
   @ ~/julia/REFF/HS_simple.jl:28

@mtfishman (Collaborator)

Hi Joe,

When I run the code:

using ITensorGPU
using PastaQ

function U_gates(p)
  gates = Tuple[]
  gates = vcat(gates, ("Rxx", (1, 2), (ϕ = p,)))
  return gates
end

function V_gates()
  gates = Tuple[]
  gates = vcat(gates, ("H", 1))
  gates = vcat(gates, ("CX", (1, 2)))
  return gates
end

N = 2
hilbert = qubits(N)
U = runcircuit(hilbert, U_gates(0.1); device=cu, process=true)
V = runcircuit(hilbert, V_gates(); device=cu, process=true)
H = 1 - (2^(-2.0^N))*abs2(inner(U, V))

println(H)

It outputs:

┌ Warning: Performing scalar indexing on task Task (runnable) @0x00007fe7b2624010.
│ Invocation of getindex resulted in scalar indexing of a GPU array.
│ This is typically caused by calling an iterating implementation of a method.
│ Such implementations *do not* execute on the GPU, but very slowly on the CPU,
│ and therefore are only permitted from the REPL for prototyping purposes.
│ If you did intend to index this array, annotate the caller with @allowscalar.
└ @ GPUArrays ~/.julia/packages/GPUArrays/Zecv7/src/host/indexing.jl:56
0.875

so it seems to be working fine. I'm not sure where the scalar indexing warning is coming from, so we may have to investigate that, but it is only a problem if the code is running slower than expected.

I am using the following versions:

julia> versioninfo()
Julia Version 1.7.2
Commit bf53498635 (2022-02-06 15:21 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) E-2176M  CPU @ 2.70GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, skylake)
Environment:
  JULIA_EDITOR = vim

julia> using Pkg

julia> Pkg.status(["ITensors", "ITensorGPU", "PastaQ"])
      Status `~/.julia/environments/v1.7/Project.toml`
  [d89171c1] ITensorGPU v0.0.5
  [9136182c] ITensors v0.3.10
  [30b07047] PastaQ v0.0.22

What versions are you using?

@ghost (Author) commented Apr 29, 2022

Strange; we seemingly have the same setup, other than that I am on Julia 1.6:

julia> versioninfo()
Julia Version 1.6.5
Commit 9058264a69 (2021-12-19 12:30 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) Silver 4114  CPU @ 2.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, skylake-avx512)

julia> using Pkg

julia> Pkg.status(["ITensors", "ITensorGPU", "PastaQ"])
      Status `~/.julia/environments/v1.6/Project.toml`
  [d89171c1] ITensorGPU v0.0.5
  [9136182c] ITensors v0.3.10
  [30b07047] PastaQ v0.0.22

@mtfishman (Collaborator)

Very strange. Could you try using Julia 1.7.2 as a sanity check?

Based on the error message, it looks like a certain case of tensor contraction is going through the generic CPU contraction code instead of the specialized GPU contraction code. I don't have a good guess for why that behavior might change based on the Julia version (or other differences in our setups, like maybe the CUDA version).
