Better interface for runcircuit on GPUs #274

Open · ghost opened this issue Apr 2, 2022 · 19 comments

@ghost commented Apr 2, 2022

I would like to experiment with the speedup gained from running PastaQ on a GPU. However, the GPU example provided seems to be out of date and does not run. Could it be updated at some point to demonstrate the basic usage?

@mtfishman (Collaborator)

It would be great if you could try running it on GPU; I'm curious whether circuit evolution gets any speedups.

I just updated the example (https://github.com/GTorlai/PastaQ.jl/blob/master/examples/gpu/2_randomcircuit.jl). Please try running it, let me know if there are any issues, and post any benchmark results that you get (it would be great to know whether or not it is faster on GPU).

@ghost (Author) commented Apr 3, 2022

Thanks for doing this. I will leave the issue open and update you later with what I find.

@ghost (Author) commented Apr 6, 2022

Starting with the GPU example provided, I fixed the depth of the random circuit to equal the number of qubits (n qubits and n layers), and then measured the execution time of the standard code on a CPU (Intel Xeon Silver 4114) versus the ITensorGPU version on a GPU (Nvidia Tesla V100). It seems that for this circuit structure the GPU gains an advantage at around 15 qubits, which is roughly what I would have expected.

[Plot "pastaq_gpu": execution time vs. number of qubits on CPU (Xeon Silver 4114) and GPU (Tesla V100); the GPU curve crosses below the CPU curve at around 15 qubits]
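
Roughly, the timing comparison had this form (a simplified sketch, not the exact script I used; it assumes PastaQ's randomcircuit and the state/gate conversion pattern from the updated GPU example):

using ITensors
using ITensorGPU  # provides `cu` for moving tensor data to GPU
using PastaQ

for n in 4:2:24
    circuit = randomcircuit(n; depth=n)   # n qubits, n layers
    t_cpu = @elapsed runcircuit(circuit)  # standard CPU run
    # GPU run: build the initial state, then move state and gate tensors to the device
    ψ0 = cu(productstate(qubits(n)))
    gates = cu.(buildcircuit(ψ0, circuit))
    t_gpu = @elapsed runcircuit(ψ0, gates)
    # Rough timings only: a careful benchmark would warm up first (JIT compilation)
    # and synchronize the GPU before reading the clock.
    println("n = $n: CPU $(t_cpu) s, GPU $(t_gpu) s")
end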

@mtfishman (Collaborator)

Cool, thanks! Great to see that there is a speedup.

@ghost (Author) commented Apr 6, 2022

I also have some Hilbert-Schmidt inner products between unitaries on reasonably large systems that I would like to try speeding up with the GPU; however, I haven't managed to convert that code in the same manner.

A small toy example (hopefully correct) would be:

using ITensors
using PastaQ

function U_gates(p)
    gates = Tuple[]
    gates = vcat(gates, ("Rxx", (1, 2), (φ = p,)))
    return gates
end

function V_gates()
    gates = Tuple[]
    gates = vcat(gates, ("H", 1))
    gates = vcat(gates, ("CX", (1, 2)))
    return gates
end

N = 2
hilbert = qubits(N)
U = runcircuit(hilbert, U_gates(0.1); process=true)
V = runcircuit(hilbert, V_gates(); process=true)
H = 1 - (2^(-2.0^N))*abs2(inner(U, V))

println(H)
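
(For context: assuming inner(U, V) returns the Hilbert-Schmidt overlap Tr(U†V) of the two MPOs, H is the normalized test 1 - |Tr(U†V)|^2 / 4^N; for N = 2 the prefactor 2^(-2.0^N) equals 1/16 = 4^(-N).)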

In the GPU example provided, productstate(N) is moved onto the GPU. I tried doing the same with qubits(N), but it didn't seem to match the convert_eltype function.

@mtfishman (Collaborator)

Yeah, unfortunately for now you will have to define the product state and move it to GPU yourself, since PastaQ isn't really "GPU aware". We will have to think of an interface in PastaQ for specifying that the circuit evolution should be performed on GPU.

@mtfishman (Collaborator)

Another issue in the example you show is that you would need to move the gates to GPU yourself as well... There are definitely some improvements we could make on the PastaQ side to make running circuits on GPU more convenient. Does it work for that case if you build the product state and gates and move them to GPU yourself? Giacomo and I will have to discuss a good interface for running circuits on GPU.

@ghost (Author) commented Apr 7, 2022

I tried to do that like this, but I didn't get it working. Do you have an alternative suggestion?

using ITensors
using PastaQ

using CUDA: allowscalar
allowscalar(false)  # error out instead of silently falling back to slow element-wise GPU access

using ITensors.NDTensors: tensor

# Convert an ITensor's element type (e.g. to the precision used on the GPU device).
function convert_eltype(ElType::Type, T::ITensor)
    if eltype(T) == ElType
        return T
    end
    return itensor(ElType.(tensor(T)))
end

device = cu
device_eltype = ComplexF64

function U_gates(p)
    gates = Tuple[]
    gates = vcat(gates, ("Rxx", (1, 2), (φ = p,)))
    return gates
end

function V_gates()
    gates = Tuple[]
    gates = vcat(gates, ("H", 1))
    gates = vcat(gates, ("CX", (1, 2)))
    return gates
end

N = 2
hilbert_cpu = qubits(N)
hilbert = device(convert_eltype.(device_eltype, hilbert_cpu))

U_tensors = map(gate_layer -> device.(convert_eltype.(device_eltype, gate_layer)), buildcircuit(hilbert, U_gates(0.1)))
V_tensors = map(gate_layer -> device.(convert_eltype.(device_eltype, gate_layer)), buildcircuit(hilbert, V_gates()))

U = runcircuit(hilbert, U_tensors; process=true)
V = runcircuit(hilbert, V_tensors; process=true)
H = 1 - (2^(-2.0^N))*abs2(inner(U, V))

println(H)

This gave the error:

[Screenshot: error message]

@mtfishman (Collaborator)

Try:

using ITensors
using ITensorGPU
using PastaQ

using CUDA: allowscalar
allowscalar(false)

using ITensors.NDTensors: tensor

function convert_eltype(ElType::Type, T::ITensor)
    if eltype(T) == ElType
        return T
    end
    return itensor(ElType.(tensor(T)))
end

device = cu
device_eltype = ComplexF64

function U_gates(p)
    gates = Tuple[]
    gates = vcat(gates, ("Rxx", (1, 2), (ϕ = p,)))
    return gates
end

function V_gates()
    gates = Tuple[]
    gates = vcat(gates, ("H", 1))
    gates = vcat(gates, ("CX", (1, 2)))
    return gates
end

N = 2
state_cpu = productstate(qubits(N))
state = device(convert_eltype.(device_eltype, state_cpu))

U_tensors = map(gate -> device(convert_eltype(device_eltype, gate)), buildcircuit(state, U_gates(0.1)))
V_tensors = map(gate -> device(convert_eltype(device_eltype, gate)), buildcircuit(state, V_gates()))

U = runcircuit(state, U_tensors; process=true)
V = runcircuit(state, V_tensors; process=true)
H = 1 - (2^(-2.0^N))*abs2(inner(U, V))

println(H)

The reason for your error is that qubits(N) just defines the Hilbert space (the site indices), not the actual tensors, which you were then trying to transfer to GPU. Only the tensor data itself gets sent to the GPU, so you first need to make the state with productstate and then transfer the state to GPU. There were also a few other issues I fixed in that code.
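
Schematically, the distinction is:

using ITensors
using PastaQ

s = qubits(2)        # just a vector of site Index objects: no tensor data to transfer
ψ = productstate(s)  # an MPS with actual tensor data, which can be converted and moved to GPU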

Ultimately I think we should have a function like cu_runcircuit so that you could just run your original script and it would initialize the data on GPU for you.

@mtfishman changed the title from "GPU Example" to "Better interface for runcircuit on GPUs" on Apr 7, 2022
@ghost (Author) commented Apr 9, 2022

With this code it does run successfully on the GPU and completes without error. However, I think that passing qubits(N) into productstate() might compute the wrong quantity: with the standard CPU code, the printed output changes value when switching from qubits(N) to productstate(qubits(N)).
Doesn't using productstate() mean we are computing an inner product with the state/MPS U|0> rather than with the unitary/MPO U?

@mtfishman (Collaborator)

Ah, I didn't appreciate that you were trying to evolve an operator. Probably you want to use productoperator(qubits(N)), which makes an identity MPO.
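
For example (an untested sketch, mirroring the productstate conversion above):

U0 = productoperator(qubits(N))               # identity MPO on N qubits
U0_gpu = cu(convert_eltype.(ComplexF64, U0))  # move the MPO tensors to GPU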

@ghost (Author) commented Apr 11, 2022

Using productoperator(qubits(N)) gives an incorrect answer: it outputs roughly 1e-15 rather than the correct value of around 0.8.
I will keep trying other methods, but thank you for your help with this problem.

@mtfishman (Collaborator)

Yeah, I think I would need more details of the actual computation you are trying to do in order to help more. You may need to dig into the runcircuit function to try to reproduce whatever starting state/process it is creating for you and make it yourself so you can transfer it to GPU.

@mtfishman (Collaborator)

@JoeGibbs88 please take a look at #282. I added an interface where you can specify device=cu in runcircuit to run on GPU.

Additionally, you can specify eltype=Float32 or eltype=ComplexF32, since single-precision calculations are often faster on GPU.
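
Usage would look something like this (a small sketch):

using PastaQ
using ITensorGPU  # provides `cu`

N = 2
gates = [("H", 1), ("CX", (1, 2))]
# Single-precision GPU run; drop `eltype` to keep the default precision.
V = runcircuit(qubits(N), gates; process=true, device=cu, eltype=ComplexF32)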

Could you test out the branch runcircuit_gpu for your example and see if it works?

@mtfishman (Collaborator)

@JoeGibbs88 #282 is merged and will be available soon in PastaQ 0.0.22. Please update to the latest version of PastaQ and try out the new device and eltype keyword arguments in runcircuit.

@ghost (Author) commented Apr 27, 2022

@mtfishman thanks for updating the interface, and apologies for the late reply. I tried using the new syntax to run the code I posted previously; however, it errored out when running on the GPU.

Code:

using ITensorGPU
using PastaQ

#using CUDA: allowscalar
#allowscalar(false)

function U_gates(p)
    gates = Tuple[]
    gates = vcat(gates, ("Rxx", (1, 2), (φ = p,)))
    return gates
end

function V_gates()
    gates = Tuple[]
    gates = vcat(gates, ("H", 1))
    gates = vcat(gates, ("CX", (1, 2)))
    return gates
end

N = 2
hilbert = qubits(N)
U = runcircuit(hilbert, U_gates(0.1); device=cu, process=true)
V = runcircuit(hilbert, V_gates(); device=cu, process=true)
H = 1 - (2^(-2.0^N))*abs2(inner(U, V))

println(H)

(I wasn't sure whether the allowscalar lines were needed, but including them didn't change the error.)

Error:

ERROR: LoadError: Scalar indexing is disallowed.
Invocation of getindex resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations *do not* execute on the GPU, but very slowly on the CPU,
and therefore are only permitted from the REPL for prototyping purposes.
If you did intend to index this array, annotate the caller with @allowscalar.
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:33
 [2] assertscalar(op::String)
   @ GPUArrays ~/.julia/packages/GPUArrays/Zecv7/src/host/indexing.jl:53
 [3] getindex(xs::CuArray{ComplexF64, 4, CUDA.Mem.DeviceBuffer}, I::Int64)
   @ GPUArrays ~/.julia/packages/GPUArrays/Zecv7/src/host/indexing.jl:86
 [4] getindex
   @ ~/.julia/packages/Strided/7AzbR/src/stridedview.jl:86 [inlined]
 [5] macro expansion
   @ ~/.julia/packages/Strided/7AzbR/src/mapreduce.jl:304 [inlined]
 [6] macro expansion
   @ ./simdloop.jl:77 [inlined]
 [7] macro expansion
   @ ~/.julia/packages/Strided/7AzbR/src/mapreduce.jl:303 [inlined]
 [8] _mapreduce_kernel!(f::Strided.CaptureArgs{typeof(identity), Tuple{Strided.Arg}}, op::Nothing, initop::Nothing, dims::NTuple{4, Int64}, blocks::NTuple{4, Int64}, arrays::Tuple{Strided.StridedView{ComplexF64, 4, CuArray{ComplexF64, 4, CUDA.Mem.DeviceBuffer}, typeof(identity)}, Strided.StridedView{ComplexF64, 4, CuArray{ComplexF64, 4, CUDA.Mem.DeviceBuffer}, typeof(identity)}}, strides::Tuple{NTuple{4, Int64}, NTuple{4, Int64}}, offsets::Tuple{Int64, Int64})
   @ Strided ~/.julia/packages/Strided/7AzbR/src/mapreduce.jl:205
 [9] _mapreduce_block!(f::Any, op::Any, initop::Any, dims::NTuple{4, Int64}, strides::Tuple{NTuple{4, Int64}, NTuple{4, Int64}}, offsets::Tuple{Int64, Int64}, costs::NTuple{4, Int64}, arrays::Tuple{Strided.StridedView{ComplexF64, 4, CuArray{ComplexF64, 4, CUDA.Mem.DeviceBuffer}, typeof(identity)}, Strided.StridedView{ComplexF64, 4, CuArray{ComplexF64, 4, CUDA.Mem.DeviceBuffer}, typeof(identity)}})
   @ Strided ~/.julia/packages/Strided/7AzbR/src/mapreduce.jl:135
[10] _mapreduce_order!(f::Any, op::Any, initop::Any, dims::NTuple{4, Int64}, strides::Tuple{NTuple{4, Int64}, NTuple{4, Int64}}, arrays::Tuple{Strided.StridedView{ComplexF64, 4, CuArray{ComplexF64, 4, CUDA.Mem.DeviceBuffer}, typeof(identity)}, Strided.StridedView{ComplexF64, 4, CuArray{ComplexF64, 4, CUDA.Mem.DeviceBuffer}, typeof(identity)}})
   @ Strided ~/.julia/packages/Strided/7AzbR/src/mapreduce.jl:120
[11] _mapreduce_fuse!(f::Any, op::Any, initop::Any, dims::NTuple{4, Int64}, arrays::Tuple{Strided.StridedView{ComplexF64, 4, CuArray{ComplexF64, 4, CUDA.Mem.DeviceBuffer}, typeof(identity)}, Strided.StridedView{ComplexF64, 4, CuArray{ComplexF64, 4, CUDA.Mem.DeviceBuffer}, typeof(identity)}})
   @ Strided ~/.julia/packages/Strided/7AzbR/src/mapreduce.jl:98
[12] copyto!
   @ ~/.julia/packages/Strided/7AzbR/src/broadcast.jl:34 [inlined]
[13] materialize!
   @ ./broadcast.jl:894 [inlined]
[14] materialize!
   @ ./broadcast.jl:891 [inlined]
[15] permutedims!(R::NDTensors.DenseTensor{ComplexF64, 4, NTuple{4, ITensors.Index{Int64}}, NDTensors.Dense{ComplexF64, CuArray{ComplexF64, 1, CUDA.Mem.DeviceBuffer}}}, T::NDTensors.DenseTensor{ComplexF64, 4, NTuple{4, ITensors.Index{Int64}}, NDTensors.Dense{ComplexF64, CuArray{ComplexF64, 1, CUDA.Mem.DeviceBuffer}}}, perm::NTuple{4, Int64})
   @ NDTensors ~/.julia/packages/NDTensors/hSwie/src/dense.jl:360
[16] contract!!(R::NDTensors.DenseTensor{ComplexF64, 3, Tuple{ITensors.Index{Int64}, ITensors.Index{Int64}, ITensors.Index{Int64}}, NDTensors.Dense{ComplexF64, CuArray{ComplexF64, 1, CUDA.Mem.DeviceBuffer}}}, labelsR::Tuple{Int64, Int64, Int64}, T1::NDTensors.Tensor{Number, 3, NDTensors.Combiner, Tuple{ITensors.Index{Int64}, ITensors.Index{Int64}, ITensors.Index{Int64}}}, labelsT1::Tuple{Int64, Int64, Int64}, T2::NDTensors.DenseTensor{ComplexF64, 4, NTuple{4, ITensors.Index{Int64}}, NDTensors.Dense{ComplexF64, CuArray{ComplexF64, 1, CUDA.Mem.DeviceBuffer}}}, labelsT2::NTuple{4, Int64})
   @ NDTensors ~/.julia/packages/NDTensors/hSwie/src/combiner.jl:114
[17] contract!!(R::NDTensors.DenseTensor{ComplexF64, 3, Tuple{ITensors.Index{Int64}, ITensors.Index{Int64}, ITensors.Index{Int64}}, NDTensors.Dense{ComplexF64, CuArray{ComplexF64, 1, CUDA.Mem.DeviceBuffer}}}, labelsR::Tuple{Int64, Int64, Int64}, T1::NDTensors.DenseTensor{ComplexF64, 4, NTuple{4, ITensors.Index{Int64}}, NDTensors.Dense{ComplexF64, CuArray{ComplexF64, 1, CUDA.Mem.DeviceBuffer}}}, labelsT1::NTuple{4, Int64}, T2::NDTensors.Tensor{Number, 3, NDTensors.Combiner, Tuple{ITensors.Index{Int64}, ITensors.Index{Int64}, ITensors.Index{Int64}}}, labelsT2::Tuple{Int64, Int64, Int64})
   @ NDTensors ~/.julia/packages/NDTensors/hSwie/src/combiner.jl:123
[18] contract(T1::NDTensors.DenseTensor{ComplexF64, 4, NTuple{4, ITensors.Index{Int64}}, NDTensors.Dense{ComplexF64, CuArray{ComplexF64, 1, CUDA.Mem.DeviceBuffer}}}, labelsT1::NTuple{4, Int64}, T2::NDTensors.Tensor{Number, 3, NDTensors.Combiner, Tuple{ITensors.Index{Int64}, ITensors.Index{Int64}, ITensors.Index{Int64}}}, labelsT2::Tuple{Int64, Int64, Int64}, labelsR::Tuple{Int64, Int64, Int64})
   @ NDTensors ~/.julia/packages/NDTensors/hSwie/src/dense.jl:576
[19] contract(T1::NDTensors.DenseTensor{ComplexF64, 4, NTuple{4, ITensors.Index{Int64}}, NDTensors.Dense{ComplexF64, CuArray{ComplexF64, 1, CUDA.Mem.DeviceBuffer}}}, labelsT1::NTuple{4, Int64}, T2::NDTensors.Tensor{Number, 3, NDTensors.Combiner, Tuple{ITensors.Index{Int64}, ITensors.Index{Int64}, ITensors.Index{Int64}}}, labelsT2::Tuple{Int64, Int64, Int64})
   @ NDTensors ~/.julia/packages/NDTensors/hSwie/src/dense.jl:564
[20] _contract(A::NDTensors.DenseTensor{ComplexF64, 4, NTuple{4, ITensors.Index{Int64}}, NDTensors.Dense{ComplexF64, CuArray{ComplexF64, 1, CUDA.Mem.DeviceBuffer}}}, B::NDTensors.Tensor{Number, 3, NDTensors.Combiner, Tuple{ITensors.Index{Int64}, ITensors.Index{Int64}, ITensors.Index{Int64}}})
   @ ITensors ~/.julia/packages/ITensors/uvZHM/src/itensor.jl:1764
[21] _contract(A::ITensors.ITensor, B::ITensors.ITensor)
   @ ITensors ~/.julia/packages/ITensors/uvZHM/src/itensor.jl:1770
[22] contract(A::ITensors.ITensor, B::ITensors.ITensor)
   @ ITensors ~/.julia/packages/ITensors/uvZHM/src/itensor.jl:1872
[23] #225
   @ ~/.julia/packages/ITensors/uvZHM/src/itensor.jl:1958 [inlined]
[24] BottomRF
   @ ./reduce.jl:81 [inlined]
[25] afoldl
   @ ./operators.jl:534 [inlined]
[26] _foldl_impl
   @ ./tuple.jl:263 [inlined]
[27] foldl_impl
   @ ./reduce.jl:48 [inlined]
[28] mapfoldl_impl
   @ ./reduce.jl:44 [inlined]
[29] #mapfoldl#214
   @ ./reduce.jl:160 [inlined]
[30] mapfoldl
   @ ./reduce.jl:160 [inlined]
[31] #foldl#215
   @ ./reduce.jl:178 [inlined]
[32] foldl
   @ ./reduce.jl:178 [inlined]
[33] contract(As::Tuple{ITensors.ITensor, ITensors.ITensor, ITensors.ITensor}; sequence::String, kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
   @ ITensors ~/.julia/packages/ITensors/uvZHM/src/itensor.jl:1958
[34] contract
   @ ~/.julia/packages/ITensors/uvZHM/src/itensor.jl:1957 [inlined]
[35] #contract#229
   @ ~/.julia/packages/ITensors/uvZHM/src/itensor.jl:1968 [inlined]
[36] contract
   @ ~/.julia/packages/ITensors/uvZHM/src/itensor.jl:1968 [inlined]
[37] #*#231
   @ ~/.julia/packages/ITensors/uvZHM/src/itensor.jl:1978 [inlined]
[38] *
   @ ~/.julia/packages/ITensors/uvZHM/src/itensor.jl:1978 [inlined]
[39] svd(A::ITensors.ITensor, Linds::Vector{ITensors.Index{Int64}}; kwargs::Base.Iterators.Pairs{Symbol, Any, NTuple{6, Symbol}, NamedTuple{(:cutoff, :maxdim, :svd_alg, :tags, :ortho, :alg), Tuple{Float64, Int64, String, String, String, String}}})
   @ ITensors ~/.julia/packages/ITensors/uvZHM/src/decomp.jl:101
[40] factorize_svd(A::ITensors.ITensor, Linds::Vector{ITensors.Index{Int64}}; kwargs::Base.Iterators.Pairs{Symbol, Any, NTuple{5, Symbol}, NamedTuple{(:cutoff, :maxdim, :svd_alg, :tags, :ortho), Tuple{Float64, Int64, String, String, String}}})
   @ ITensors ~/.julia/packages/ITensors/uvZHM/src/decomp.jl:378
[41] factorize(A::ITensors.ITensor, Linds::Vector{ITensors.Index{Int64}}; kwargs::Base.Iterators.Pairs{Symbol, Any, NTuple{5, Symbol}, NamedTuple{(:cutoff, :maxdim, :svd_alg, :tags, :ortho), Tuple{Float64, Int64, String, String, String}}})
   @ ITensors ~/.julia/packages/ITensors/uvZHM/src/decomp.jl:498
[42] ITensors.MPO(A::ITensors.ITensor, sites::Vector{Vector{ITensors.Index{Int64}}}; leftinds::Nothing, orthocenter::Int64, kwargs::Base.Iterators.Pairs{Symbol, Any, Tuple{Symbol, Symbol, Symbol}, NamedTuple{(:cutoff, :maxdim, :svd_alg), Tuple{Float64, Int64, String}}})
   @ ITensors ~/.julia/packages/ITensors/uvZHM/src/mps/abstractmps.jl:1826
[43] setindex!(ψ::ITensors.MPO, A::ITensors.ITensor, r::UnitRange{Int64}; orthocenter::Int64, perm::Nothing, kwargs::Base.Iterators.Pairs{Symbol, Any, Tuple{Symbol, Symbol, Symbol}, NamedTuple{(:cutoff, :maxdim, :svd_alg), Tuple{Float64, Int64, String}}})
   @ ITensors ~/.julia/packages/ITensors/uvZHM/src/mps/abstractmps.jl:1768
[44] setindex!(::ITensors.MPO, ::ITensors.ITensor, ::UnitRange{Int64}, ::Pair{Symbol, Any}, ::Vararg{Pair{Symbol, Any}, N} where N; kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
   @ ITensors ~/.julia/packages/ITensors/uvZHM/src/mps/abstractmps.jl:1779
[45] setindex!(::ITensors.MPO, ::ITensors.ITensor, ::UnitRange{Int64}, ::Pair{Symbol, Any}, ::Pair{Symbol, Any}, ::Vararg{Pair{Symbol, Any}, N} where N)
   @ ITensors ~/.julia/packages/ITensors/uvZHM/src/mps/abstractmps.jl:1779
[46] product(o::ITensors.ITensor, ψ::ITensors.MPO, ns::Vector{Int64}; move_sites_back::Bool, apply_dag::Bool, kwargs::Base.Iterators.Pairs{Symbol, Any, Tuple{Symbol, Symbol, Symbol}, NamedTuple{(:cutoff, :maxdim, :svd_alg), Tuple{Float64, Int64, String}}})
   @ ITensors ~/.julia/packages/ITensors/uvZHM/src/mps/abstractmps.jl:2003
[47] product(As::Vector{ITensors.ITensor}, ψ::ITensors.MPO; move_sites_back_between_gates::Bool, move_sites_back::Bool, kwargs::Base.Iterators.Pairs{Symbol, Any, NTuple{4, Symbol}, NamedTuple{(:apply_dag, :cutoff, :maxdim, :svd_alg), Tuple{Bool, Float64, Int64, String}}})
   @ ITensors ~/.julia/packages/ITensors/uvZHM/src/mps/abstractmps.jl:2110
[48] runcircuit(M::ITensors.MPO, circuit_tensors::Vector{ITensors.ITensor}; apply_dag::Bool, cutoff::Float64, maxdim::Int64, svd_alg::String, move_sites_back::Bool, eltype::Nothing, device::typeof(identity), kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
   @ PastaQ ~/.julia/packages/PastaQ/9DfPk/src/circuits/runcircuit.jl:360
[49] runcircuit(hilbert::Vector{ITensors.Index{Int64}}, circuit::Vector{Tuple}; full_representation::Bool, process::Bool, noise::Nothing, eltype::Nothing, device::Function, kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
   @ PastaQ ~/.julia/packages/PastaQ/9DfPk/src/circuits/runcircuit.jl:139
[50] top-level scope
   @ ~/julia/REFF/HS_simple.jl:28

@mtfishman (Collaborator)

Hi Joe,

When I run the code:

using ITensorGPU
using PastaQ

function U_gates(p)
  gates = Tuple[]
  gates = vcat(gates, ("Rxx", (1, 2), (ϕ = p,)))
  return gates
end

function V_gates()
  gates = Tuple[]
  gates = vcat(gates, ("H", 1))
  gates = vcat(gates, ("CX", (1, 2)))
  return gates
end

N = 2
hilbert = qubits(N)
U = runcircuit(hilbert, U_gates(0.1); device=cu, process=true)
V = runcircuit(hilbert, V_gates(); device=cu, process=true)
H = 1 - (2^(-2.0^N))*abs2(inner(U, V))

println(H)

It outputs:

┌ Warning: Performing scalar indexing on task Task (runnable) @0x00007fe7b2624010.
│ Invocation of getindex resulted in scalar indexing of a GPU array.
│ This is typically caused by calling an iterating implementation of a method.
│ Such implementations *do not* execute on the GPU, but very slowly on the CPU,
│ and therefore are only permitted from the REPL for prototyping purposes.
│ If you did intend to index this array, annotate the caller with @allowscalar.
└ @ GPUArrays ~/.julia/packages/GPUArrays/Zecv7/src/host/indexing.jl:56
0.875

so it seems to be working fine. I'm not sure where the scalar indexing warning is coming from, so we may have to investigate that, but it is only a problem if the code is running slower than expected.

I am using the following versions:

julia> versioninfo()
Julia Version 1.7.2
Commit bf53498635 (2022-02-06 15:21 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) E-2176M  CPU @ 2.70GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, skylake)
Environment:
  JULIA_EDITOR = vim

julia> using Pkg

julia> Pkg.status(["ITensors", "ITensorGPU", "PastaQ"])
      Status `~/.julia/environments/v1.7/Project.toml`
  [d89171c1] ITensorGPU v0.0.5
  [9136182c] ITensors v0.3.10
  [30b07047] PastaQ v0.0.22

What versions are you using?

@ghost (Author) commented Apr 29, 2022

Strange; we seemingly have the same setup, other than that I am on Julia 1.6:

julia> versioninfo()
Julia Version 1.6.5
Commit 9058264a69 (2021-12-19 12:30 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) Silver 4114  CPU @ 2.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, skylake-avx512)

julia> using Pkg

julia> Pkg.status(["ITensors", "ITensorGPU", "PastaQ"])
      Status `~/.julia/environments/v1.6/Project.toml`
  [d89171c1] ITensorGPU v0.0.5
  [9136182c] ITensors v0.3.10
  [30b07047] PastaQ v0.0.22

@mtfishman (Collaborator)

Very strange. Could you try using Julia 1.7.2 as a sanity check?

Based on the error message, it looks like a certain case of tensor contraction is going through the generic CPU contraction code instead of the specialized GPU contraction code. I don't have a good guess for why that behavior might change based on the Julia version (or other differences in our setups, like maybe the CUDA version).
