Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance drop in QuTiP 5.0.1 vs QuTiP 4.7.6 #2406

Closed
gautierronan opened this issue Apr 24, 2024 · 3 comments · Fixed by #2408
Closed

Performance drop in QuTiP 5.0.1 vs QuTiP 4.7.6 #2406

gautierronan opened this issue Apr 24, 2024 · 3 comments · Fixed by #2408

Comments

@gautierronan
Copy link

gautierronan commented Apr 24, 2024

Bug Description

While benchmarking dynamiqs vs QuTiP, I've noticed a large performance drop of qutip.mesolve when going from v4.7.6 to v5.0.1. On the example below, the benchmarks show:

# QuTiP 4.7.6
# 1.42 s ± 3.17 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# QuTiP 5.0.1
# 14.5 s ± 2.32 s per loop (mean ± std. dev. of 7 runs, 1 loop each)

Found similar behavior on two different CPUs (mac M2, and AMD Ryzen 7).

Code to Reproduce the Bug

import timeit

import numpy as np
import qutip as qt


def init(
    kappa_2: float = 1.0,
    g_cnot: float = 0.3,
    nbar: float = 4.0,
    num_tsave: int = 100,
    N: int = 16,
):
    # time evolution
    alpha = np.sqrt(nbar)
    gate_time = np.pi / (4 * alpha * g_cnot)
    tlist = np.linspace(0.0, gate_time, num_tsave)

    # operators
    ac = qt.tensor(qt.destroy(N), qt.qeye(N))
    nt = qt.tensor(qt.qeye(N), qt.num(N))

    # Hamiltonian
    H = g_cnot * (ac + ac.dag()) * (nt - nbar)

    # collapse operators
    c_ops = [np.sqrt(kappa_2) * (ac**2 - nbar)]

    # initial state
    plus = (qt.coherent(N, alpha) + qt.coherent(N, -alpha)).unit()
    psi0 = qt.tensor(plus, plus)

    kwargs = {'H': H, 'rho0': psi0, 'tlist': tlist, 'c_ops': c_ops}
    return kwargs

kwargs = init()
%timeit qt.mesolve(**kwargs)

Your Environment

Numpy Version:      1.25.2
Scipy Version:      1.11.2
Cython Version:     0.29.37
Matplotlib Version: 3.7.2
Python Version:     3.11.4
Number of CPUs:     8
BLAS Info:          OPENBLAS
INTEL MKL Ext:      False
Platform Info:      Darwin (arm64)
@nwlambert
Copy link
Member

nwlambert commented Apr 24, 2024

Thanks for pointing this out.. I had a quick play around, this seems to be some overhead from the normalization flag in mesolve. In v5 if I use options = {"normalize_output": False} I see basically same runtimes as v4.7. we should probably think about making this default to false if its having such a big cost.

To make it really "like for like" you can also wrap it so that it uses CSR format for everything

with qt.CoreOptions(default_dtype="csr"):
    kwargs = init()
    options = {"normalize_output": False}
    %timeit qt.mesolve(**kwargs, options = options)

Which data-type is best for is a bit problem dependent. let us know if this gives you comparable results!

On the topic of dynamiqs, you might be interested in trying out the jax/jaxdia data layer, there's some examples here
https://github.com/nwlambert/QuTiP-Jax-GPU-example

We also tried this example with dynamiqs (with its double precision setting), I think you guys came out slightly ahead speed-wise because I think you dont vectorize/use superoperators, which seems to suit jax/diffrax! though if i did that manually with qutip we essentially get the same, which I guess makes sense. The jaxdia data layer lets us go to a few more spins (more with sesolve, not so many with mesolve!)

@gautierronan
Copy link
Author

Thanks, indeed the options = {"normalize_output": False} seems to do the trick, but I don't see why normalizing the output state would take so long? Is it normalized at every time step? The system is not that big, so computing the trace of a bunch of density matrices should be almost instantaneous.

On the topic of dynamiqs, you might be interested in trying out the jax/jaxdia data layer

Yes we're in the process of implementing a sparse DIA format as well. This is something I had suggested to @Ericgig when visiting Sherbrooke in early 2023, but you guys were much faster than us to make it work! I'm quite happy that this is being spread around. Maybe we'll be able to push cuSPARSE for a native DIA format at some point.

@Ericgig
Copy link
Member

Ericgig commented Apr 24, 2024

I looked at it too and the issue is with the norm function. The norm trace function is used (tr(sqrt(A @ A.dag()))) and it's quite slow. Using the normal trace does not cause visible slowdown.
We should be able to fix that on our side,

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants