
Support parallel computation in cost functions #718

Open
HDembinski opened this issue Mar 3, 2022 · 1 comment

Comments

@HDembinski
Member

People often ask whether iminuit/MINUIT supports parallel computation. MINUIT's C++ code does not, since the minimization algorithm is fundamentally sequential and cannot be parallelized in a straightforward way.

However, it is comparatively easy to compute the result of the cost function in parallel on several cores. This can be handled entirely on the Python side.

There are two ways to parallelize a single fit:

  • Parallelize the model.
  • Parallelize the cost function.

Parallelize the model

Users of the built-in cost functions need to provide a model pdf or cdf, which is expected to be a vectorized function. Vectorized functions are already embarrassingly parallel, so in theory it might be enough to decorate the model with @numba.njit(parallel=True). Users need to check, however, that the function body is actually parallelizable. Numba does not generate a diagnostic if this does not work, unless special steps are taken.

import numpy as np
import numba as nb
from iminuit import Minuit
from iminuit.cost import UnbinnedNLL

@nb.njit(parallel=True)
def model_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return np.exp(-0.5 * z * z - np.log(np.sqrt(2 * np.pi) * sigma))

rng = np.random.default_rng(1)
x = rng.normal(size=1_000_000)

cost = UnbinnedNLL(x, model_pdf)

m = Minuit(cost, mu=0, sigma=1)
m.migrad()

Parallelize the cost function

The drawback of the previous option is that it requires a parallel implementation of the model pdf/cdf. This rules out library-provided pdfs; for example, numba_stats only provides non-parallel versions at the moment. An alternative is to implement the parallelization in the cost function, which is also embarrassingly parallel. This approach has obvious advantages:

  • It works with any non-parallel pdf.
  • Parallelization could be turned on/off with a keyword passed to the cost function.

import numpy as np
import numba as nb
from iminuit import Minuit
from numba_stats import norm

@nb.njit(parallel=True)
def cost_impl(x, mu, sigma):
    r = np.empty(8)  # number of parallel threads
    chunk = len(x) // len(r)
    for i in nb.prange(len(r)):
        start = i * chunk
        # last chunk picks up the remainder if len(x) is not divisible
        stop = len(x) if i == len(r) - 1 else start + chunk
        r[i] = np.sum(norm.logpdf(x[start:stop], mu, sigma))
    return -np.sum(r)

rng = np.random.default_rng(1)
x = rng.normal(size=1_000_000)

def cost(mu, sigma):
    return cost_impl(x, mu, sigma)

cost.errordef = Minuit.LIKELIHOOD

m = Minuit(cost, mu=0, sigma=1)
m.migrad()

In theory, both approaches should work equally well. In the second case, the chunk size can additionally be tuned to match the size of the CPU cache.
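The on/off keyword mentioned above could look like the following sketch, which uses only the standard library and NumPy instead of Numba (the names `make_cost` and `norm_logpdf` are illustrative, not part of iminuit). It relies on NumPy releasing the GIL inside large array operations, so plain threads already give a speedup:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def norm_logpdf(x, mu, sigma):
    # plain vectorized normal log-density; stands in for any
    # non-parallel library pdf
    z = (x - mu) / sigma
    return -0.5 * z * z - np.log(np.sqrt(2 * np.pi) * sigma)

def make_cost(x, n_threads=4, parallel=True):
    # pre-split the data once; np.array_split handles remainders
    chunks = np.array_split(x, n_threads)
    pool = ThreadPoolExecutor(n_threads)

    def cost(mu, sigma):
        if parallel:
            parts = pool.map(
                lambda c: np.sum(norm_logpdf(c, mu, sigma)), chunks
            )
            return -sum(parts)
        return -np.sum(norm_logpdf(x, mu, sigma))

    cost.errordef = 0.5  # = Minuit.LIKELIHOOD
    return cost
```

Both settings return the same value, so parallelization can be toggled without changing the fit result.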

@HDembinski
Member Author

As shown in the new study, the best results are obtained when both the model and the cost function are parallelized.
