[BUG] Parallel usage of Wavelets results in errors #101

Open
1 of 2 tasks
chaithyagr opened this issue Sep 9, 2019 · 8 comments

@chaithyagr
Contributor

chaithyagr commented Sep 9, 2019

System setup
OS: [e.g] macOS v10.14.1
Python version: [e.g.] v3.6.7
Python environment (if any): [e.g.] conda v4.5.11

Describe the bug
When using pysap with joblib's Parallel to run wavelet transforms over several channels, we hit errors: the nb_band_per_scale variable is not being populated.

To Reproduce

coeffs, coeffs_shape = zip(*Parallel(n_jobs=self.n_cpu)(
    delayed(self._op)(data[i], self.transform[i])
    for i in numpy.arange(self.num_channels)))

with the _op method defined as:

def _op(self, data, transform):
    # Wrap raw arrays so the transform receives a pysap.Image
    if isinstance(data, numpy.ndarray):
        data = pysap.Image(data=data)
    transform.data = data
    transform.analysis()
    coeffs, coeffs_shape = flatten(transform.analysis_data)
    return coeffs, coeffs_shape

Expected behavior
We expect the adjoint operation to work, but we get seemingly random errors, most notably that nb_band_per_scale is None.

Module and lines involved
With n_cpu=1 everything works smoothly; the issue appears only when we extend it to more cores.

Are you planning to submit a Pull Request?

  • Yes ---> if I find a fix, that is, but this is a lower priority for me
  • No
@chaithyagr chaithyagr added the bug label Sep 9, 2019
@zaccharieramzi
Contributor

Can you format the code in this issue to make it more readable?

@chaithyagr
Contributor Author

Updated the code.

@zaccharieramzi
Contributor

Cool, can you also provide a minimal failing example so that we can directly copy-paste and investigate easily (it will also potentially be the base for a future unit test)?
Also don't forget to include the error traceback in the issue.

Finally, remember to also format code when it appears inline in text.

@chaithyagr
Contributor Author

Well, it is not quite direct, but here is the smallest example I could get.

import pysap
from pysap.base.utils import flatten
from pysap.base.utils import unflatten
from joblib import Parallel, delayed
import numpy as np

num_channels = 32
n_cpu = 8
N = 64

def op(data, transform):
    if isinstance(data, np.ndarray):
        data = pysap.Image(data=data)
    transform.data = data
    transform.analysis()
    coeffs, coeffs_shape = flatten(transform.analysis_data)
    return coeffs, coeffs_shape

def adj_op(coeffs, coeffs_shape, transform):
    transform.analysis_data = unflatten(coeffs, coeffs_shape)
    image = transform.synthesis()
    return image.data

transform_klass = pysap.load_transform("db4")
transform = np.asarray(
    [transform_klass(nb_scale=4) for i in np.arange(num_channels)])

data = (np.random.randn(num_channels, N, N) +
        1j * np.random.randn(num_channels, N, N))

# Forward pass: one analysis per channel, dispatched across n_cpu workers
coeffs, coeffs_shape = zip(*Parallel(n_jobs=n_cpu)(
    delayed(op)(data[i], transform[i])
    for i in np.arange(num_channels)))
coeffs_shape = np.asarray(coeffs_shape)

# Adjoint pass: this is the call that raises when n_cpu > 1
image = Parallel(n_jobs=n_cpu)(
    delayed(adj_op)(coeffs[i], coeffs_shape[i], transform[i])
    for i in np.arange(num_channels))

Note that the test fails if n_cpu > 1, with the following traceback (I did not add this earlier because it does not make much sense to me):

    transform.analysis_data = unflatten(coeffs, coeffs_shape)
  File "/home/cg260486/cgr_venv/lib/python3.5/site-packages/python_pySAP-0.0.3-py3.5-linux-x86_64.egg/pysap/base/transform.py", line 264, in _set_analysis_data
    if len(analysis_data) != sum(self.nb_band_per_scale):
TypeError: 'NoneType' object is not iterable

For some reason, when we try to run in parallel, nb_band_per_scale is not initialized.

However, the above code works fine with n_cpu=1, in which case we are running sequentially.

@chaithyagr
Contributor Author

It looks like this is related to pysap being imported multiple times, since the backend is loky. Switching the backend to threading solves the issue. I don't think there's much left to address here.
Closing.
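
For illustration, here is a minimal sketch of why a thread-based backend could avoid the error, assuming the cause is that the state set during analysis lives on the transform object itself. ToyTransform is a hypothetical stand-in, not a pysap class, and the standard library's ThreadPoolExecutor is used in place of joblib so the snippet runs without extra dependencies; in joblib the corresponding change would be passing backend="threading" to Parallel.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyTransform:
    """Hypothetical stand-in for a wavelet transform (not part of pysap)."""

    def __init__(self):
        self.nb_band_per_scale = None  # only filled in by analysis()

    def analysis(self, data):
        # Pretend the decomposition produced three bands on each of two scales.
        self.nb_band_per_scale = [3, 3]
        return [x * 2 for x in data]


def op(data, transform):
    return transform.analysis(data)


transforms = [ToyTransform() for _ in range(4)]
data = [[1, 2], [3, 4], [5, 6], [7, 8]]

# Threads share the caller's memory, so each worker mutates the very
# object that the caller still holds after the pool has finished.
with ThreadPoolExecutor(max_workers=2) as pool:
    coeffs = list(pool.map(op, data, transforms))

states = [t.nb_band_per_scale for t in transforms]
```

After the pool finishes, every entry in states is populated, which is what a later adjoint pass over the same objects would rely on.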

@zaccharieramzi
Contributor

I don't think we can close this. Indeed, if at some point we want to do multi-processing (and not simply multi-threading), we will potentially need to use other backends.

Can you explain what you mean by the multiple imports of pysap?

@chaithyagr
Contributor Author

Can you explain what you mean by the multiple imports of pysap?

For each process, a new copy of pysap is loaded. Firstly, this adds a lot of overhead.
In my opinion, initialization and communication across processes are not happening correctly in the multi-process case. We may have to dig deeper; this could be an issue in joblib (most likely not) or here in pysap.

I am fine with keeping it open; I just felt that, at this point, it would merely mean more debugging.
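
One possible explanation, not confirmed in this thread, is that a process-based backend hands each worker a serialized copy of its arguments, so attributes set on the copy never reach the caller's object. The sketch below mimics that boundary with copy.deepcopy rather than real processes; ToyTransform, run_in_worker and op_returning_transform are hypothetical names for illustration only, not pysap or joblib APIs.

```python
import copy


class ToyTransform:
    """Hypothetical stand-in for a wavelet transform (not part of pysap)."""

    def __init__(self):
        self.nb_band_per_scale = None  # only filled in by analysis()

    def analysis(self, data):
        self.nb_band_per_scale = [3, 3]
        return [x * 2 for x in data]


def run_in_worker(func, *args):
    """Mimic a process boundary: the worker sees copies, the caller gets a copy back."""
    copied_args = copy.deepcopy(args)
    return copy.deepcopy(func(*copied_args))


def op(data, transform):
    return transform.analysis(data)


def op_returning_transform(data, transform):
    # Send the mutated transform back along with the coefficients.
    coeffs = transform.analysis(data)
    return coeffs, transform


parent = ToyTransform()

# The worker's copy is analysed, but the caller's object is untouched.
run_in_worker(op, [1, 2], parent)
state_after_plain_op = parent.nb_band_per_scale

# Returning the transform carries the state back across the boundary.
coeffs, updated = run_in_worker(op_returning_transform, [1, 2], parent)
state_after_return = updated.nb_band_per_scale
```

If this reading is right, a second parallel pass that reuses the caller's original transforms would see nb_band_per_scale still unset, which matches the traceback above. Returning the transform from the worker, or running analysis and synthesis inside the same task, would be two ways to test the hypothesis.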

@chaithyagr chaithyagr reopened this Oct 1, 2019
@zaccharieramzi
Contributor

Well yes, there should only be some overhead, not the error you were mentioning. Let's keep it open for further investigation.

2 participants