Error when running AnalysisFromFunction() on more processes than frames #147

Open
luponzo86 opened this issue Dec 22, 2020 · 3 comments

@luponzo86

Expected behaviour

Successfully running AnalysisFromFunction() on all available CPUs by setting n_jobs=-1 even for very small trajectories.

Actual behaviour

Two warnings are raised:

/srv/home/lponzoni/anaconda3/envs/ifpe/lib/python3.7/site-packages/pmda/parallel.py:360: UserWarning: run() uses more blocks than frames: decrease n_blocks
  warnings.warn("run() uses more blocks than frames: "
/srv/home/lponzoni/anaconda3/envs/ifpe/lib/python3.7/site-packages/numpy/core/_asarray.py:83: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
  return array(a, dtype, copy=False, order=order)

but the code runs anyway until an error is thrown:



---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)

<omitted>

/srv/home/lponzoni/anaconda3/envs/ifpe/lib/python3.7/site-packages/pmda/parallel.py in run(self, start, stop, step, n_jobs, n_blocks)
    398                 # save the frame numbers for all blocks
    399                 self._blocks = _blocks
--> 400                 self._conclude()
    401         # put all time information into the timing object
    402         self.timing = Timing(

/srv/home/lponzoni/anaconda3/envs/ifpe/lib/python3.7/site-packages/pmda/custom.py in _conclude(self)
    101 
    102     def _conclude(self):
--> 103         self.results = np.concatenate(self._results)
    104 
    105 

<__array_function__ internals> in concatenate(*args, **kwargs)

ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 2 dimension(s) and the array at index 10 has 1 dimension(s)

Code to reproduce the behaviour

I could not find MD trajectories to build an example with (I had problems installing MDAnalysisTests, see issue #3084), but the error occurs whenever AnalysisFromFunction() is run on a trajectory with n frames and n_jobs is set to a value greater than n, or to -1 when more than n CPUs are available.
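The failure can be imitated with NumPy alone. The following is only a sketch of the suspected mechanism, not PMDA's actual code: it assumes each block returns an array built from its per-frame results, so a block that receives no frames yields a 1-D empty array while the others are 2-D. The names `frame_results` and `block_results` and the sizes are made up for illustration.

```python
import numpy as np

n_frames, n_blocks = 10, 12  # more blocks than frames (illustrative sizes)

# Hypothetical per-frame result: three values per frame.
frame_results = [np.full(3, float(i)) for i in range(n_frames)]

# One frame per block; blocks 10 and 11 receive no frames at all.
block_results = [np.array(frame_results[i:i + 1]) for i in range(n_blocks)]

# Non-empty blocks have shape (1, 3); empty blocks have shape (0,).
print([r.shape for r in block_results])

error_message = None
try:
    np.concatenate(block_results)
except ValueError as err:
    # Mixing 2-D and 1-D arrays is what concatenate rejects.
    error_message = str(err)
    print(error_message)

# Dropping the empty blocks before concatenating avoids the error.
non_empty = [r for r in block_results if r.size > 0]
combined = np.concatenate(non_empty)
print(combined.shape)
```

Filtering out empty blocks is shown here only to confirm the diagnosis; whether that or capping the number of blocks is the right fix for PMDA is a separate question.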

This is not a big deal, but it was hard to debug and I wanted to report it.

Current version of MDAnalysis: 1.0.0

pmda version: 0.3.0

@luponzo86
Author

A quick fix would be to add the following check:

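    import os
    import MDAnalysis as mda

    # import trajectory (pdb_file, traj_file, n_jobs, start, stop and step
    # are assumed to be defined by the caller)
    u = mda.Universe(pdb_file, traj_file)

    # set number of parallel processes
    # (os.sched_getaffinity() is not available on every platform)
    if n_jobs == -1:
        n_jobs = len(os.sched_getaffinity(0))
    # make sure that n_jobs is not greater than the actual number of frames
    n_total_frames = len(u.trajectory)
    n_actual_frames = len(range(
        start if start is not None else 0,
        min(n_total_frames, stop) if stop is not None else n_total_frames,
        step if step else 1))
    # never go below one job, even if no frames are selected
    n_jobs = max(1, min(n_jobs, n_actual_frames))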
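To make the check easy to test without a trajectory, it could be factored into a small standalone helper. This is a minimal sketch, not part of PMDA: `count_selected_frames` and `cap_n_jobs` are hypothetical names, and it assumes `start`, `stop` and `step` follow Python slice semantics (including `None` and negative indices).

```python
def count_selected_frames(n_total_frames, start=None, stop=None, step=None):
    """Number of frames selected by start:stop:step (slice semantics assumed)."""
    # Slicing a range handles None, negative and out-of-bounds indices.
    return len(range(n_total_frames)[start:stop:step])


def cap_n_jobs(n_jobs, n_total_frames, start=None, stop=None, step=None):
    """Limit n_jobs to the number of selected frames, with a minimum of 1."""
    n_selected = count_selected_frames(n_total_frames, start, stop, step)
    return max(1, min(n_jobs, n_selected))


# 16 workers requested for a 10-frame trajectory: capped at 10.
print(cap_n_jobs(16, 10))
# Only frames 2, 4 and 6 are selected: capped at 3.
print(cap_n_jobs(16, 10, start=2, stop=8, step=2))
```

Resolving `n_jobs=-1` to a CPU count would still have to happen before calling the helper, as in the snippet above.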

@orbeckst
Member

orbeckst commented May 4, 2021

Thank you.

orbeckst added the bug label May 4, 2021
@orbeckst
Member

@luponzo86 you could create a pull request with your check. We would review, guide you in adding tests, and you'd become an author of PMDA.

Development on PMDA is currently pretty slow because everybody is doing many other things (and in particular, there's a lot of work on MDAnalysis itself). Any help is greatly appreciated.
