Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unifying _single_frame() of AnalysisBase and ParallelAnalysisBase #131

Open
yuxuanzhuang opened this issue Jul 11, 2020 · 5 comments · May be fixed by #128
Open

Unifying _single_frame() of AnalysisBase and ParallelAnalysisBase #131

yuxuanzhuang opened this issue Jul 11, 2020 · 5 comments · May be fixed by #128

Comments

@yuxuanzhuang
Copy link
Contributor

Note this only works after MDAnalysis/mdanalysis#2723 is merged to MDAnalysis

The key idea is not to break user's current codes and apply multiprocessing to AnalysisBase-based methods in MDAnalysis.
i.e. a deliverable like: AnalysisMethod.run(n_jobs=n_cpus).

As far as I am concerned, the key difference between _single_frame() of AnalysisBase and ParallelAnalysisBase is whether it modified the state, or it returns a value (as per discussion in #18). The workaround in the related PR is _single_frame modifies the state (e.g. self._results), while _dask_helper retrieves and returns the state value.

As detailing in the PR #128 and the following showcase:
https://gist.github.com/yuxuanzhuang/937863d018621ad1fd9446c1f4a2bf3b

It is still quite preliminary, all suggestions are welcome.

@yuxuanzhuang yuxuanzhuang linked a pull request Jul 11, 2020 that will close this issue
4 tasks
@yuxuanzhuang
Copy link
Contributor Author

@kain88-de
Copy link
Member

This requires changes on the _single_frame function. It would break existing user code. Which makes this API change more complicated it will likely break downstream mdanalysis packages. Having it in pmda is fine.

@yuxuanzhuang
Copy link
Contributor Author

yuxuanzhuang commented Jul 14, 2020

yeah..I realize it depends on results have to be saved inside _results.
I guess the analysis methods vary too much to reach consensus. What do you think of an API like:

analysis.parallel_run(n_jobs=cpu_count,
                                  result_attr='timeseries') # or maybe a list of attr?
...

so that users only need to fill in one extra thing for their code without touching their _single_frame. And the program becomes aware of what need to be transferred.
Of course, it still only works for some simple workflow. For aggregations, we sure need to write an extra line to sum results up inside _conclude.

@orbeckst
Copy link
Member

This requires changes on the _single_frame function. It would break existing user code. Which makes this API change more complicated it will likely break downstream mdanalysis packages.

Generally speaking, we could break _single_frame() in MDAnalysis 2.0.0 – that update will be a major break anyway. If we have to annoy our users then get it over in one break... I like the PMDA version much better than the MDA one and I would look into switching MDA over to the PMDA model as it seems more promising for the future. But that's just my opinion and perhaps it's not worthwhile breaking downstream packages and code for it.

@orbeckst orbeckst added this to the 0.5 milestone May 12, 2021
@orbeckst
Copy link
Member

We did not break _single_frame() in MDA 2.0.0.

@orbeckst orbeckst removed this from the 0.5 milestone Oct 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants