Refactor pmda after universe can be serialized #132

Open: wants to merge 34 commits into master

@yuxuanzhuang (Contributor) commented Jul 15, 2020

Fixes #133

Changes made in this Pull Request:

  • Refactor each part of PMDA (tests pass):
    • parallel.py
    • custom.py
    • rmsd
    • rmsf
    • contact
    • Hbond
    • RDF
    • density
    • leaflet

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

@pep8speaks commented Jul 15, 2020

Hello @yuxuanzhuang! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 16:80: E501 line too long (84 > 79 characters)
Line 16:84: W504 line break after binary operator
Line 58:80: E501 line too long (104 > 79 characters)
Line 69:80: E501 line too long (115 > 79 characters)

Comment last updated at 2021-05-12 17:44:51 UTC

@orbeckst mentioned this pull request Jul 15, 2020
@orbeckst linked an issue Jul 15, 2020 that may be closed by this pull request

@orbeckst (Member) left a comment

This makes the code simpler, nice!

See initial comments.

Docs will also need an update, especially everything that shows how to use ParallelAnalysisBase.

        self.kwargs = kwargs

    def _prepare(self):
        self.results = []

    def _single_frame(self, ts, atomgroups):

Member:

Hm, cool that this works.

@@ -259,10 +256,15 @@ def __init__(self, atomgroup, delta=1.0, atomselection=None,
        elif not updating and atomselection is not None:
            raise ValueError("""With updating=False, the atomselection='{}' is
            not used and should be None""".format(atomselection))
        elif updating and atomselection is not None:
            self._select_atomgroup = atomgroup.select_atoms(atomselection,
                                                            updating=True)

Member:

Do updating AtomGroups work with the serialization?

Contributor (author):

Yes! Thanks to what Richard has already implemented :)

pmda/parallel.py (outdated)
np.array([el[4] for el in res]),
np.array([el[5] for el in res]))

# this is crucial if the analysis does not iterate over

Member:

Why is this crucial? What happens otherwise?

Contributor (author):

Because (not sure it should be called a bug) some analyses, e.g. density analysis (both in MDAnalysis and in this PR), depend on the current ts of the universe:

def _prepare(self):
    coord = self._select_atomgroup.positions  # It will change with ts.
    ...

Contributor (author):

And currently, the universe stays at its final frame after the analysis unless it is rewound.

Member:

Oh, because this does not return a copy. I would not do the rewind: if people want a copy, they should take one. That can be fixed in the density analysis class.
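A stdlib-only toy of the copy-versus-reference point, assuming (for illustration only) that the positions object is a buffer the reader overwrites in place whenever a new frame is loaded; whether a particular MDAnalysis attribute behaves this way is not established here. `ToyTrajectory` and `load_frame` are hypothetical names. A reference taken in `_prepare` silently follows the trajectory, while an explicit copy stays fixed at the frame it was taken from.

```python
import copy


class ToyTrajectory:
    """Hypothetical reader that reuses one coordinate buffer for all frames."""

    def __init__(self, frames):
        self._frames = frames
        # The "current timestep" buffer, initialised from frame 0.
        self.positions = copy.deepcopy(frames[0])

    def load_frame(self, i):
        # Overwrite the existing buffer in place instead of creating a new one.
        for dst, src in zip(self.positions, self._frames[i]):
            dst[:] = src
        return self.positions


traj = ToyTrajectory([
    [[0.0, 0.0, 0.0]],  # frame 0
    [[1.0, 1.0, 1.0]],  # frame 1
])

alias = traj.positions                    # reference: follows the trajectory
snapshot = copy.deepcopy(traj.positions)  # explicit copy: frozen at frame 0

traj.load_frame(1)
print(alias)     # [[1.0, 1.0, 1.0]]
print(snapshot)  # [[0.0, 0.0, 0.0]]
```

If an analysis needs a reference state, taking the copy explicitly in its own setup makes it independent of where the trajectory happens to be.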

Contributor (author):

The thing is, here we are not using FrameIteratorSliced (which rewinds after iteration) because we want accurate timing via self._ts = self._trajectory[i]. This leads to a discrepancy between AnalysisBase and ParallelAnalysisBase:

u = mda.Universe(GRO, XTC)
serial_analysis(u.atoms).run(stop=3)
u.trajectory.ts.frame == 0  # AnalysisBase: rewound after the run
...
parallel_analysis(u.atoms).run(stop=3)
u.trajectory.ts.frame == 3  # ParallelAnalysisBase: not rewound
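To make the rewind discrepancy concrete, here is a stdlib-only toy (the names `ToyTrajectory` and `run_blocks` are hypothetical, not PMDA or MDAnalysis API). Frames are reached by explicit indexing and each access is timed, mirroring `self._ts = self._trajectory[i]`, so nothing restores the starting frame unless the loop rewinds on purpose. In the toy the trajectory is left on the last frame the loop visited; which frame PMDA itself ends on depends on its own loop.

```python
import time


class ToyTrajectory:
    """Hypothetical trajectory with a 'current frame', like a Universe reader."""

    def __init__(self, n_frames):
        self.n_frames = n_frames
        self.frame = 0  # index of the current timestep

    def __getitem__(self, i):
        # Random access moves the current frame as a side effect.
        if not 0 <= i < self.n_frames:
            raise IndexError(i)
        self.frame = i
        return self.frame

    def rewind(self):
        self.frame = 0


def run_blocks(trajectory, stop, rewind=False):
    """Visit frames 0..stop-1 by indexing, timing each access.

    Returns the per-frame access times; optionally rewinds at the end.
    """
    timings = []
    for i in range(stop):
        t0 = time.perf_counter()
        ts = trajectory[i]  # analogous to self._ts = self._trajectory[i]
        timings.append(time.perf_counter() - t0)
        # ... per-frame analysis using ts would go here ...
    if rewind:
        trajectory.rewind()
    return timings


traj = ToyTrajectory(n_frames=10)
run_blocks(traj, stop=3)
print(traj.frame)  # 2: left on the last frame visited

run_blocks(traj, stop=3, rewind=True)
print(traj.frame)  # 0: restored, as an iterator that rewinds would do
```

Whether to rewind is exactly the design question in this thread: rewinding hides the side effect from the caller, while not rewinding keeps the loop simple and pushes the responsibility to analyses that read the current frame.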

@orbeckst (Member):

To get the tests going, change Travis to build and install MDAnalysis from yuxuanzhuang:serialize_io in PR MDAnalysis/mdanalysis#2723 – there's a pip command line/url way to directly use a git branch. I think we used it for PMDA in the past.
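pip accepts VCS requirements of the form `git+URL@BRANCH`, with an optional `#subdirectory=DIR` fragment when setup.py is not at the repository root. The small helper below only builds that string. Both the fork URL and the `package` subdirectory are assumptions (the MDAnalysis repository keeps the installable package in `package/`), and `pip_git_requirement` is a hypothetical helper, not part of any tool.

```python
def pip_git_requirement(repo_url, branch, subdirectory=None):
    """Return a pip VCS requirement: git+URL@BRANCH[#subdirectory=DIR]."""
    spec = f"git+{repo_url}@{branch}"
    if subdirectory:
        # Needed when the installable package is not at the repository root.
        spec += f"#subdirectory={subdirectory}"
    return spec


# Assumed values: fork location and subdirectory are illustrative.
REQUIREMENT = pip_git_requirement(
    "https://github.com/yuxuanzhuang/mdanalysis.git",
    "serialize_io",
    subdirectory="package",
)

# The command to place in the Travis install step (quoted for the shell):
print(f'pip install "{REQUIREMENT}"')
```

The quotes matter in the shell because `#` would otherwise start a comment in some contexts and `&` (if further fragments were added) would background the command.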

pmda/parallel.py (outdated)
if(isinstance(item, mda.Universe)):
universe_dict[key] = item
universe_dict.update(base_dict)
return universe_dict

Contributor (author):

Until we have settled how to handle AtomGroup, I hack the order of the attribute dict here (it should not matter, but somehow it does) so that the Universe is always pickled before the AtomGroup.
I'm not sure how we should deal with unpicklable attributes. Note that cloudpickle, which dask uses, can even pickle open file handles.
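The reordering trick can be sketched with stdlib `pickle` and toy classes (`Universe`, `AtomGroup`, and `Analysis` here are hypothetical stand-ins, not the MDAnalysis classes). It relies on dicts preserving insertion order (guaranteed since Python 3.7) and on `dict.update()` leaving already-present keys in their original position. The sketch shows only the mechanics; it does not reproduce why the order mattered for MDAnalysis at the time.

```python
import pickle


class Universe:
    """Toy stand-in for a serializable Universe."""

    def __init__(self, name):
        self.name = name


class AtomGroup:
    """Toy stand-in that holds a reference to its Universe."""

    def __init__(self, universe, indices):
        self.universe = universe
        self.indices = indices


class Analysis:
    def __init__(self, atomgroup):
        self._atomgroup = atomgroup          # inserted first
        self._universe = atomgroup.universe  # inserted second

    def __getstate__(self):
        # Collect Universe-valued attributes first; update() then appends the
        # remaining attributes, while existing keys keep their front position.
        base_dict = self.__dict__
        universe_dict = {}
        for key, item in base_dict.items():
            if isinstance(item, Universe):
                universe_dict[key] = item
        universe_dict.update(base_dict)
        return universe_dict

    def __setstate__(self, state):
        self.__dict__.update(state)


a = Analysis(AtomGroup(Universe("u"), [0, 1, 2]))
print(list(a.__getstate__()))  # ['_universe', '_atomgroup']

b = pickle.loads(pickle.dumps(a))
# pickle memoizes shared objects, so both attributes still point to one Universe.
print(b._atomgroup.universe is b._universe)  # True
```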

else:
# raise HalError("I'm sorry Dave, I'm afraid I can't do that")
raise AttributeError("Can't set attribute at this time")
raise AttributeError("Can't set '{}' at this time".format(key))

Member:

Ah, just use Python 3.6 or newer here: f"Can't set {key} at this time"
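A minimal sketch of the f-string suggestion in context: a hypothetical class whose `__setattr__` refuses assignments once a flag is set and names the offending attribute. The class and its `_frozen` flag are illustrative, not PMDA code; the message keeps the quotes around the name, as in the new line of the diff.

```python
class FrozenAfterSetup:
    """Hypothetical class that rejects attribute assignment after __init__."""

    def __init__(self):
        self.value = 1  # allowed: the guard is not active yet
        # Bypass our own __setattr__ so setting the flag is not itself blocked.
        object.__setattr__(self, "_frozen", True)

    def __setattr__(self, key, val):
        if getattr(self, "_frozen", False):
            raise AttributeError(f"Can't set '{key}' at this time")
        super().__setattr__(key, val)


obj = FrozenAfterSetup()
try:
    obj.value = 2
except AttributeError as err:
    print(err)  # Can't set 'value' at this time
```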

@kain88-de (Member) left a comment

Looks good. Great it uses less code now. Is PMDA now actually faster?

@orbeckst (Member):

You'll also have to update the PMDA docs and setup.py to say that this requires MDA 2.0.0 and therefore Python ≥ 3.6.
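A minimal sketch of the version constraints the comment asks for, written as keyword arguments one would pass to `setuptools.setup()` next to the existing metadata; everything else in PMDA's setup.py is omitted, and the dependency list shown is an assumption. Under PEP 440 a pre-release such as `2.0.0.dev0` sorts before `2.0.0`, so `MDAnalysis>=2.0.0` will not accept a development build until the final release exists.

```python
# Sketch only: merge into the existing call, e.g.
#   setuptools.setup(name="pmda", ..., **VERSION_CONSTRAINTS)
VERSION_CONSTRAINTS = {
    # Interpreter floor as stated in the review comment.
    "python_requires": ">=3.6",
    "install_requires": [
        "MDAnalysis>=2.0.0",
        # ... PMDA's other runtime dependencies, unchanged ...
    ],
}
```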

There's a question if we want to also do a PMDA 1.0 with the old MDA 1.0 and then PMDA 2.0 to be in sync with MDA 2.0.

@yuxuanzhuang (Contributor, author):

I have a question regarding starting a PR based on this PR: is it possible? (A quick search indicates it's not possible on GitHub.)
The reason is that the other PR (introducing a dask mixin) is still experimental, so I opted to keep it separate from this one.

@orbeckst (Member) commented Aug 9, 2020

I think you can do a PR that is relative to this one and that would be merged into this one. Check the settings for base branch when you create a new PR.

@yuxuanzhuang (Contributor, author):

I think the problem is that this branch lives in my own fork rather than under MDAnalysis, so that PR would be created in my own repo.

@yuxuanzhuang (Contributor, author) commented Aug 19, 2020

I disabled DeprecationWarning in this PR temporarily.
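The comment does not say how the DeprecationWarning was disabled. One scoped way to do it with the standard library is `warnings.catch_warnings()`, which restores the previous filters on exit; `noisy()` below is a hypothetical stand-in for a call into a deprecated API. For a whole test run, pytest's `-W ignore::DeprecationWarning` command-line option or its `filterwarnings` ini setting has the same effect.

```python
import warnings


def noisy():
    """Hypothetical call that emits a DeprecationWarning."""
    warnings.warn("old API", DeprecationWarning)
    return 42


# Silence DeprecationWarning only inside this block; the previous filter
# state is restored when the context manager exits.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=DeprecationWarning)
    result = noisy()

print(result)  # 42
```

Scoping the filter this way makes it easy to remove once the temporary suppression is no longer needed.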

The failing rdf_s test here seems to be related to the discrepancy between PR MDAnalysis/mdanalysis#2812 and #121.

@VOD555 (Collaborator) commented Aug 23, 2020

It's very cool that this PR gets rid of rebuilding the universe and makes the code much simpler.
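Instead of rebuilding a Universe inside every worker, a serializable Universe lets the analysis object itself be shipped to the workers. The stdlib sketch below mimics that with toy classes: `ToyUniverse` drops its unpicklable reader in `__getstate__` and reopens it in `__setstate__` (an assumed pattern for illustration, not MDAnalysis's actual implementation), and `worker()` is a hypothetical stand-in for a dask task, with `pickle.loads` playing the role of the transfer.

```python
import pickle


class ToyUniverse:
    """Hypothetical 'universe' that can be pickled despite an open reader."""

    def __init__(self, frames):
        self.frames = frames
        self._reader = self._open()

    def _open(self):
        # Stand-in for an open trajectory reader: like an OS file handle,
        # a generator cannot be pickled by the standard library.
        return (frame for frame in self.frames)

    def __getstate__(self):
        state = self.__dict__.copy()
        del state["_reader"]  # drop the unpicklable part
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self._reader = self._open()  # reopen on the receiving side


class ToyAnalysis:
    def __init__(self, universe):
        self.universe = universe

    def run_block(self, start, stop):
        # Per-block "analysis": sum the frame values in [start, stop).
        return sum(self.universe.frames[start:stop])


def worker(payload, start, stop):
    analysis = pickle.loads(payload)  # no Universe rebuilt from file names
    return analysis.run_block(start, stop)


analysis = ToyAnalysis(ToyUniverse([1.0, 2.0, 3.0, 4.0]))
payload = pickle.dumps(analysis)
partials = [worker(payload, 0, 2), worker(payload, 2, 4)]
print(sum(partials))  # 10.0
```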

Yeah, the test fails because we changed the definition of the density option in the MDAnalysis PR but did not do the same in PMDA.

@orbeckst (Member) left a comment

This looks pretty good. I could only do a superficial read.

Can we make 0.4 (#116) the last release compatible with MDA 1.x, and then merge this PR?

EDIT: Other comments from previous code reviews (docs, density analysis) should still be addressed.

# - CONDA_DEPENDENCIES="mdanalysis mdanalysistests dask joblib pytest-pep8 mock codecov cython hypothesis sphinx"
# - CONDA_MDANALYSIS_DEPENDENCIES="cython mmtf-python six biopython networkx scipy griddataformats gsd hypothesis"
- CONDA_MDANALYSIS_DEPENDENCIES="mmtf-python biopython networkx cython matplotlib scipy griddataformats hypothesis gsd"
- CONDA_DEPENDENCIES="${CONDA_MDANALYSIS_DEPENDENCIES} dask distributed joblib pytest-pep8 mock codecov"
- CONDA_CHANNELS='conda-forge'
- CONDA_CHANNEL_PRIORITY=True
# install development version of MDAnalysis (needed until the test
# files for analysis.rdf are available in release 0.19.0)

Member:

The comment is outdated.

Successfully merging this pull request may close these issues.

use serializable Universe