Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement MD workflows #1672

Open
wants to merge 72 commits into
base: main
Choose a base branch
from

Conversation

tomdemeyere
Copy link
Contributor

@tomdemeyere tomdemeyere commented Feb 7, 2024

Summary of Changes

Closes #1577.

Before going further a little consultation:

MD users use workflows all the time, just for the NVE example in this commit there is pretty much a thousand ways you can run that; doing a first NVT to relax to the desired temperature, or a first NPT to relax to the desired density, etc...

  1. For example, should we have one NVT workflow with a parameter "thermostat", or one workflow per thermostat?
  2. If you want things to be in recipe.common, atoms will have to be sent in with a calculator already attached, is that what you want?
  3. From what I understand you want to keep things simple (not something like the code here), users then build their workflows to get what they want.
  4. In any case, for now doing these kinds of workflows is difficult, because it is generally hard to transfer results between ASE calculators, which will always happen between runs, this is critical since MD needs momenta and forces to ensure proper continuation. (This is a problem I would like fixed above since it also causes problems when restarting optimization etc)

Checklist

  • I have read the "Guidelines" section of the contributing guide. Don't lie! 😉
  • My PR is on a custom branch and is not named main.
  • I have used black, isort, and ruff as described in the style guide.
  • I have added relevant, comprehensive unit tests.

@buildbot-princeton
Copy link
Collaborator

Can one of the admins verify this patch?

@Andrew-S-Rosen
Copy link
Member

Andrew-S-Rosen commented Feb 7, 2024

@tomdemeyere: Happy to respond to each question but before I do, I just wanted to check that this is the full commit you wanted to share --- just the args/kwargs, right?

(In short, I really would love to have MD flows and am willing to work together to figure out how to get this done the best way possible)

Copy link

codecov bot commented Feb 7, 2024

Codecov Report

Attention: Patch coverage is 98.18182% with 2 lines in your changes missing coverage. Please review.

Project coverage is 98.99%. Comparing base (a73891d) to head (25cb740).

Files Patch % Lines
src/quacc/runners/ase.py 95.91% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1672      +/-   ##
==========================================
- Coverage   99.01%   98.99%   -0.03%     
==========================================
  Files          81       83       +2     
  Lines        3367     3477     +110     
==========================================
+ Hits         3334     3442     +108     
- Misses         33       35       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@tomdemeyere
Copy link
Contributor Author

@tomdemeyere: Happy to respond to each question but before I do, I just wanted to check that this is the full commit you wanted to share --- just the args/kwargs, right?

(In short, I really would love to have MD flows and am willing to work together to figure out how to get this done the best way possible)

Yes, before doing more I wanted to make sure we are on the same page.

I guess my main question here is: how to make these flows common to all codes? Should they call code specific MD jobs? Or, should a calculator be attached to the Atoms object being sent to the MD flow?

Once I know this we can work on details

@Andrew-S-Rosen
Copy link
Member

Andrew-S-Rosen commented Feb 8, 2024

I guess my main question here is: how to make these flows common to all codes? Should they call code specific MD jobs? Or, should a calculator be attached to the Atoms object being sent to the MD flow?

@tomdemeyere: Good question.

Personally, I think what would likely make the most sense is to make a run_md function in quacc.runners.ase that would take an Atoms object with an attached calculator just like the other methods in there. You likely were aware of that already.

From there, we get to the recipes, and my answer is: "it depends." If it is just a simple @job (i.e. not a @flow), then this is no different than a relaxation job --- one would just be calling run_md instead of run_calc or run_ase_opt in a given code's module (e.g. quacc.recipes.emt). Since there wouldn't be much logic beyond defining default arguments and calling run_md, this is fine and doesn't introduce much duplication. I would treat the individual MD jobs in the same way as a relaxation --- we have a runner, which is called in the various recipes where it is appropriate. It does mean that there would be multiple MD modules, but we would need that anyway because every code will have different parameters (defaults) that are needed to properly run an MD calculation.

If we are talking about a @flow where there are multiple Jobs being stitched together, and we anticipate this pattern (i.e. the workflow DAG) to be calculator-agnostic, then we'll put that in common. This is the philosophy behind the existing common recipes. But we would still need individual Jobs for each code. In general, users aren't expected to call the common workflow directly. Aside from usability aspects, there are some other nuances about (de)serialization needed by workflow engines for why I say this, but I won't bore you with those details here.

To get started, I would suggest making quacc.runners.ase.run_md and then a simple demo for a cheap-to-run calculator, like EMT or LJ. That will likely be illustrative and help guide the design further.

I should also note that we can always refactor later if there is something we can refactor. No need to prematurely optimize.

For example, should we have one NVT workflow with a parameter "thermostat", or one workflow per thermostat?

If the workflow logic is staying the same regardless of thermostat, then I would suggest one job/workflow where thermostat would be a keyword argument with a sensible default.

If you want things to be in recipe.common, atoms will have to be sent in with a calculator already attached, is that what you want?

See comment above. If we are talking about a @job, then that would call run_md and be specific for the code since we need to set some sensible default parameters. If it's a @flow pattern, that can go in common but would not require passing in Atoms objects with attached calculators --- it would require passing in one or more Jobs.

From what I understand you want to keep things simple (not something like the code here), users then build their workflows to get what they want.

Hard for me to tell from the commit you shared here, but I am not opposed to complexity. In fact, we should not rely on the users to build something complex. If the @flow pattern is "trivial", like a relax then an MD run, it is debatable if we should make that its own @flow or not (honestly, it can go either way). But if it's more complex, for sure we should.

In any case, for now doing these kinds of workflows is difficult, because it is generally hard to transfer results between ASE calculators, which will always happen between runs, this is critical since MD needs momenta and forces to ensure proper continuation. (This is a problem I would like fixed above since it also causes problems when restarting optimization etc)

This is a good point. However, I think it is solvable. Let's circle back to this when the time is right.

@tomdemeyere
Copy link
Contributor Author

@Andrew-S-Rosen Thanks for all these details. I will come back to it at some point

@tomdemeyere
Copy link
Contributor Author

@Andrew-S-Rosen I think this is now a good point to have your input

Copy link
Member

@Andrew-S-Rosen Andrew-S-Rosen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for kicking this off, @tomdemeyere! Really appreciate the contribution!

This is looking good. I have some comments that are mostly related to some minor restructuring.

Starting with this isolated EMT example is very helpful for me in understanding the process and figuring out what should be refactored, so I thank you for doing that.

src/quacc/recipes/emt/core.py Outdated Show resolved Hide resolved

if temperature:
MaxwellBoltzmannDistribution(
atoms, temperature_K=temperature, **initial_temperature_params
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the fixcm and fixrot as dedicated keyword arguments, the **kwargs here can be more clearly described as simply being the **maxwell_boltzmann_distribution_kwargs. That is a mouthful (although nobody is calling the name directly). Could try something like **maxwell_boltzmann_kwargs. Either way, the point is that both the docstring and name would be much clearer this way. At a glance, it's not clear what an **initial_temperature_params is.

src/quacc/recipes/emt/core.py Outdated Show resolved Hide resolved
src/quacc/recipes/emt/core.py Outdated Show resolved Hide resolved
src/quacc/runners/ase.py Outdated Show resolved Hide resolved
src/quacc/schemas/_aliases/ase.py Outdated Show resolved Hide resolved
src/quacc/schemas/_aliases/ase.py Outdated Show resolved Hide resolved
src/quacc/schemas/ase.py Outdated Show resolved Hide resolved
src/quacc/schemas/ase.py Outdated Show resolved Hide resolved
src/quacc/recipes/emt/core.py Outdated Show resolved Hide resolved
@tomdemeyere
Copy link
Contributor Author

tomdemeyere commented Feb 15, 2024

Thank you for kicking this off, @tomdemeyere! Really appreciate the contribution!

This is looking good. I have some comments that are mostly related to some minor restructuring.

Starting with this isolated EMT example is very helpful for me in understanding the process and figuring out what should be refactored, so I thank you for doing that.

I see that your comment is different than the one I received by email from git. But yes, originally my plan was to have these "_base" functions. Either per calculator or in common like you previously said.

After discussing that along with other details I was then planning to introduce recipes for the other ensembles (NVT, NPT).

Didn't see your comments originally (github mobile) will work on it at some point

@Andrew-S-Rosen
Copy link
Member

I see that your comment is different than the one I received by email from git. But yes, originally my plan was to have these "_base" functions. Either per calculator or in common like you previously said.

Apologies about the edits! I was trying to sort out where to place them and was going back-and-forth so decided to just save it for a future discussion to avoid confusion. But you got the right idea anyway.

After discussing that along with other details I was then planning to introduce recipes for the other ensembles (NVT, NPT).

That would be great!

Didn't see your comments originally (github mobile) will work on it at some point

Thank you!

@tomdemeyere
Copy link
Contributor Author

tomdemeyere commented Feb 15, 2024

Few comments:

  • Without run_kwargs there is no way to pass things to dyn.run()
  • There is still a check_convergence, although it does not make much sense for MD. By definition you reached the max number of steps, but there is a case where it makes sense:

We are currently working on a error-free quacc as part of another project. We make sure that quacc always reaches the point of writing to the database: it writes the errors that occurred, along with the available results, if any (the errors might happen at some point during the MD, previous results should be returned, actually in some cases, errors might even be part of your journey (although this might not be what Quacc was originally made for)). Currently if one is running a bunch of simulations, if errors occur, there will be a bunch of directories with no easy way to figure things out.

Because you said you might be interested in error handling I am leaving that there.

  • I changed how temperature/time etc... is written to results, feel free to tell me what you think

@Andrew-S-Rosen
Copy link
Member

Andrew-S-Rosen commented Feb 15, 2024

Without run_kwargs there is no way to pass things to dyn.run()

The fmax and steps are already passed in quacc.runners.ase.run_opt, and .run() doesn't take any other keyword arguments, so we don't need it.

There is still a check_convergence, although it does not make much sense for MD. By definition you reached the max number of steps, but there is a case where it makes sense:

I agree that a check_convergence doesn't make much sense for this job type. We should remove that logic entirely.

We are currently working on a error-free quacc as part of another project. We make sure that quacc always reaches the point of writing to the database: it writes the errors that occurred, along with the available results,

One could catch the typical RuntimeError errors when ASE is called and try to give the user some flexibility via the global settings about how to handle things with the CHECK_CONVERGENCE and/or some other global parameter. But of course, feel free to continue doing what you're doing!

Currently if one is running a bunch of simulations, if errors occur, there will be a bunch of directories with no easy way to figure things out.

Indeed. This is a limitation of Parsl/Dask, which is really best thought of a task manager. Other true workflow tools like Prefect or Covalent or FireWorks have a separate database for the task metadata and would highlight when/where the job fails. But since Parsl does not have a task database of its own, there is no mechanism to easily store that info. The recommended approach would be to launch your calculations in a for loop and log any errors from the Parsl AppFuture. But of course, your mechanism is fine as well.

I changed how temperature/time etc... is written to results, feel free to tell me what you think

I like that a lot more!

@tomdemeyere
Copy link
Contributor Author

This should be close to go, there is still something essential before being able to run MD correctly: restarts. They are still difficult to do (impossible in some cases), but that's an ASE problem. I suspect this will take ages to fix; for now I deleted everything related to restart while we wait to see what solutions will get accepted upstream (if any).

I looked at automate2, there are indeed some interesting things, like the possibility to specify a temperature/pressure gradient. I will charge back with my idea of "patches" to accomplish that, this would look like:

steps = np.arange(1000, 10000, 1)
temperatures = function_of_steps(steps)

my_temp_gradient = TemperatureGradient(steps, temperatures)
my_file_copier = FileCopier(interval = 10)
my_press_gradient = PressureGradient(...)

md_job(... patches = [my_temp_gradient, my_file_copier, my_pressure_gradient])

This is achievable by tuning the generator as you did previously (although it might get packed), for some reason I love this "patching" concept 😅. People could fine-tune their MD, picking what triggers when, Quacc could even have a section in the documentation explaining how to do complex things as we already discussed. Anyway, daydreaming here 🙃

@Andrew-S-Rosen
Copy link
Member

Thanks for your persistence with this! I will try to look at this within the next few days (please ping me if I forget). I'm definitely interested to chat more about the patches idea here; perhaps within the context of MD, it could be an interesting proof-of-concept to think about. I'll revisit this when I have the time to give this a proper review!

@Andrew-S-Rosen Andrew-S-Rosen changed the title Draft: MD workflows Implement MD workflows Apr 10, 2024
@tomdemeyere
Copy link
Contributor Author

Some news on this? Just to know if it is waiting on https://gitlab.com/ase/ase/-/merge_requests/3310

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

Add some general, ASE-based MD compute jobs
3 participants