
Roadmap and ideas for Molly.jl development #2

Open
jgreener64 opened this issue Oct 29, 2018 · 42 comments

@jgreener64
Collaborator

jgreener64 commented Oct 29, 2018

This issue is a roadmap for Molly.jl development. Feel free to discuss things here or submit a PR. Bear in mind that significant refactoring will probably occur as the package develops.

Low hanging fruit

Want to get involved? These issues might be the place to start:

  • Speedups to the code, both at the level of individual functions and of the overall algorithm. [Getting there]
  • Look over the code to find mistakes in the implementation. There will be some because...
  • Test coverage is minimal. We need tests for the 'core' functions that are unlikely to change their return values, e.g. the force functions. [Done]

Bigger things to tackle

  • Energy minimisation by gradient descent [Done].
  • Add other forcefields. [Partly done]
  • Add other thermostats/ensembles. [Partly done]
  • Get it working on the GPU. [Done]
  • Line by line checking of how a mature MD package works to see where we have missed things and to get an idea of how to organise the codebase in future. [Partly done]
  • Neighbour list in a cubic lattice, and use of group lists. [Partly done]
  • Setup of MD simulations, e.g. adding hydrogen, solvent. [Partly done]
  • Trajectory/topology file format readers and writers. The current topology reader is not very robust. [Partly done]

Projects that will require more planning or will cause significant changes

  • Abstracting the forcefield from the integration to allow different forcefields and integrators to be combined. [Done]
  • Allow definition of custom forcefields. [Done]
  • Parallelisation. [Done]

Okay, so this is less of a roadmap and more of a list of ideas to work on. But most or all of the above would be required to make this a package suitable for general use, so in a sense this is the roadmap.

@Zarasthustra

To do: add a module for calculating free energy via alchemical changes ;)

@jgreener64
Collaborator Author

"Abstracting the forcefield from the integration to allow different forcefields and integrators to be combined" and "Allow definition of custom forcefields" are now done.

@jarvist

jarvist commented Aug 14, 2019

It may be interesting to see whether you can either directly use https://github.com/libAtoms/JuLIP.jl as a library (also https://github.com/libAtoms/NeighbourLists.jl), or at least be inspired by some of the code therein.

@jgreener64
Collaborator Author

Thanks Jarvist, I have had a brief look at the code there but will dive a bit deeper.

@Zarasthustra

Zarasthustra commented Aug 14, 2019 via email

@jgreener64
Collaborator Author

jgreener64 commented Aug 14, 2019

Would you be interested in it?

Yes, definitely. If you could open a PR when you get a chance that would be great. I understand how it's hard to find time though, I only seem to get a few hours a month on this package these days.

@jgreener64
Collaborator Author

There is a Julia interface to Chemfiles - building Molly on top of those types would allow easy reading and writing.

@hypnopump

What are your thoughts on GPU acceleration, or at least multi-core CPU support? In Python it's really easy with joblib, cupy, jax... I don't know about the Julia set of tools for that, but it might be worth exploring.

@jgreener64
Collaborator Author

GPU acceleration is feasible using CuArrays.jl - it would probably involve refactoring into a style that uses broadcasting rather than for loops, but that's no bad thing anyway.

Multi-core CPU support has improved greatly in the recent Julia v1.3 release so this is another avenue to explore, though I personally don't have much experience there.

Another point to add here is the ability to differentiate through molecular simulations using Zygote.jl, which is an exciting use case and something that Julia is well-suited for.
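As a toy illustration of the broadcasting style mentioned above (a sketch with made-up names, not Molly's API), a pairwise inverse-square interaction can be written without explicit loops, which is the form that maps naturally onto GPU arrays:

```julia
using LinearAlgebra

# Pairwise inverse-square potential energy via broadcasting (illustrative sketch).
# `coords` is a 3×N matrix; there is no explicit double loop, so the same code
# can in principle run on CPU Arrays or GPU array types.
function pair_potential(coords)
    N = size(coords, 2)
    dx = reshape(coords, 3, N, 1) .- reshape(coords, 3, 1, N)  # 3×N×N displacements
    r2 = dropdims(sum(abs2, dx; dims=1); dims=1)               # N×N squared distances
    r2[diagind(r2)] .= Inf                                     # mask self-interactions
    return sum(inv, r2) / 2                                    # each pair counted once
end

e = pair_potential([0.0 1.0; 0.0 0.0; 0.0 0.0])  # two particles at distance 1
```

On a GPU the diagonal masking would need a broadcast-friendly formulation to avoid scalar indexing, but the overall shape of the computation stays the same.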

@cortner
Member

cortner commented Mar 28, 2020

I was unfortunately unaware of this package until now. (Or maybe I saw it and have forgotten...) Anyhow, I wonder whether there is any chance to combine Molly and JuLIP, or at least make them compatible.

With JuLIP I deliberately stayed quite close to ASE, to make sure that I could have a relatively thin interface. But this has led to some design decisions that I've since regretted. So in principle at least I'm quite open to restructuring JuLIP to make it work with Molly.

Another possibility would be to make JuLIP really a library for potentials, and let Molly focus on managing configurations, MD, geometry optimisation, etc.

But maybe it is easiest to leave them separate and not worry too much about this. I'm curious, though, what people think.

@cortner
Member

cortner commented Mar 28, 2020

Regarding the neighbourlist - this is surely something that we can share. I believe my list is fairly fast, it is based on matscipy. The main downside for Molly is that it uses the same cutoff for all atoms. It would need to be generalised so that different atoms can have a different cutoff.

I've also started playing around with implementing various iterators over the neighbour list, to have a more functional style when implementing potentials. I'm less sure how well that has worked.

@jgreener64
Collaborator Author

Thanks for commenting @cortner. Once I get some time for this package, diving into JuLIP is definitely on the list of priorities to see what can be combined. Certainly with things you have optimised, like the neighbour list, there is room for Molly to use or build on that.

@cortner
Member

cortner commented Apr 2, 2020

or extract shared functionality so that JuLIP and Molly become effectively frontends...

@AlfTetzlaff

Hi! I observed some things in your code which could be optimized or written the "Julia way".

  1. Use StaticArrays to store positions and velocities, instead of your self-defined mutable structs. This might speed things up quite a lot by allowing SIMD.
  2. Implement a dist(a, b, box) function specialized for static arrays which takes the boundary conditions into account.
  3. r2 = dx*dx+... can then be written as r^2. The pow function has specializations for exponents 1, 2 and 3, so this produces no overhead compared to C.
  4. If you're using Atom/Juno you can use Juno.@profiler to find bottlenecks.
  5. There is a Discourse thread about "struct of arrays vs array of structs". I don't remember who won, but maybe it's instructive to read.
  6. @inline the update_accelerations! functions.
  7. Use structs with parametric types, so that you can easily switch to e.g. Float32 for GPUs. (SVectors are parametric.)
  8. In theory, just an idea, one could leave the time integration to another package like DifferentialEquations. They provide a whole lot of explicit and implicit solvers (though velocity Verlet is missing, I think) with adaptive or fixed time steps.

Kind regards, Max
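Points 1–3 above might look something like the following generic sketch (illustrative names, assuming a cubic box of side `box`):

```julia
# Minimum-image displacement and squared distance for a cubic periodic box
# (a sketch; with SVectors from StaticArrays.jl the same generic code becomes
# stack-allocated and SIMD-friendly).
vector_pbc(a, b, box) = (b .- a) .- box .* round.((b .- a) ./ box)
dist2_pbc(a, b, box) = sum(abs2, vector_pbc(a, b, box))

# The shortest image of this pair goes through the periodic boundary:
r2 = dist2_pbc([0.1, 0.1, 0.1], [9.9, 0.1, 0.1], 10.0)  # ≈ 0.2^2
```

Since `sum(abs2, ...)` squares each component, and literal integer powers like `r^2` lower to plain multiplies, there is no pow overhead compared to C.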

@cortner
Member

cortner commented May 7, 2020

Some of this I do in JuLIP, though not yet as well as I'd like; it needs more cleanup to get properly type-agnostic code. Another reason, maybe, to join forces at some point.

@jgreener64
Collaborator Author

Thanks for the feedback @AlfTetzlaff, I would definitely like to get some of this in. I have tried a few of the suggestions previously, for example the positions and velocities are currently mutable static vectors (sub-types of FieldVector) which should hopefully be giving some of the speedups you talk about.

The types should be made parametric and it would help with GPU compatibility, though at the minute the code is not written in a GPU-friendly way so that would need to change first. I would like to use the DifferentialEquations integrators in the long-term since they make so many available, but for the proof-of-concept here I just implemented a simple velocity Verlet.

I have opened a Discourse post with a specific MWE based on the current non-bonded force code to solicit feedback - please feel free to comment more there.
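For reference, the velocity Verlet scheme mentioned above is only a few lines (a sketch with illustrative names, not the package's implementation):

```julia
# One velocity Verlet step: update positions with current accelerations, then
# velocities with the average of old and new accelerations. `accels` maps
# coordinates to accelerations.
function velocity_verlet!(r, v, a, accels, dt)
    r .+= v .* dt .+ (0.5 * dt^2) .* a
    a_new = accels(r)
    v .+= (0.5 * dt) .* (a .+ a_new)
    return a_new
end

# Harmonic oscillator sanity check: the energy 0.5*(x^2 + v^2) should be
# well conserved over many steps, a hallmark of symplectic integrators.
function run_oscillator(steps, dt)
    r, v, a = [1.0], [0.0], [-1.0]
    for _ in 1:steps
        a = velocity_verlet!(r, v, a, x -> -x, dt)
    end
    return r, v
end

r, v = run_oscillator(1000, 0.01)
energy = 0.5 * (r[1]^2 + v[1]^2)  # stays close to the initial 0.5
```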

@Zarasthustra

Zarasthustra commented May 11, 2020 via email

@jgreener64
Collaborator Author

Great, thanks for the input.

@jgreener64
Collaborator Author

Here's a sneak preview of some non-bonded force optimisations I've been working on: faster single-CPU code, a multithreaded CPU implementation and a GPU implementation.

Hoping to refactor the package to allow these from the same API within a couple of weeks.

[Benchmark graph: simulation times for the single-CPU, multithreaded CPU and GPU implementations]

@AlfTetzlaff

Could it be that you confused µs with ms in the above graph by accident?

@jgreener64
Collaborator Author

It's ms, but for a 1000-step simulation, so it would be µs per step. The blue dot for 100 atoms is the simulation benchmark time from the Discourse post.

@jgreener64
Collaborator Author

Some changes I added this weekend:

  • SVectors for coordinates and velocities, allowing arbitrary simulation dimensions.
  • Multithreading of non-bonded force calculations.

@AlfTetzlaff

Cool! Have you noticed any single threaded speedup due to SVectors?

@jgreener64
Collaborator Author

Yes, the Discourse thread benchmark went from 45 ms to 36 ms using SVectors, rather than the mutable FieldVectors used before.

Making Simulation immutable also sped things up.

The cell neighbour list would be a big algorithmic speedup (as discussed above).
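The cell-list idea is roughly: bin atoms into cells of side at least the cutoff, then only compare atoms in the 27 neighbouring cells, turning O(N²) neighbour finding into O(N). A minimal sketch (illustrative names, not Molly's implementation):

```julia
# Build candidate neighbour pairs from a cell list in a cubic box (sketch).
# `coords` is a vector of 3-component coordinates, `box` the cubic box side.
function cell_pairs(coords, box, cutoff)
    ncells = max(1, floor(Int, box / cutoff))
    side = box / ncells
    cells = Dict{NTuple{3,Int},Vector{Int}}()
    for (i, r) in enumerate(coords)
        key = ntuple(d -> mod(floor(Int, r[d] / side), ncells), 3)
        push!(get!(cells, key, Int[]), i)
    end
    pairs = Tuple{Int,Int}[]
    for (key, atoms) in cells, off in Iterators.product(-1:1, -1:1, -1:1)
        nb = ntuple(d -> mod(key[d] + off[d], ncells), 3)
        haskey(cells, nb) || continue
        for i in atoms, j in cells[nb]
            i < j && push!(pairs, (i, j))
        end
    end
    # small boxes wrap several offsets onto the same cell, so deduplicate
    return unique!(sort!(pairs))
end

# Atoms 1 and 2 are within the cutoff; atom 3 is far away
pairs = cell_pairs([(0.1, 0.1, 0.1), (0.3, 0.1, 0.1), (5.0, 5.0, 5.0)], 10.0, 1.0)
```

A production implementation would store the cell contents in flat arrays rather than a Dict, but the binning logic is the same.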

@jgreener64
Collaborator Author

jgreener64 commented May 18, 2020

My rough feature list for a v0.1 release is:

  • Cell neighbour list [Done].
  • Type parameterisation (to allow Float32). [Done]
  • A few more thermostats [Done].
  • A few more simulators, e.g. minimisation/Langevin dynamics. [Done]
  • Some unit tests. [Done]
  • More thorough docs and docstrings. [Done]

@Chenghao-Wu

May I have your thoughts on parallelisation via domain decomposition in Julia?

@jgreener64
Collaborator Author

It's not really my area of expertise, but I guess it can be done and it looks like it could be a useful algorithm. I don't know how well that type of simulation would work in the Molly framework currently though.

@cortner
Member

cortner commented Jan 13, 2022

It would be an interesting proposition and a small step towards replacing LAMMPS.

@jgreener64
Collaborator Author

jgreener64 commented Apr 16, 2022

I thought I'd just update this issue with the state of the project.

In the last few months Molly has been under heavy development and has seen many new features including Unitful.jl support, more interactions, an OpenMM XML force field reader, AtomsBase.jl integration, differentiable simulation with Zygote.jl on CPU and GPU, a new general interaction type, implicit solvation models, more simulators, energy minimisation, more analysis functions and better test coverage. See the release notes for more details.

Future development directions include:

  • Optimisation of the GPU code path to make performance competitive with related software. Differentiable GPU kernels #60 discusses one way to do this, but any approach will probably use GPU kernels. The challenge is to make the kernels work with gradients and to maintain flexibility while allowing the specialisations required for high performance.
  • More simulators such as replica exchange MD and Langevin variants [done].
  • Support for constrained bond lengths and angles, which are used in most macromolecular simulations these days.
  • More options and testing for simulation setup including residue template matching.
  • Pressure coupling methods for NPT simulation [done].
  • Support for non-cubic boundary conditions and no periodic boundaries in given dimensions [done].
  • Particle mesh Ewald for long-range summation of electrostatic interactions.
  • Parallel domain decomposition, or at least better use of CPU parallelism.

A few groups are interested in using Molly for a diverse range of simulations, so the intention is that it will stay general enough to be widely useful whilst also reaching high performance. My personal research uses Molly for differentiable simulations of proteins, so this will continue to dictate the features I work on.

@uliano

uliano commented Jul 18, 2022

Might a native Julia (as in no external dependencies) .xtc reader and writer be of interest?
It is twice as fast as the C implementation used by mdtraj.

@jgreener64
Collaborator Author

That would be great. Currently we are using Chemfiles for that kind of thing but a fast native Julia approach would be welcome. It could even be its own package?

@uliano

uliano commented Jul 18, 2022

At the moment it is just a file with a bunch of functions.

It started as a proof of concept while learning Julia, showing its potential for both speed and pesky low-level tasks, with the far-future goal of assembling something resembling a bad copy of MDAnalysis or MDtraj.

At the moment I'm too ignorant of the Julia packaging system, and of the git and GitHub machinery, to be able to provide it as a package. However, with some help I'd be glad to contribute.

My original goal was to natively read and write trajectories in the formats I use, namely gromacs .xtc, charmm/namd .dcd, amber .nc and desmond .dtr. After the unexpected (at least speed-wise) success with .xtc I quickly got stranded by the netcdf3 format underlying .nc, and it all went on temporary hold.

The functions are pretty solid: I tested the low-level ones against the C implementation, I was able to read and write a real .xtc file with no diff from the original, and I compared the read coordinates with MDtraj's. It would be a pity not to use them, also given the effort put into understanding and optimizing the original C code (the C implementation's slowness is obviously due to the implementation itself and not the language).

I'm writing too much; here is the thing:

https://github.com/uliano/MDTools.jl

I'm willing to help in my spare time, but keep in mind that at the moment I'm not proficient enough in Julia and the GitHub package machinery to be autonomous.

@jarvist

jarvist commented Jul 18, 2022

I think it looks pretty good! It wouldn't be too much effort to rename your repo 'xtcreader.jl' or similar & then add the boilerplate to make it an independent package.

@uliano

uliano commented Jul 18, 2022

It actually writes, also!

@uliano

uliano commented Jul 19, 2022

So I did my homework, and now we have a Julia package with .xtc read and write functionality:

https://github.com/uliano/MDTrajectoryFiles.jl

I added minimal docstrings on the exported functions and a very basic test: read and write, then compare. I will eventually implement other file formats, so I chose a more generic name than XTC.

@uliano

uliano commented Jul 22, 2022

This is the last comment for a while, I promise!
I just wanted to say that MDTrajectoryFiles.jl has been accepted in the Julia registry.

Foreseeable changes in the next few weeks:

  • native Julia read & write of CHARMM/NAMD .dcd files
  • possibility to load a subset of atoms from each frame.

@CoryKornowicz

Using Metal.jl, I was able to swap all the CuArrays to MtlArrays for better performance on Apple M-series chips. Might be something to look into as a 'low hanging fruit.'

@jgreener64
Collaborator Author

That's interesting, thanks. Once the kernels branch is merged (with >10x speedups on GPU) it would be worth allowing and documenting this.

See #99 for work towards generic array types.
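Part of what makes that kind of backend swap easy is writing setup code against a generic array constructor rather than a concrete type. A hedged sketch (the helper name is hypothetical, not Molly's API):

```julia
# Backend-generic initialisation: pass the array type so the same code can
# target Array, CuArray (CUDA.jl) or MtlArray (Metal.jl). Hypothetical helper.
function random_velocities(AT, n; T=Float32)
    return AT(randn(T, 3, n))  # generate on the host, copy to the chosen backend
end

v_cpu = random_velocities(Array, 100)
# v_cuda  = random_velocities(CuArray, 100)  # with `using CUDA`
# v_metal = random_velocities(MtlArray, 100) # with `using Metal`
```

Downstream code that only uses broadcasting and generic array operations then works unchanged on whichever backend was chosen.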

rkp2k00 added a commit to rkp2k00/Molly.jl that referenced this issue Mar 25, 2023
@exAClior
Contributor

Hi, I notice that Molly uses Zygote for AD. I'm not sure if it's causing issues performance-wise. If so, are there any plans to switch to alternatives like Enzyme.jl? Are there any significant roadblocks that would require extensive work on this?

@jgreener64
Collaborator Author

Yes, see the kernels branch for a lot of progress on this. It uses Enzyme.jl for AD in the slow parts and custom GPU kernels, improving performance and memory usage dramatically.

Hoping to merge and release soon, just waiting on some docs cleanup and a new Enzyme.jl release.

@jgreener64
Collaborator Author

jgreener64 commented Apr 19, 2023

v0.15.0 is out with many improvements, including to GPU speed and memory usage.

Priorities for future development are:

  • High GPU performance.
  • Particle mesh Ewald summation.
  • Constraints.
  • Pressure coupling, this will require the concept of molecules to be added as molecules need to be moved together [done].
  • More robust and flexible setup from files, essential for biomolecular systems.

Hopefully there will be a GSoC student(s) working on Molly this summer too.

@jgreener64
Collaborator Author

Pressure coupling added via the Monte Carlo barostat.

@JaydevSR will be looking at GPU performance this summer as part of JSoC.
