Error when using pytorch lightning #607

Open
kryczko opened this issue Dec 14, 2021 · 6 comments

kryczko commented Dec 14, 2021

I am trying to define an ANI model along with the AEVComputer module (with CUDA enabled) inside a PyTorch Lightning module, but I am getting the following error:

RuntimeError: coordinates, species, and aev_params should be on the same device

I have seen that some of the parameters are registered as buffers, but some are not. Please let me know what you think.

Kev
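
For readers hitting the same error, here is a minimal sketch of why the buffer distinction mentioned above matters. It assumes the mismatch comes from tensors stored as plain attributes rather than registered buffers, which is not confirmed for torchani in this thread. `ToyAEV`, `shifts`, and `cutoff` are hypothetical names, not torchani's. A dtype conversion is used instead of a device move so the sketch runs without a GPU; both go through the same `nn.Module.to()` mechanism.

```python
import torch
from torch import nn


class ToyAEV(nn.Module):
    """Hypothetical stand-in for a module that holds constant tensors."""

    def __init__(self):
        super().__init__()
        # Registered buffer: tracked by the module, so it follows .to()/.cuda().
        self.register_buffer("shifts", torch.linspace(0.9, 5.0, 4))
        # Plain tensor attribute: not tracked, so it stays as it was created.
        self.cutoff = torch.tensor(5.2)


module = ToyAEV().to(torch.float64)

buffer_names = [name for name, _ in module.named_buffers()]
print(buffer_names)          # tensors the module knows about
print(module.shifts.dtype)   # converted together with the module
print(module.cutoff.dtype)   # left untouched by .to()
```

PyTorch Lightning moves a model by calling `.to(device)` on it, so any tensor that behaves like `cutoff` here would remain on the CPU after the move.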

not-matt commented Jan 26, 2022

Similar issue, probably related?

num_repeats = torch.where(pbc, num_repeats, num_repeats.new_zeros(()))
                  ~~~~~~~~~~~ <--- HERE
    r1 = torch.arange(1, num_repeats[0].item() + 1, device=cell.device)
    r2 = torch.arange(1, num_repeats[1].item() + 1, device=cell.device)
RuntimeError: Expected condition, x and y to be on the same device, but condition is on cpu and x and y are on cuda:0 and cuda:0 respectively

Clean conda environment on Ubuntu, installed packages:

openmm                    7.7.0            py39h792354b_0    conda-forge
openmm-torch              0.5             cuda112py39hb628e3f_0    conda-forge
openmmml                  1.0                      pypi_0    pypi
pytorch                   1.10.0          cuda112py39h3ad47f5_1    conda-forge
pytorch-gpu               1.10.0          cuda112py39h0bbbad9_1    conda-forge
torchani                  2.2.3.dev2+g3dfbaf4          pypi_0    pypi
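
The traceback above reports that the `torch.where` condition is on the CPU while the other two arguments are on `cuda:0`. A minimal sketch of the general remedy, moving the mask to the device of the data before the call, is shown below. `mask_repeats` is a hypothetical helper and this is not the actual torchani or openmm-ml fix; the sketch falls back to the CPU when no GPU is present.

```python
import torch


def mask_repeats(pbc: torch.Tensor, num_repeats: torch.Tensor) -> torch.Tensor:
    """Zero out repeats along non-periodic axes (hypothetical helper)."""
    # Align the boolean mask with the data before calling torch.where.
    pbc = pbc.to(num_repeats.device)
    return torch.where(pbc, num_repeats, num_repeats.new_zeros(()))


device = "cuda" if torch.cuda.is_available() else "cpu"
num_repeats = torch.tensor([2, 3, 1], device=device)
pbc = torch.tensor([True, False, True])  # created on the CPU on purpose

out = mask_repeats(pbc, num_repeats)
print(out.tolist())
```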

@yueyericardo
Contributor

Hi, thanks for the report! Could you provide a minimal example to reproduce this?

not-matt commented Jan 27, 2022

It might be more suitable for a separate issue since I'm using an openmm stack.

See the full output of the code here:

https://github.com/meyresearch/ANI-Peptides/blob/main/demos/ANI_minimal.ipynb

Setup

  1. Install openmm and pytorch:
       conda install -c conda-forge openmm openmm-torch pytorch cudatoolkit=11.5
  2. In bashrc, set CUDA_HOME to /usr/local/cuda and add /usr/local/cuda to PATH
  3. Install torchani with cuaev:
       git clone https://github.com/aiqm/torchani
       cd torchani
       python setup.py install --cuaev
  4. Install openmm-ml:
       git clone https://github.com/openmm/openmm-ml
       pip install openmm-ml/.
  5. Fetch sample peptide:
       wget -q https://github.com/meyresearch/ANI-Peptides/raw/main/pdbs/aaa.pdb

Code

# Import libraries
from openmm.app import *
from openmm import *
from openmm.unit import *
from openmmml import MLPotential
import sys

# Setup
pdb = PDBFile("aaa.pdb")
potential = MLPotential('ani2x')
system = potential.createSystem(pdb.topology)
integrator = LangevinIntegrator(
    300 * kelvin, 
    1 / picosecond, 
    1.0 * femtosecond,
)
simulation = Simulation(
    pdb.topology,
    system,
    integrator,
    Platform.getPlatformByName("CUDA"),
)
simulation.context.setPositions(pdb.positions)

# Minimize and run
simulation.minimizeEnergy()
simulation.step(1000)
print("done")

yueyericardo commented Jan 27, 2022

Hi, the error came from the openmm-ml wrapper. A temporary fix that works ONLY on GPU can be found at: yueyericardo/openmm-ml@1d1d3f2#diff-911692ca194bf903c77d038662969ad3277dcf2fa8b3b3048d95a5aa3af59de1

It uses cuaev (use_cuda_extension) for the AEV calculation, but cuaev currently does not support PBC (periodic boundary conditions), so if you want to use cuaev, you have to change your script slightly to

pdb = PDBFile("aaa.pdb")
# add this line
pdb.topology.setPeriodicBoxVectors(None)
potential = MLPotential('ani2x')

Our internal version has some other updates to make it faster, but it is not open source yet.
In the meantime, the OpenMM team is building NNPOps for ANI and SchNet; you can track the progress in the issue "Add example of using NNPOps with openmm-torch?".

Edit:
BTW, our conda-forge package includes the latest public build with cuaev;
you can install it directly with

conda install -c conda-forge torchani

@not-matt

Fantastic! Thank you for looking into this and getting back to me so quickly.

kryczko commented Mar 2, 2022

I am still getting the same error I showed above when using an ANI model within PyTorch Lightning. Any ideas how to fix it?
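
One way to narrow this down is to list the tensors a model holds that `.to(device)` would not move. The sketch below is a generic diagnostic, assuming the cause is a tensor stored as a plain attribute; that is a guess, not something confirmed in this thread. `find_untracked_tensors` and `Toy` are hypothetical names, and the helper relies only on the fact that `nn.Module` keeps parameters and buffers out of the instance `__dict__`.

```python
from typing import List

import torch
from torch import nn


def find_untracked_tensors(module: nn.Module) -> List[str]:
    """Return dotted names of tensor attributes that are neither parameters nor buffers."""
    found = []
    for prefix, sub in module.named_modules():
        # Parameters and buffers live in private dicts, so any tensor found
        # directly in the instance __dict__ is invisible to .to()/.cuda().
        for name, value in vars(sub).items():
            if isinstance(value, torch.Tensor):
                found.append(f"{prefix}.{name}" if prefix else name)
    return found


class Toy(nn.Module):
    def __init__(self):
        super().__init__()
        self.register_buffer("shifts", torch.zeros(3))  # tracked
        self.cutoff = torch.tensor(5.2)                 # not tracked
        self.inner = nn.Linear(2, 2)                    # parameters are tracked


stray = find_untracked_tensors(Toy())
print(stray)
```

Running the helper on the LightningModule that wraps the ANI model would show whether any such tensors exist; each name it reports is a candidate for `register_buffer`.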
