Can't Allocate Memory Issue #604

Open
bartdemooij opened this issue Dec 7, 2021 · 6 comments

Comments

@bartdemooij

bartdemooij commented Dec 7, 2021

Dear,

What would be the best way to perform high-performance molecular dynamics with ANI on a cluster? We run torchANI in combination with ASE. Currently, running a box of 1000 ethanol molecules gives the following error during the BFGS optimisation:

warnings.warn(
Traceback (most recent call last):
  File "/home/bmooij/ANI_quality_check/MD_ethanol_quality_check_ANI.py", line 49, in <module>
    opt.run(fmax=1.0)
  File "/home/bmooij/.conda/envs/py9/lib/python3.9/site-packages/ase/optimize/optimize.py", line 269, in run
    return Dynamics.run(self)
  File "/home/bmooij/.conda/envs/py9/lib/python3.9/site-packages/ase/optimize/optimize.py", line 156, in run
    for converged in Dynamics.irun(self):
  File "/home/bmooij/.conda/envs/py9/lib/python3.9/site-packages/ase/optimize/optimize.py", line 122, in irun
    self.atoms.get_forces()
  File "/home/bmooij/.conda/envs/py9/lib/python3.9/site-packages/ase/atoms.py", line 788, in get_forces
    forces = self._calc.get_forces(self)
  File "/home/bmooij/.conda/envs/py9/lib/python3.9/site-packages/ase/calculators/abc.py", line 23, in get_forces
    return self.get_property('forces', atoms)
  File "/home/bmooij/.conda/envs/py9/lib/python3.9/site-packages/ase/calculators/calculator.py", line 737, in get_property
    self.calculate(atoms, [name], system_changes)
  File "/home/bmooij/.conda/envs/py9/lib/python3.9/site-packages/torchani/ase.py", line 82, in calculate
    energy = self.model((species, coordinates), cell=cell, pbc=pbc).energies
  File "/home/bmooij/.conda/envs/py9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/bmooij/.conda/envs/py9/lib/python3.9/site-packages/torchani/models.py", line 106, in forward
    species_aevs = self.aev_computer(species_coordinates, cell=cell, pbc=pbc)
  File "/home/bmooij/.conda/envs/py9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/bmooij/.conda/envs/py9/lib/python3.9/site-packages/torchani/aev.py", line 533, in forward
    aev = compute_aev(species, coordinates, self.triu_index, self.constants(), self.sizes, (cell, shifts))
  File "/home/bmooij/.conda/envs/py9/lib/python3.9/site-packages/torchani/aev.py", line 288, in compute_aev
    atom_index12, shifts = neighbor_pairs(species == -1, coordinates_, cell, shifts, Rcr)
  File "/home/bmooij/.conda/envs/py9/lib/python3.9/site-packages/torchani/aev.py", line 171, in neighbor_pairs
    shifts_all = torch.cat([shifts_center, shifts_outside])
RuntimeError: [enforce fail at CPUAllocator.cpp:71] . DefaultCPUAllocator: can't allocate memory: you tried to allocate 26243892000 bytes. Error code 12 (Cannot allocate memory)

It works fine if the box has few molecules in it (e.g. 125 ethanol), but starts to give this error for larger systems (e.g. 750 or 1000 ethanol). A system of 500 ethanol also seems to work, but is terribly slow.

Some reproducible code (where file 'ethanol_1000.pdb' is a box of 1000 ethanol made with packmol):

from ase import Atoms
from ase.optimize import BFGS
import torch
import torchani
from Bio import PDB

# Load the box of ethanol
parser = PDB.PDBParser()
struct = parser.get_structure('ethanol_1000', 'ethanol_1000.pdb')
pos = []
for model in struct:
    for chain in model:
        for residue in chain:
            for atom in residue:
                x,y,z = atom.get_coord()
                pos.append([x,y,z])
ethanol = Atoms(1000*"COH3CH3", positions=pos)
ethanol.set_cell((46, 46, 46))
ethanol.set_pbc(True)

# Set up the calculator (set_calculator is deprecated in recent ASE releases)
calculator = torchani.models.ANI1ccx().ase()
ethanol.calc = calculator

#Minimize the structure
print("Begin minimizing...")
opt = BFGS(ethanol)
opt.run(fmax=1.0)
print()

Best regards,

Bart

@isayev
Member

isayev commented Dec 7, 2021

Dear @bartdemooij, thanks for reporting. It seems your system is too large to fit into your GPU memory. At this point, we have implemented only direct algorithms, which are memory-bound depending on matrix sizes. You therefore have two choices: i) reduce the system size, or ii) get a GPU with more memory.

We are working on simulation plugins for LAMMPS and AMBER; with domain decomposition, you would be able to run much larger systems in a distributed fashion.

@bartdemooij
Author

Dear @isayev, thanks for the swift reply. We ran this system on a CPU with 64 GB of RAM. Is it to be expected that all of this gets used up by 1000 ethanol molecules (9000 atoms)? Perhaps this is a trivial question, but how does memory usage scale with system size?
We are now looking into running simulations with torchani through openmm-ml. Do you think memory usage would be friendlier there?
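
As a rough check on the scaling question, the byte count in the error message can be reproduced with simple arithmetic. The sketch below is an assumption fitted to the reported number, not a description of torchani internals: it supposes one 3-component int64 shift row per candidate pair, N(N-1)/2 pairs inside the central cell, and N^2 pairs for each of 13 periodic image shifts. The function name and its parameters are hypothetical.

```python
def shift_tensor_bytes(n_atoms, n_image_shifts=13, components=3, bytes_per_value=8):
    """Estimate the size of a pair-shift tensor (assumed layout, see lead-in)."""
    # Pairs within the central cell: each unordered pair counted once.
    center_pairs = n_atoms * (n_atoms - 1) // 2
    # Pairs between the central cell and each periodic image: all N x N combinations.
    outside_pairs = n_image_shifts * n_atoms * n_atoms
    return (center_pairs + outside_pairs) * components * bytes_per_value


ATOMS_PER_ETHANOL = 9  # C2H6O

for n_molecules in (125, 500, 750, 1000):
    n_atoms = ATOMS_PER_ETHANOL * n_molecules
    gigabytes = shift_tensor_bytes(n_atoms) / 1e9
    print(f"{n_molecules:5d} ethanol ({n_atoms:5d} atoms): {gigabytes:6.2f} GB")
```

For 9000 atoms this estimate gives exactly the 26243892000 bytes reported in the traceback. Under these assumptions memory grows quadratically with the number of atoms, so going from 125 to 1000 molecules multiplies the requirement by roughly 64.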

@isayev
Member

isayev commented Dec 10, 2021

Since your code raises a CUDA memory error, I would check your run script for correctness. It still seems to be running on a GPU. The usual suspects are the CUDA_VISIBLE_DEVICES variable and the torch.device definition in your code.
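
A minimal sketch of how one might inspect the two suspects named above. The helper `visible_gpus` is hypothetical and only approximates how CUDA_VISIBLE_DEVICES is interpreted (unset means all devices are visible; an empty value or a leading -1 hides all of them). The PyTorch part is skipped if torch is not installed.

```python
import os


def visible_gpus(env=os.environ):
    """Summarise CUDA_VISIBLE_DEVICES: 'all', 'none', or the listed ids."""
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return "all"  # variable unset: no restriction on visible devices
    ids = [v.strip() for v in value.split(",") if v.strip()]
    if not ids or ids[0] == "-1":
        return "none"  # empty value or leading -1 hides every device
    return ",".join(ids)


print("CUDA_VISIBLE_DEVICES ->", visible_gpus())

try:
    import torch
except ImportError:
    torch = None

if torch is not None:
    # Reports whether PyTorch can see a usable CUDA device in this process.
    print("torch.cuda.is_available():", torch.cuda.is_available())
    # An explicit CPU device removes any ambiguity about where tensors live.
    device = torch.device("cpu")
    print("device used for this run:", device)
```

Running this at the top of the job script shows which devices the process can see before any model is built.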

@zubatyuk
Contributor

zubatyuk commented Dec 12, 2021 via email

@bartdemooij
Author

Thank you, the memory error is clear.

@zubatyuk The only thing I don't understand is how you get 18 cells for PBC in three dimensions, as I thought it would be 26. Am I right that you get 18 from 3x3x3 - 1 (original) - 8 (the corner boxes) = 18?
If this is the case, why are you allowed to omit the corner boxes? If not, how do you get 18 images?
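
The counting in this question can be checked by enumerating the neighbouring images of a cell directly. The short sketch below only verifies the arithmetic (26 images in total, of which 8 are corners); it does not say which images torchani actually uses.

```python
from collections import Counter
from itertools import product

# All integer cell offsets in a 3x3x3 block, excluding the original cell.
offsets = [o for o in product((-1, 0, 1), repeat=3) if o != (0, 0, 0)]

# Classify each image by how many axes it is shifted along:
# 1 axis = face neighbour, 2 axes = edge neighbour, 3 axes = corner neighbour.
counts = Counter(sum(1 for c in o if c != 0) for o in offsets)
faces, edges, corners = counts[1], counts[2], counts[3]

print("total images:", len(offsets))
print("faces:", faces, "edges:", edges, "corners:", corners)
print("faces + edges:", faces + edges)
```

The enumeration gives 6 face, 12 edge, and 8 corner images, so 18 is indeed the number left after dropping the corners from the full set of 26.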

@zubatyuk
Contributor

zubatyuk commented Dec 17, 2021 via email
