Default settings of Structure.relax() fail to synchronize tensor locations (CPU/GPU) in GPU-enabled environments #3715

Open
jsukpark opened this issue Mar 26, 2024 · 1 comment
@jsukpark

Python version

Python 3.9.18

Pymatgen version

2023.12.18

Operating system version

Ubuntu 22.04.4 LTS

Current behavior

Calling the relax() method of a pymatgen.core.Structure object with default settings in a GPU-enabled environment raises a RuntimeError reporting that the tensors involved in the computation are not all on the same device.

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/workspace/miniconda3/envs/myenv/lib/python3.9/site-packages/pymatgen/core/structure.py", line 4323, in relax
    return self._relax(
  File "/workspace/miniconda3/envs/myenv/lib/python3.9/site-packages/pymatgen/core/structure.py", line 776, in _relax
    dyn = opt_class(ecf, **opt_kwargs)
  File "/workspace/miniconda3/envs/myenv/lib/python3.9/site-packages/ase/optimize/fire.py", line 54, in __init__
    Optimizer.__init__(self, atoms, restart, logfile, trajectory,
  File "/workspace/miniconda3/envs/myenv/lib/python3.9/site-packages/ase/optimize/optimize.py", line 234, in __init__
    self.set_force_consistent()
  File "/workspace/miniconda3/envs/myenv/lib/python3.9/site-packages/ase/optimize/optimize.py", line 325, in set_force_consistent
    self.atoms.get_potential_energy(force_consistent=True)
  File "/workspace/miniconda3/envs/myenv/lib/python3.9/site-packages/ase/constraints.py", line 2420, in get_potential_energy
    atoms_energy = self.atoms.get_potential_energy(
  File "/workspace/miniconda3/envs/myenv/lib/python3.9/site-packages/ase/atoms.py", line 728, in get_potential_energy
    energy = self._calc.get_potential_energy(
  File "/workspace/miniconda3/envs/myenv/lib/python3.9/site-packages/ase/calculators/calculator.py", line 709, in get_potential_energy
    energy = self.get_property('energy', atoms)
  File "/workspace/miniconda3/envs/myenv/lib/python3.9/site-packages/ase/calculators/calculator.py", line 737, in get_property
    self.calculate(atoms, [name], system_changes)
  File "/workspace/miniconda3/envs/myenv/lib/python3.9/site-packages/matgl/ext/ase.py", line 177, in calculate
    energies, forces, stresses, hessians = self.potential(graph, lattice, state_attr_default)
  File "/workspace/miniconda3/envs/myenv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/workspace/miniconda3/envs/myenv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/miniconda3/envs/myenv/lib/python3.9/site-packages/matgl/apps/pes.py", line 120, in forward
    property_offset = torch.squeeze(self.element_refs(g))
  File "/workspace/miniconda3/envs/myenv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/workspace/miniconda3/envs/myenv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/miniconda3/envs/myenv/lib/python3.9/site-packages/matgl/layers/_atom_ref.py", line 78, in forward
    offset = property_offset_batched * one_hot
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
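
One way to narrow down which side of the failing multiplication lives on which device is to list tensor names per device. The helper below is a hypothetical sketch, not part of pymatgen, matgl, or ASE; `group_names_by_device` and the `model` variable in the usage comment are names introduced here for illustration. It only assumes that each object exposes a `.device` attribute, as torch tensors do.

```python
from collections import defaultdict


def group_names_by_device(named_tensors):
    """Group tensor names by the string form of their ``.device`` attribute.

    ``named_tensors`` is any iterable of (name, tensor) pairs, for example the
    output of ``torch.nn.Module.named_parameters()`` or ``named_buffers()``.
    """
    groups = defaultdict(list)
    for name, tensor in named_tensors:
        groups[str(tensor.device)].append(name)
    return dict(groups)


# Hypothetical usage with a loaded torch module called `model` (an assumption;
# how to get hold of the model used by relax() is not shown in this report):
#
#   from itertools import chain
#   print(group_names_by_device(
#       chain(model.named_parameters(), model.named_buffers())
#   ))
```

A tensor stored as a plain Python attribute, rather than as a registered parameter or buffer, would not appear in either listing, so an empty "cpu" group would not by itself rule out a CPU-resident tensor.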

Expected behavior

The structural relaxation should run without error, with intermediate tensors copied to or from the GPU as needed so that every operation takes place on a single device.

Minimal example

import numpy as np
from pymatgen.core import Structure

# Two-atom primitive cell of diamond
struct = Structure(
    np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]]) * 1.786855,  # lattice vectors
    ["C"] * 2,  # species
    np.array([[0.25, 0.25, 0.25], [0.0, 0.0, 0.0]]),  # fractional coordinates
)
struct.relax()  # uses the default calculator, 'm3gnet'
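
A possible stopgap, not verified against this exact setup, is to hide the GPU from the process so that everything stays on the CPU. The sketch below relies on the standard `CUDA_VISIBLE_DEVICES` environment variable; `relax_on_cpu` is a hypothetical wrapper name introduced here, and the approach assumes the mismatch comes from some tensors being placed on CUDA automatically.

```python
import os

# Hide all CUDA devices from this process. This has to happen before torch
# initializes CUDA, so it is safest to set it before importing torch,
# pymatgen, or matgl.
os.environ["CUDA_VISIBLE_DEVICES"] = ""


def relax_on_cpu(struct):
    """Call ``relax()`` with default settings.

    ``struct`` is expected to be a ``pymatgen.core.Structure``; with no CUDA
    device visible, torch has only the CPU available for the calculation.
    """
    return struct.relax()
```

This gives up GPU acceleration entirely, so it only sidesteps the error rather than providing the device synchronization described under the expected behavior.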


Relevant files to reproduce this bug

No response
@jsukpark jsukpark added the bug label Mar 26, 2024
@jsukpark

Addendum: the installed matgl version is 1.0.0.
