Default settings of `Structure.relax()` fail to synchronize tensor locations (CPU/GPU) on GPU-enabled environments #3715

jsukpark · 2024-03-26T23:00:06Z

Python version

Python 3.9.18

Pymatgen version

2023.12.18

Operating system version

Ubuntu 22.04.4 LTS

Current behavior

Running the `relax()` method of a `pymatgen.core.Structure` object with default settings in a GPU-enabled environment raises a `RuntimeError` stating that the tensors involved in the computation are not all on the same device:

```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/workspace/miniconda3/envs/myenv/lib/python3.9/site-packages/pymatgen/core/structure.py", line 4323, in relax
    return self._relax(
  File "/workspace/miniconda3/envs/myenv/lib/python3.9/site-packages/pymatgen/core/structure.py", line 776, in _relax
    dyn = opt_class(ecf, **opt_kwargs)
  File "/workspace/miniconda3/envs/myenv/lib/python3.9/site-packages/ase/optimize/fire.py", line 54, in __init__
    Optimizer.__init__(self, atoms, restart, logfile, trajectory,
  File "/workspace/miniconda3/envs/myenv/lib/python3.9/site-packages/ase/optimize/optimize.py", line 234, in __init__
    self.set_force_consistent()
  File "/workspace/miniconda3/envs/myenv/lib/python3.9/site-packages/ase/optimize/optimize.py", line 325, in set_force_consistent
    self.atoms.get_potential_energy(force_consistent=True)
  File "/workspace/miniconda3/envs/myenv/lib/python3.9/site-packages/ase/constraints.py", line 2420, in get_potential_energy
    atoms_energy = self.atoms.get_potential_energy(
  File "/workspace/miniconda3/envs/myenv/lib/python3.9/site-packages/ase/atoms.py", line 728, in get_potential_energy
    energy = self._calc.get_potential_energy(
  File "/workspace/miniconda3/envs/myenv/lib/python3.9/site-packages/ase/calculators/calculator.py", line 709, in get_potential_energy
    energy = self.get_property('energy', atoms)
  File "/workspace/miniconda3/envs/myenv/lib/python3.9/site-packages/ase/calculators/calculator.py", line 737, in get_property
    self.calculate(atoms, [name], system_changes)
  File "/workspace/miniconda3/envs/myenv/lib/python3.9/site-packages/matgl/ext/ase.py", line 177, in calculate
    energies, forces, stresses, hessians = self.potential(graph, lattice, state_attr_default)
  File "/workspace/miniconda3/envs/myenv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/workspace/miniconda3/envs/myenv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/miniconda3/envs/myenv/lib/python3.9/site-packages/matgl/apps/pes.py", line 120, in forward
    property_offset = torch.squeeze(self.element_refs(g))
  File "/workspace/miniconda3/envs/myenv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/workspace/miniconda3/envs/myenv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/miniconda3/envs/myenv/lib/python3.9/site-packages/matgl/layers/_atom_ref.py", line 78, in forward
    offset = property_offset_batched * one_hot
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
```

Expected Behavior

The structural relaxation should run without error, with intermediate tensors moved to or from the GPU as needed so that every operation runs on a single device.
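The traceback ends at an elementwise product (`property_offset_batched * one_hot` in `matgl/layers/_atom_ref.py`) whose operands sit on `cuda:0` and `cpu`. The sketch below shows, in plain PyTorch, the kind of device alignment the expected behavior describes: one operand is moved onto the other's device before the multiplication. The helper name `aligned_offset` and the tensor shapes are hypothetical; this is not matgl's code or its actual fix.

```python
import torch


def aligned_offset(property_offset_batched: torch.Tensor, one_hot: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: multiply two tensors after putting them on one device.

    Mirrors the failing line `offset = property_offset_batched * one_hot`,
    but is NOT taken from matgl.
    """
    # Move `one_hot` to the device of `property_offset_batched`.
    # `.to()` returns the same tensor (no copy) if it is already there.
    one_hot = one_hot.to(property_offset_batched.device)
    return property_offset_batched * one_hot


# Use the GPU when one is present, so the CPU/GPU mismatch can actually occur.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Hypothetical shapes: 2 atoms, 3 possible elements.
offsets = torch.tensor([[0.5, -1.0, 2.0], [0.5, -1.0, 2.0]], device=device)
one_hot = torch.tensor([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])  # created on the CPU

result = aligned_offset(offsets, one_hot)
print(result.device, result.sum(dim=1))
```

Without the `.to(...)` call, the multiplication raises the same `RuntimeError` as above whenever `offsets` is on a CUDA device.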

Minimal example

```python
import numpy as np
from pymatgen.core import Structure

struct = Structure(  # diamond
    np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]]) * 1.786855,  # lattice vectors
    ["C"] * 2,  # species
    np.array([[0.25, 0.25, 0.25], [0.0, 0.0, 0.0]]),  # fractional coordinates
)
struct.relax()  # uses default calculator 'm3gnet'
```
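Until the device handling is fixed, one possible workaround (an assumption, not verified against this exact setup) is to hide the GPU from PyTorch so that every tensor stays on the CPU, at the cost of slower relaxations. `CUDA_VISIBLE_DEVICES` is the standard CUDA environment variable for choosing visible devices; it is only read when CUDA is initialized, so the sketch sets it before pymatgen or matgl import torch. The helper name `relax_on_cpu` is hypothetical.

```python
import os

# Hide all CUDA devices. This must happen before torch (pulled in by matgl)
# initializes CUDA; changing it afterwards has no effect in that process.
os.environ["CUDA_VISIBLE_DEVICES"] = ""


def relax_on_cpu():
    """Hypothetical helper: rerun the minimal example with the GPU hidden."""
    # Imported here so the environment variable is already set.
    import numpy as np
    from pymatgen.core import Structure

    struct = Structure(  # diamond, same as the minimal example
        np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]]) * 1.786855,
        ["C"] * 2,
        np.array([[0.25, 0.25, 0.25], [0.0, 0.0, 0.0]]),
    )
    return struct.relax()  # default calculator "m3gnet", now CPU-only
```

Calling `relax_on_cpu()` in a fresh interpreter takes the place of `struct.relax()` in the minimal example. Setting the variable in the shell (`CUDA_VISIBLE_DEVICES= python script.py`) has the same effect.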



Relevant files to reproduce this bug

_No response_


jsukpark · 2024-03-26T23:02:09Z

Addendum: the installed matgl package is version 1.0.0.

jsukpark added the bug label Mar 26, 2024
