generate dataset for torchani #622

MichailDanikas · 2022-07-17T16:10:51Z

Hi,
I have a problem creating my own dataset to use later for training. I'm a beginner with h5py, and I don't understand how the datasets should be formatted. I am trying to use the last part of #611, where my species look like this:
array([['O', 'C', 'O'], ['O', 'C', 'O'],... ['O', 'C', 'O']])
for one molecule. The coordinates are in the form:
[array([[[ 0. , 0. , 1.237479], [ 0. , 0. , -0.3 ], [ 0. , 0. , -1.237479]]]),...]
and the energies:
[array(226.56324331), array(208.34163576), array(191.23083335),...]
I've also tried other formats, which I saved using:
torchani.data._pyanitools.datapacker('./path_to_file', mode = 'w')
After loading the file with torchani.data.load('./path_to_file'), the entries were converted to dictionaries, like the examples in ani_gdb_s01.h5. However, during training the following error is raised:

If you have any suggestion please let me know.
Thank you in advance.


jvita · 2023-08-30T20:32:43Z

Probably a bit late for the original poster, but here's what I do to convert from a list of ASE.Atoms objects. I'm not sure if it's 100% correct, but it seems to work fine.

import h5py
import numpy as np

# `train` is a list of ASE.Atoms objects
with h5py.File('train.hdf5', 'w') as hdf5:
    for i, atoms in enumerate(train):
        natoms = len(atoms)

        # One group per structure; every array gets a leading axis of length 1
        g = hdf5.create_group(str(i))

        g.create_dataset('energies', data=np.atleast_1d(atoms.info['energy']))
        g.create_dataset('cell', data=np.array(atoms.cell).reshape((1, 3, 3)))
        g.create_dataset('coordinates', data=atoms.positions.reshape((1, natoms, 3)))
        g.create_dataset('force', data=atoms.arrays['forces'].reshape((1, natoms, 3)))
        # Species are hard-coded to carbon here; other systems need their own
        # element symbols as byte strings
        g.create_dataset('species', data=[b'C']*natoms)
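
For data shaped like the lists in the original question (one species row, one coordinate array, and one energy per conformation), a possible alternative is to stack all conformations of a molecule into a single group. The sketch below is only an illustration: it assumes the per-molecule layout of ani_gdb_s01.h5 (species as byte strings of shape (natoms,), coordinates of shape (nconf, natoms, 3), energies of shape (nconf,)). The helper names stack_conformations and write_molecule are hypothetical and not part of torchani. The first coordinate frame and the three energies come from the question; the other two frames are made-up placeholders. Whether torchani.data.load accepts the result should be checked against the installed version.

```python
import numpy as np


def stack_conformations(species_rows, coord_list, energy_list):
    """Collapse per-conformation lists into one array per property."""
    species_rows = np.asarray(species_rows)
    # All conformations must share one atom ordering to live in one group.
    if not (species_rows == species_rows[0]).all():
        raise ValueError("conformations differ in species order")
    # h5py cannot store NumPy unicode arrays, so encode symbols as bytes.
    species = np.array([s.encode("ascii") for s in species_rows[0]])
    # Each entry may be (1, natoms, 3) or (natoms, 3); stack to (nconf, natoms, 3).
    coords = np.concatenate(
        [np.asarray(c, dtype=np.float64).reshape(1, -1, 3) for c in coord_list],
        axis=0,
    )
    # Energies arrive as 0-d arrays; flatten them to shape (nconf,).
    energies = np.array([float(e) for e in energy_list], dtype=np.float64)
    if not (len(species_rows) == len(coords) == len(energies)):
        raise ValueError("species, coordinates and energies differ in length")
    return species, coords, energies


def write_molecule(path, name, species, coords, energies):
    """Append one molecule group to an HDF5 file (hypothetical helper)."""
    import h5py  # imported here so the reshaping helper needs only NumPy

    with h5py.File(path, "a") as f:
        g = f.create_group(name)
        g.create_dataset("species", data=species)
        g.create_dataset("coordinates", data=coords)
        g.create_dataset("energies", data=energies)


# Frame 1 is taken from the question; frames 2 and 3 are placeholders.
frame = np.array([[[0.0, 0.0, 1.237479],
                   [0.0, 0.0, -0.3],
                   [0.0, 0.0, -1.237479]]])
species, coords, energies = stack_conformations(
    [["O", "C", "O"]] * 3,
    [frame, frame + 0.01, frame - 0.01],
    [np.array(226.56324331), np.array(208.34163576), np.array(191.23083335)],
)
print(species.shape, coords.shape, energies.shape)
```

Calling write_molecule('co2.h5', 'co2', species, coords, energies) would then store the three arrays under a single group; the file and group names here are arbitrary.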
