Training with heterogeneous systems #69

Open

nec4 opened this issue Jan 19, 2024 · 0 comments

nec4 commented Jan 19, 2024

Hello - thanks for developing and maintaining these tools. I realize from experience that this is a vague question, but are there certain architectural or hyperparameter considerations to take into account when training on datasets with many different systems (varying in both size and chemical composition)?

I have been having a hard time getting the validation energy MAE below 50 kcal/mol (after converting both energies and forces from eV units; 1 eV ≈ 23.06 kcal/mol) on this dataset using a config file similar to the aspirin example in this repo (see below). I should note that I am working on the develop branch of nequip to take advantage of HDF5 datasets (though I have also used the extxyz format and see the same results).

There are, of course, many things that can go wrong when building ML potentials, and there is always the issue of hyperparameter searching - are there certain options in this package that work better for datasets with varying system composition?
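One concrete thing I was unsure about is whether the per-species rescaling should be configured explicitly rather than left at the defaults. Something like the following is what I had in mind (I am going off the keys I have seen in the nequip example configs, so treat this as a sketch):

# sketch: derive per-species energy shifts and force scales from the dataset
per_species_rescale_shifts: dataset_per_atom_total_energy_mean
per_species_rescale_scales: dataset_forces_rms
# keep the computed shifts/scales fixed during training
per_species_rescale_shifts_trainable: false
per_species_rescale_scales_trainable: false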

root: results/protein_frags_solv_train_24500_rcut_6
run_name: protein_frags_solv_train_24500_rcut_6

seed: 123456

dataset_seed: 123456

append: true

default_dtype: float32

model_builders:
 - allegro.model.Allegro
 # the typical model builders from `nequip` can still be used:
 - PerSpeciesRescale
 - ForceOutput
 - RescaleEnergyEtc

r_max: 6.0
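# `auto` estimates the average neighbor count from the training data at startup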
avg_num_neighbors: auto
BesselBasis_trainable: true
PolynomialCutoff_p: 6
l_max: 2
parity: o3_full
num_layers: 2
env_embed_multiplicity: 128
embed_initial_edge: true
two_body_latent_mlp_latent_dimensions: [128, 256, 512, 1024]
two_body_latent_mlp_nonlinearity: silu
two_body_latent_mlp_initialization: uniform
latent_mlp_latent_dimensions: [1024, 1024, 1024]
latent_mlp_nonlinearity: silu
latent_mlp_initialization: uniform
latent_resnet: true
env_embed_mlp_latent_dimensions: []
env_embed_mlp_nonlinearity: null
env_embed_mlp_initialization: uniform

edge_eng_mlp_latent_dimensions: [128, 64]
edge_eng_mlp_nonlinearity: null
edge_eng_mlp_initialization: uniform

wandb: true
wandb_project: solvated_protein_fragments
verbose: debug

n_train: 22050
n_val: 2450

batch_size: 5
max_epochs: 1000000
learning_rate: 0.001
train_val_split: random
shuffle: true
metrics_key: validation_loss
use_ema: true
ema_decay: 0.99
ema_use_num_updates: true
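# PerAtomMSELoss normalizes the energy error by the number of atoms,
# which should matter when batches mix systems of very different sizes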
loss_coeffs:
  forces: 1.
  total_energy:
    - 1.
    - PerAtomMSELoss

optimizer_name: Adam
optimizer_params:
  amsgrad: false
  betas: !!python/tuple
  - 0.9
  - 0.999
  eps: 1.0e-08
  weight_decay: 0.
lr_scheduler_name: ReduceLROnPlateau
lr_scheduler_patience: 10
lr_scheduler_factor: 0.5

early_stopping_upper_bounds:
  cumulative_wall: 604800.
early_stopping_lower_bounds:
  LR: 1.0e-5
early_stopping_patiences:
  validation_loss: 100

dataset: hdf5
dataset_file_name: path_to_my_h5
dataset_AtomicData_options:
  r_max: 6.0

chemical_symbol_to_type:
  H: 0
  C: 1
  N: 2
  O: 3
  S: 4
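In case it helps with diagnosis, I was also planning to log per-species and per-atom errors to see whether particular elements dominate the MAE. A sketch of what I had in mind, based on the metrics examples in the nequip repo (the exact syntax may be off):

metrics_components:
  # per-species force MAE, to see which elements contribute the most error
  - - forces
    - mae
    - PerSpecies: True
      report_per_component: False
  # energy MAE normalized by the number of atoms
  - - total_energy
    - mae
    - PerAtom: True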