
How can I improve the training and the force loss for my system, especially the Li atoms, in my LGPO system? #76

ElhamPisheh opened this issue Mar 1, 2024 · 6 comments


@ElhamPisheh

Dear all,
I have recently used the NequIP/Allegro framework to train on DFT data for an LGPO system, using 30,000 configurations. With the ASE calculator I generated ML forces and compared them with the DFT forces; unfortunately, the force loss did not improve at all. I have tried the following:

- changing the cutoff radius from 5 to 7 and 14;
- increasing the maximum number of epochs from 100 to 200;
- changing batch_size from 1 to 4 and 6;
- different train/validation splits (80-20 and 70-30);
- shuffling the training data or not;
- checking the mathematical expression for the overall loss and changing the force loss coefficient from 1.0 to 100;
- different seeds to get different training and validation sets;
- l_max = 1 and 2.

I have tried everything I could think of to improve the ML forces (loss_f, loss_e, and the total loss), especially for the Li atoms.
With a force loss coefficient of 100, the total loss stays near 23 while loss_f stays near 0.23; with a coefficient of 1, the total loss drops to about 0.23, but loss_f is unchanged at about 0.23. In other words, changing the coefficient only rescales the total loss; the force error itself remains large.
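
For illustration, here is a minimal sketch of how such a comparison can be done with the ASE interface. The file names, the species-to-type map (guessed from "LGPO" as Li, Ge, P, O), and the assumption that the extxyz frames carry the DFT forces are all placeholders to adapt to your own setup:

```python
import numpy as np
from ase.io import read
from nequip.ase import NequIPCalculator

# Placeholder file names and species map -- adjust to your own setup.
calc = NequIPCalculator.from_deployed_model(
    model_path="deployed.pth",
    device="cpu",
    species_to_type_name={"Li": "Li", "Ge": "Ge", "P": "P", "O": "O"},
)

# Frames whose DFT forces were written into the extxyz file.
frames = read("test.extxyz", index=":")

dft_f, ml_f, symbols = [], [], []
for atoms in frames:
    dft_f.append(atoms.get_forces().copy())   # DFT forces as stored in the file
    atoms.calc = calc
    ml_f.append(atoms.get_forces())           # ML forces from the deployed model
    symbols.extend(atoms.get_chemical_symbols())

dft_f, ml_f = np.concatenate(dft_f), np.concatenate(ml_f)
err = ml_f - dft_f
li = np.array(symbols) == "Li"

# Overall and Li-only force errors
print("all f_mae = %.3f  f_rmse = %.3f" % (np.abs(err).mean(), np.sqrt((err ** 2).mean())))
print("Li  f_mae = %.3f  f_rmse = %.3f" % (np.abs(err[li]).mean(), np.sqrt((err[li] ** 2).mean())))
```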

Do you have any new ideas to help me improve forces and overall results?
| Name | Epoch | Wall (hours) | LR | loss_f | loss_e | loss | f_mae | f_rmse | e_mae | e/N_mae |
|---|---|---|---|---|---|---|---|---|---|---|
| Train | 200 | 9.969141667 | 0.002 | 0.23 | 0.0368 | 23.0 | 0.213 | 0.485 | 0.867 | 0.0173 |
| Validation | 200 | 9.969141667 | 0.002 | 0.204 | 0.000193 | 20.4 | 0.200 | 0.456 | 0.699 | 0.014 |

@ElhamPisheh
Author

No suggestions for addressing my issue?

@DavidW99

DavidW99 commented Mar 4, 2024

From my personal experience, keeping the batch size small, like 1 or 4, is good practice in this framework; I have seen increasing the batch size degrade performance. I would suggest keeping l_max = 2 or 3, as more angular resolution gives better accuracy.

You may then try tuning the architecture by:

- increasing num_layers: 2, 4;
- increasing num_tensor_features: 32, 64, and adjusting two_body_latent_mlp_latent_dimensions and latent_mlp_latent_dimensions accordingly; this provides more channels;
- adjusting learning_rate: 0.005, 0.001.

(A rough sketch of these edits as config overrides follows below.)
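
A minimal sketch, assuming an existing training config named config.yaml; the key names follow the list above, and the MLP widths are purely illustrative, so check them against the example.yaml shipped with your Allegro version:

```python
import yaml

# Illustrative overrides only -- key names follow the suggestions above;
# verify names and sensible values against your own example.yaml.
overrides = {
    "num_layers": 2,                         # try 2, then 4
    "num_tensor_features": 32,               # try 32, then 64
    "two_body_latent_mlp_latent_dimensions": [64, 128, 256],  # widen together with the features
    "latent_mlp_latent_dimensions": [256, 256],                # likewise illustrative widths
    "learning_rate": 0.005,                  # also try 0.001
}

with open("config.yaml") as f:               # your current training config
    config = yaml.safe_load(f)

config.update(overrides)

with open("config_tuned.yaml", "w") as f:
    yaml.safe_dump(config, f, sort_keys=False)
```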

Hope this can help!

@ElhamPisheh
Author

Dear David,

I will check all of them and let you all know the outcome.

Thanks a lot,
Best Regards,
Elham

@ElhamPisheh
Author


Dear David,

I have a problem due to the wall-time limit on my system: it is 2 days, and after that the server stops our job.

For example, I could only complete 80 of the 200 epochs within 2 days. Is there any way to restart the job from where it left off, e.g. from epoch 81?

Thanks for your time,
Elham

@DavidW99

Hi Elham,

Thanks for your question! Allegro will restart from the best model saved from the previous run when you keep the same run_name in your config file. In your case, the best model should be the result at the 80th epoch, as I assume the loss has not plateaued yet. You can also set append: true, as in the example.yaml, so the log will be appended.
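
For example, a minimal sketch that just touches the two fields mentioned above, assuming your config file is named config.yaml:

```python
import yaml

# Keep run_name identical to the interrupted run so training resumes from the
# best saved model, and set append: true so new log lines are appended.
with open("config.yaml") as f:
    config = yaml.safe_load(f)

print("resuming run:", config["run_name"])   # must match the previous run_name
config["append"] = True

with open("config.yaml", "w") as f:
    yaml.safe_dump(config, f, sort_keys=False)
```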

@ElhamPisheh
Author

Dear David,

Thanks for your guidance and your time.
