
Large performance gap in MD17/22 dataset #12

Open
TommyDzh opened this issue Apr 25, 2024 · 4 comments

@TommyDzh

Thank you for the great work on EquiformerV2. When I test its performance on the MD17/22 datasets, I find it lags far behind SOTA models like VisNet. For example, on MD22_AT_AT, VisNet's validation loss converges to 0.14 for E and 0.17 for F, while for EquiformerV2 the validation loss is 4.7 for E and 5.1 for F.
I follow the settings in oc20/configs/s2ef/all_md/equiformer_v2/equiformer_v2_N@8_L@4_M@2_31M.yml. Is there anything I need to modify to adapt EquiformerV2 to the MD datasets? Thanks.

@yilunliao
Member

yilunliao commented Apr 25, 2024

Hi @TommyDzh

  1. Can you check whether the training loss/MAE of EquiformerV2 matches that of VisNet?
    In the config you mentioned, we use regularizations like dropout (alpha_drop) and stochastic depth (drop_path).
    These regularizations help on OC20 but can prevent training from converging on other datasets (see the config-loading sketch after this list).
    You can check the Equiformer paper to see how I set some of the hyper-parameters.

  2. Moreover, for a fair comparison, it would be simpler to use the same radial basis functions and cutoff radius.

  3. I think you are using gradient-based methods to predict forces.
    If so, I think you need to remove the .detach() calls here, here and here.
    These detach() calls zero out the gradients with respect to the relative positions and make the network rely only on the relative distances (the magnitudes of the relative positions) to predict forces (we still have gradients through the radial basis functions); see the autograd sketch after this list.
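On point 1, here is a minimal sketch of what turning off that regularization could look like when adapting the OC20 config to MD17/22. The "model" section and the exact key names are assumptions based on the config mentioned above, so please check them against the actual YAML file:

```python
import yaml

# Hedged sketch: load the OC20 EquiformerV2 config and zero out the regularization
# terms that help on OC20 but can keep training from converging on smaller MD datasets.
# The "model" section and key names (alpha_drop, drop_path_rate) are assumptions.
with open("oc20/configs/s2ef/all_md/equiformer_v2/equiformer_v2_N@8_L@4_M@2_31M.yml") as f:
    config = yaml.safe_load(f)

config["model"]["alpha_drop"] = 0.0       # attention (alpha) dropout
config["model"]["drop_path_rate"] = 0.0   # stochastic depth (drop_path)
```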
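On point 3, here is a minimal sketch of gradient-based force prediction to illustrate why the .detach() matters; `model(pos, atomic_numbers)` is a hypothetical interface used for illustration, not EquiformerV2's actual forward signature:

```python
import torch

def gradient_forces(model, pos, atomic_numbers):
    # Forces are the negative gradient of the predicted energy w.r.t. atomic positions.
    # If the relative position vectors are .detach()-ed inside the model, this gradient
    # path is cut, and only the radial-distance (basis) path contributes to the forces.
    pos = pos.requires_grad_(True)
    energy = model(pos, atomic_numbers)       # scalar total energy (hypothetical interface)
    forces = -torch.autograd.grad(
        energy.sum(), pos, create_graph=True  # create_graph=True so a force loss can backprop
    )[0]
    return energy, forces
```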

Feel free to ask if you have other specific questions.

@TommyDzh
Author

Thank you for your reply!

  1. For VisNet I use MSE losses for both E and F. For EquiformerV2, I have tried both the VisNet settings and the ones in oc20/configs/s2ef/all_md/equiformer_v2/equiformer_v2_N@8_L@4_M@2_31M.yml, but the trends are similar. I will further check and follow the hyper-parameters in Equiformer.
  2. I will check it.
  3. I have tried both regress_force in EquiformerV2 and the gradient-based method to predict forces and see similar gaps. I will remove the .detach() calls and try the gradient-based method again. But I wonder, in your experience, how far does regress_force lag behind the gradient-based method? Could the choice of force prediction method alone cause such a large gap?

Anyway, your prompt response is greatly appreciated! I will give you further feedback once I have corrected all of the above!

@TommyDzh
Author

For your reference, here is the validation loss curve: the blue line is EquiformerV2 using direct force prediction (regress forces), and the green line is VisNet using gradient-based force prediction.
[image: validation loss curves]

@yilunliao
Member

> For VisNet I use MSE losses for both E and F. For EquiformerV2, I have tried both the VisNet settings and the ones in oc20/configs/s2ef/all_md/equiformer_v2/equiformer_v2_N@8_L@4_M@2_31M.yml.

I don't understand this, and the link is broken. What I said is that strong regularization can prevent fitting the training set, so you need to check the results on the training set, not the validation set.

  1. Using direct methods is better than using gradient-based methods, as mentioned in some works on OC20. I don't think there should be such a large gap if there is no bug.
