Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat pt: Support multi structures input for single frame #3566

Open
wants to merge 9 commits into
base: devel
Choose a base branch
from

Conversation

Chengqian-Zhang
Copy link
Collaborator

Support multi structures input for one frame:

background:
When we predict the properties of small molecules, sometimes multiple 3D conformations are generated from the SMILES format structure, but their targets are the same. In each epoch we would like to pick only one of the 3D configurations as input.

Impletation:
When the file "multistru" is present in the data folder (like nopbc), "coord.npy" can be written in the format (nframe * struc_num * natom * 3), where struc_num is the number of 3D conformations for each structure.

deepmd/utils/data.py Outdated Show resolved Hide resolved
pre-commit-ci bot and others added 5 commits March 19, 2024 10:47
Signed-off-by: Chenqqian Zhang <100290172+Chengqian-Zhang@users.noreply.github.com>
if self.multistru:
size = frames["coord"].shape[1]
rng = np.random.default_rng()
sample_idx = rng.integers(low=0, high=size, size=1)[0]
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it really random?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use the random generator in deepmd.utils.rndom. The generator here cannot be controlled.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have change it to
from deepmd.utils import random as dp_random
sample_idx = dp_random.choice(range(num_stru), size=1)[0]
But I find that when num_stru = 11, the sample_idx is 1,4,8,10 all the time. I hope that the value is randomly selected from range(0--10). How can I solve it? @njzjz

Copy link

codecov bot commented Mar 19, 2024

Codecov Report

Attention: Patch coverage is 72.91667% with 13 lines in your changes are missing coverage. Please review.

Project coverage is 77.47%. Comparing base (2851fb9) to head (fcce4df).
Report is 1 commits behind head on devel.

Files Patch % Lines
deepmd/utils/data.py 76.19% 10 Missing ⚠️
deepmd/pt/utils/neighbor_stat.py 50.00% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##            devel    #3566      +/-   ##
==========================================
- Coverage   77.49%   77.47%   -0.02%     
==========================================
  Files         432      432              
  Lines       37164    37185      +21     
  Branches     1620     1620              
==========================================
+ Hits        28801    28810       +9     
- Misses       7495     7507      +12     
  Partials      868      868              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't it be documented? Otherwise no one knows how to use it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I add doc to deepmd/doc. Should I add doc to other places?

if self.multistru:
size = frames["coord"].shape[1]
rng = np.random.default_rng()
sample_idx = rng.integers(low=0, high=size, size=1)[0]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use the random generator in deepmd.utils.rndom. The generator here cannot be controlled.

elif data.size == nframes * natoms * ndof_:
if output_natoms_for_type_sel:
pass
if key == "coord" and self.multistru:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we assume aparam and other properties are the same for all conformations?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't thought of a situation where there are different other properties(aparam fparam) of all conformations. The reason I add this feature is that when one only have molecules in SMILES format, one may generate several possible conformations using tools such as rdkit. Do you have more better ideas?

@github-actions github-actions bot added the Docs label Mar 20, 2024
Copy link
Collaborator

@wanghan-iapcm wanghan-iapcm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not find the the feature is necessary. if different structures have different labels, they should not be placed in one dataset, because the dp model cannot fit to diverging labels.

@Chengqian-Zhang
Copy link
Collaborator Author

Chengqian-Zhang commented Mar 20, 2024

@wanghan-iapcm Different 3D conformations of a molecule have exactly the same label, and each epoch takes one of all 3D conformations of this molecule. Because molecule may only has SMILES, the exact 3D sturcture is unknown.

@wanghan-iapcm
Copy link
Collaborator

@wanghan-iapcm Different 3D conformations of a molecule have exactly the same label, and each epoch takes one of all 3D conformations of this molecule. Because molecule may only has SMILES, the exact 3D sturcture is unknown.

I do not find any reason why different 3D configurations cannot be place in one batch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants