Can I find code for LIT-PCBA dataset's 3D coordinates generation? #232

Open
Sangyeup opened this issue Oct 24, 2022 · 1 comment

Comments

@Sangyeup

Hi, did you test the GEM-2 model on LIT-PCBA by generating 3D coordinates from SMILES strings?

If so, where can I find the code for it?

Thank you.

@Noisyntrain
Collaborator

Noisyntrain commented Oct 26, 2022

Hi Sangyeup, we are still organizing the training code for LIT-PCBA and will release it later. For now, you can do the following:

  1. Implement a LitPCBADataset class, using the existing PCQMv2Dataset as a reference:
    class PCQMv2Dataset(paddle.io.Dataset):
  2. In the function load_data, replace the PCQM4Mv2 dataset with the newly implemented LitPCBADataset on this line:
    raw_dataset = PCQMv2Dataset(dataset_config)
  3. Add a LIT-PCBA dataset config to the folder configs/dataset_configs. As pcqmv2.json does, it needs to specify where the raw LIT-PCBA dataset is located.
  4. Run train_gem2.py to generate the 3D data and train GEM-2 on LIT-PCBA. Note that the processed data is stored in the data_cache_dir that you pass to the script.

Hope this helps.
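
As a rough illustration of step 1, a minimal sketch of such a class might look like the following. This is not the maintainers' code. It assumes that each LIT-PCBA target folder contains actives.smi and inactives.smi files with one "SMILES ID" pair per line, and that the config is a dict with the hypothetical keys data_dir and target; the real config schema and the fields that load_data expects from each sample may differ, so check them against PCQMv2Dataset.

```python
import os

try:
    # In the GEM-2 code base the dataset would subclass paddle.io.Dataset.
    from paddle.io import Dataset
except ImportError:
    # Fallback so this sketch still runs without PaddlePaddle installed.
    Dataset = object


class LitPCBADataset(Dataset):
    """Hypothetical dataset of labelled SMILES for one LIT-PCBA target."""

    def __init__(self, dataset_config):
        super().__init__()
        # "data_dir" and "target" are assumed key names, not the repo's schema.
        target_dir = os.path.join(
            dataset_config["data_dir"], dataset_config["target"]
        )
        self.samples = []
        # Assumed layout: actives are labelled 1, inactives 0.
        for file_name, label in (("actives.smi", 1), ("inactives.smi", 0)):
            with open(os.path.join(target_dir, file_name)) as f:
                for line in f:
                    fields = line.split()
                    if fields:  # skip blank lines
                        self.samples.append(
                            {"smiles": fields[0], "label": label}
                        )

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        return self.samples[idx]
```

With a class like this, step 2 would amount to constructing LitPCBADataset(dataset_config) in place of PCQMv2Dataset(dataset_config), provided each returned sample carries whatever fields the rest of load_data reads.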
