
Result Problem #19

Open
PikaQiu520521 opened this issue Mar 20, 2023 · 13 comments

Comments

@PikaQiu520521

Hello, thank you for answering my previous question. Unfortunately, I was not able to fully replicate the process, but I used the same calculation method to evaluate the ligands generated by the model. I now have two more questions. 1. The generated ligand files come in "raw" and "processed" versions, plus another folder containing paths and scores. I am not sure what they mean; could you explain? 2. The generated ligands have no bond relationships between atoms, so they are effectively discrete points. What is your view on this issue?

@arneschneuing
Owner

Hello, thanks for your questions.

  1. When running the test.py script you have a number of post-processing options (see here). You can for example relax the generated molecules in a force field and remove disconnected fragments. You will find the final molecules in the processed/ folder. However, we also save the same molecules without any post-processing applied (apart from adding bonds which does not change the atoms' chemical types or coordinates) in the raw/ folder. These can be used for different kinds of analyses later on or to explore alternative post-processing options.
    We also measure the time it takes to generate ligands for each test set pocket. These measurements are stored in the pocket_times/ directory.
  2. In the current version, we are inferring chemical bonds from the generated point clouds but I agree that more realistic molecules might be generated if we allowed the model to generate bonds explicitly. We are currently working on that.

I hope this answers your questions.
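To illustrate the "remove disconnected fragments" post-processing option mentioned above, here is a minimal plain-Python sketch (not DiffSBDD's actual implementation; atom indices and bond pairs are illustrative stand-ins for the real molecular data types) that keeps only the largest connected component of a molecular graph:

```python
# Minimal sketch of the "remove disconnected fragments" idea: keep only the
# largest connected component of a molecular graph. Hypothetical data layout,
# not DiffSBDD's actual code.

def largest_fragment(n_atoms, bonds):
    """Return the set of atom indices in the largest connected component."""
    # Build an adjacency list from undirected bond pairs.
    adj = {i: set() for i in range(n_atoms)}
    for a, b in bonds:
        adj[a].add(b)
        adj[b].add(a)

    seen, components = set(), []
    for start in range(n_atoms):
        if start in seen:
            continue
        # Depth-first search to collect one component.
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        components.append(comp)
    return max(components, key=len)

# A 5-atom "molecule": atoms 0-2 bonded, atoms 3-4 a disconnected fragment.
print(sorted(largest_fragment(5, [(0, 1), (1, 2), (3, 4)])))  # [0, 1, 2]
```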

@PikaQiu520521
Author

Hi, thank you very much for your reply. Your first answer resolved my question fully, but I would like to understand the second point better, if that is convenient. Comparing the files in raw/ and processed/, it appears that processed/ keeps only the first record of each result in the raw/ folder. I have not yet looked at the post-processing code, so this confuses me. When visualizing the data, I found that the points are very scattered, and post-processing with Open Babel does not produce new molecules. I don't know where the problem is; could you help?
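For context, inferring bonds from a generated point cloud is commonly done with a distance heuristic: connect two atoms if their distance is below the sum of their covalent radii plus a tolerance. A minimal sketch of that idea (plain Python with approximate single-bond radii; a hypothetical illustration, not the repository's or Open Babel's actual code):

```python
# Hypothetical sketch of distance-based bond inference from a point cloud.
# Radii are approximate single-bond covalent radii in Angstroms.
import math

COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66}

def infer_bonds(atoms, tol=0.4):
    """atoms: list of (element, (x, y, z)); returns bonded index pairs."""
    bonds = []
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            (el_i, pos_i), (el_j, pos_j) = atoms[i], atoms[j]
            # Bond if the distance is within the summed radii plus tolerance.
            cutoff = COVALENT_RADII[el_i] + COVALENT_RADII[el_j] + tol
            if math.dist(pos_i, pos_j) <= cutoff:
                bonds.append((i, j))
    return bonds

# A C-O pair at 1.43 A bonds (cutoff 0.76 + 0.66 + 0.4 = 1.82 A);
# the nitrogen 4.4 A away stays disconnected.
atoms = [("C", (0.0, 0.0, 0.0)), ("O", (1.43, 0.0, 0.0)), ("N", (4.4, 0.0, 0.0))]
print(infer_bonds(atoms))  # [(0, 1)]
```

If the generated points are so scattered that no pair falls under such a cutoff, bond-perception tools will return atoms with no edges, which matches the "discrete points" symptom described above.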

@pearl-rabbit

pearl-rabbit commented Apr 10, 2023

Hello, what result did you get for the 'Diversity' metric when you evaluated?
Whether I read molecules from "raw" or "processed", my result is 0, and my SA value is much smaller than the results in the paper.

The following are the results of evaluating the molecules in raw:

QED: 0.495 ± 0.08
SA: 0.234 ± 0.03
LogP: 0.021 ± 1.10
Lipinski: 4.883 ± 0.34
Diversity: 0.000 ± 0.00

@arneschneuing
Owner

Hello @pearl-rabbit, the diversity is usually zero when only a single molecule is generated for each protein pocket because we compute this value per target (and afterwards mean and standard deviation across all targets).
How many molecules did you generate?
Also, which model did you use? Did you train one yourself or did you use one of the provided checkpoints?
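The per-target aggregation described above can be sketched in plain Python (a hypothetical illustration of the averaging logic, not the repository's actual `evaluate()` code; `dissimilarity` stands in for 1 minus the Tanimoto similarity of molecular fingerprints):

```python
# Why diversity is 0 with one molecule per pocket: diversity is the mean
# pairwise dissimilarity *within* each pocket, then averaged across pockets.
# With a single molecule there are no pairs, so every per-pocket value is 0.
from itertools import combinations
from statistics import mean, pstdev

def per_pocket_diversity(pockets, dissimilarity):
    values = []
    for mols in pockets:
        pairs = list(combinations(mols, 2))
        if not pairs:  # a single molecule yields no pairs
            values.append(0.0)
        else:
            values.append(mean(dissimilarity(a, b) for a, b in pairs))
    return mean(values), pstdev(values)

# Toy "molecules" as feature sets; dissimilarity = 1 - Jaccard overlap.
def dissim(a, b):
    return 1 - len(a & b) / len(a | b)

single = [[{1, 2}], [{3, 4}]]  # one molecule per pocket -> diversity 0
print(per_pocket_diversity(single, dissim))  # (0.0, 0.0)
```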

@pearl-rabbit

pearl-rabbit commented Apr 11, 2023

Hi @arneschneuing, thank you for your timely response. I set n_samples=100 and batch_size=60 in test.py. If I want to generate multiple molecules for a protein pocket, should I set 'n_samples' higher?
I used the retrained all-atom model; apart from the cutoff value, which I set to 1, no other parameters were changed. The dataset is the CrossDocked benchmark.

@arneschneuing
Owner

Hi, n_samples=100 should be fine. It means that 100 valid molecules are generated per pocket. How did you provide those molecules to the evaluate() function? Did you create a nested list?

A cutoff value of 1 [Å] seems very low. It is less than a typical bond length. Maybe you should consider a higher threshold to create sufficient edges.

@arneschneuing
Owner

@PikaQiu520521, I'm sorry but I don't really understand the question. What do you mean by 'discrete points'? Could you provide an example?

@pearl-rabbit

pearl-rabbit commented Apr 11, 2023

I will try training again. Is a cutoff of 3 appropriate (my server may not be able to support larger cutoff values)?
This is the code for reading molecules:

from analysis.metrics import MoleculeProperties
from rdkit import Chem
import os

filePath = 'outdir/raw'
filenames = os.listdir(filePath)
pocket_list = []
for filename in filenames:
    # os.path.join keeps the path separator ('filePath + filename' drops it)
    suppl = Chem.SDMolSupplier(os.path.join(filePath, filename), sanitize=False)
    mols = [mol for mol in suppl if mol]
    pocket_list.append(mols)
mol_metrics = MoleculeProperties()
all_qed, all_sa, all_logp, all_lipinski, per_pocket_diversity = mol_metrics.evaluate(pocket_list)

'filePath' is the path to the test-set sampling results, containing 100 sdf files, each holding the coordinates and atom types of molecules. (Due to some issues with the server where I store the files, I cannot provide an sdf example right now.)

@arneschneuing
Owner

A cutoff of 3.0 still seems rather low, but it might work because information is also propagated through several layers of message passing. However, I haven't tried this value myself, so I cannot say whether the scores will be similar to the ones from the paper.

Your code for reading molecules looks fine to me. Could you please check how many items your pocket_list and each list within pocket_list contain before you pass them to evaluate()?

print(len(pocket_list))
print([len(x) for x in pocket_list])

@pearl-rabbit

pearl-rabbit commented Apr 14, 2023

The output is:

100
[120, 120, 120, ..., 120]  # 100 entries, all equal to 120

If there is no problem, it may be due to the value of 'cutoff'.

@arneschneuing
Owner

It's hard to tell. The cutoff value could cause your molecules to be less realistic but I'm still surprised by the 0.0 diversity value. Have you visually inspected some of the generated molecules? Do they look more or less like molecules (several atoms connected by bonds) or is there some obvious failure mode (e.g. it always outputs disconnected point clouds)?

@pearl-rabbit

I didn't analyze the cause carefully; I just replaced the original diversity calculation with the commented-out code here and obtained a non-zero value:
https://github.com/arneschneuing/DiffSBDD/blob/main/analysis/metrics.py#L196
I checked the sdf file and found that the generated molecules have no edges.
I then retrained the model and obtained reasonable results with the original evaluation code.

By the way, the ca_only model in the paper doesn't set a cutoff, but that is too memory-intensive for me, so I set a cutoff limit and still got reasonable results. Does this value only affect the number of edges when building the graph?

@arneschneuing
Owner

Yes, it determines which atoms (nodes) are connected in the graph that the neural network processes.
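To make this concrete, a minimal sketch of cutoff-based edge construction (plain Python; a hypothetical illustration of the idea, not the repository's actual graph-building code):

```python
# Sketch of how a distance cutoff defines graph edges for message passing:
# nodes within `cutoff` Angstroms of each other are connected. A larger
# cutoff yields more edges (and higher memory use); node features and
# coordinates are unchanged.
import math

def edges_within_cutoff(coords, cutoff):
    """coords: list of (x, y, z); returns undirected edge index pairs."""
    return [(i, j)
            for i in range(len(coords))
            for j in range(i + 1, len(coords))
            if math.dist(coords[i], coords[j]) <= cutoff]

coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (4.0, 0.0, 0.0)]
print(edges_within_cutoff(coords, 3.0))  # [(0, 1), (1, 2)]
print(edges_within_cutoff(coords, 5.0))  # [(0, 1), (0, 2), (1, 2)]
```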
