
Result Problem #19

Open
PikaQiu520521 opened this issue Mar 20, 2023 · 13 comments

Comments

@PikaQiu520521

Hello, thank you for answering my previous question. Unfortunately, I was not able to fully replicate the process, but I used the same calculation method to evaluate the ligands generated by the model. I now have two more questions. 1. The generated ligand files come in "raw" and "processed" versions, plus another folder containing paths and scores. I am not sure what they mean; could you explain? 2. The generated ligands have no bond relationships between atoms, so they are effectively discrete points. What is your view on this issue?

@arneschneuing
Owner

Hello, thanks for your questions.

  1. When running the test.py script you have a number of post-processing options (see here). You can for example relax the generated molecules in a force field and remove disconnected fragments. You will find the final molecules in the processed/ folder. However, we also save the same molecules without any post-processing applied (apart from adding bonds which does not change the atoms' chemical types or coordinates) in the raw/ folder. These can be used for different kinds of analyses later on or to explore alternative post-processing options.
    We also measure the time it takes to generate ligands for each test set pocket. These measurements are stored in the pocket_times/ directory.
  2. In the current version, we are inferring chemical bonds from the generated point clouds but I agree that more realistic molecules might be generated if we allowed the model to generate bonds explicitly. We are currently working on that.

I hope this answers your questions.
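To illustrate the "remove disconnected fragments" post-processing option mentioned above, here is a minimal plain-Python sketch (not DiffSBDD's actual implementation; atom indices and bond pairs are illustrative stand-ins for the real molecular data types) that keeps only the largest connected component of a molecular graph:

```python
# Minimal sketch of the "remove disconnected fragments" idea: keep only the
# largest connected component of a molecular graph. Hypothetical data layout,
# not DiffSBDD's actual code.

def largest_fragment(n_atoms, bonds):
    """Return the set of atom indices in the largest connected component."""
    # Build an adjacency list from undirected bond pairs.
    adj = {i: set() for i in range(n_atoms)}
    for a, b in bonds:
        adj[a].add(b)
        adj[b].add(a)

    seen, components = set(), []
    for start in range(n_atoms):
        if start in seen:
            continue
        # Depth-first search to collect one component.
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        components.append(comp)
    return max(components, key=len)

# A 5-atom "molecule": atoms 0-2 bonded, atoms 3-4 a disconnected fragment.
print(sorted(largest_fragment(5, [(0, 1), (1, 2), (3, 4)])))  # [0, 1, 2]
```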

@PikaQiu520521
Author

Hi, thank you very much for your reply. Your first answer resolved my question fully, but I would like to understand the second point better, if that is convenient. Comparing the files in raw/ and processed/, it appears that processed/ keeps only the first record of each result in the raw/ folder. I have not yet looked at the post-processing code, so this confuses me. When visualizing the data, I found that the points are very scattered, and post-processing with Open Babel does not produce new molecules. I don't know where the problem is; could you help?
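For context, inferring bonds from a generated point cloud is commonly done with a distance heuristic: connect two atoms if their distance is below the sum of their covalent radii plus a tolerance. A minimal sketch of that idea (plain Python with approximate single-bond radii; a hypothetical illustration, not the repository's or Open Babel's actual code):

```python
# Hypothetical sketch of distance-based bond inference from a point cloud.
# Radii are approximate single-bond covalent radii in Angstroms.
import math

COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66}

def infer_bonds(atoms, tol=0.4):
    """atoms: list of (element, (x, y, z)); returns bonded index pairs."""
    bonds = []
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            (el_i, pos_i), (el_j, pos_j) = atoms[i], atoms[j]
            # Bond if the distance is within the summed radii plus tolerance.
            cutoff = COVALENT_RADII[el_i] + COVALENT_RADII[el_j] + tol
            if math.dist(pos_i, pos_j) <= cutoff:
                bonds.append((i, j))
    return bonds

# A C-O pair at 1.43 A bonds (cutoff 0.76 + 0.66 + 0.4 = 1.82 A);
# the nitrogen 4.4 A away stays disconnected.
atoms = [("C", (0.0, 0.0, 0.0)), ("O", (1.43, 0.0, 0.0)), ("N", (4.4, 0.0, 0.0))]
print(infer_bonds(atoms))  # [(0, 1)]
```

If the generated points are so scattered that no pair falls under such a cutoff, bond-perception tools will return atoms with no edges, which matches the "discrete points" symptom described above.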

@pearl-rabbit

pearl-rabbit commented Apr 10, 2023

Hello, what result did you get for the 'Diversity' metric when you evaluated?
Whether I read molecules from "raw" or "processed", my result is 0, and my SA value is much smaller than the results in the paper.

The following are the results of evaluating the molecules in raw:

QED: 0.495 ± 0.08
SA: 0.234 ± 0.03
LogP: 0.021 ± 1.10
Lipinski: 4.883 ± 0.34
Diversity: 0.000 ± 0.00

@arneschneuing
Owner

Hello @pearl-rabbit, the diversity is usually zero when only a single molecule is generated for each protein pocket because we compute this value per target (and afterwards mean and standard deviation across all targets).
How many molecules did you generate?
Also, which model did you use? Did you train one yourself or did you use one of the provided checkpoints?
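The per-target aggregation described above can be sketched in plain Python (a hypothetical illustration of the averaging logic, not the repository's actual `evaluate()` code; `dissimilarity` stands in for 1 minus the Tanimoto similarity of molecular fingerprints):

```python
# Why diversity is 0 with one molecule per pocket: diversity is the mean
# pairwise dissimilarity *within* each pocket, then averaged across pockets.
# With a single molecule there are no pairs, so every per-pocket value is 0.
from itertools import combinations
from statistics import mean, pstdev

def per_pocket_diversity(pockets, dissimilarity):
    values = []
    for mols in pockets:
        pairs = list(combinations(mols, 2))
        if not pairs:  # a single molecule yields no pairs
            values.append(0.0)
        else:
            values.append(mean(dissimilarity(a, b) for a, b in pairs))
    return mean(values), pstdev(values)

# Toy "molecules" as feature sets; dissimilarity = 1 - Jaccard overlap.
def dissim(a, b):
    return 1 - len(a & b) / len(a | b)

single = [[{1, 2}], [{3, 4}]]  # one molecule per pocket -> diversity 0
print(per_pocket_diversity(single, dissim))  # (0.0, 0.0)
```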

@pearl-rabbit

pearl-rabbit commented Apr 11, 2023

Hi @arneschneuing, thank you for your timely response. I set n_samples=100 and batch_size=60 in test.py. If I want to generate multiple molecules for a protein pocket, should I set 'n_samples' higher?
I used the retrained all-atom model; apart from the cutoff value, which I set to 1, no other parameters were changed. The dataset is the CrossDocked benchmark.

@arneschneuing
Owner

Hi, n_samples=100 should be fine. It means that 100 valid molecules are generated per pocket. How did you provide those molecules to the evaluate() function? Did you create a nested list?

A cutoff value of 1 [Å] seems very low. It is less than a typical bond length. Maybe you should consider a higher threshold to create sufficient edges.

@arneschneuing
Owner

@PikaQiu520521, I'm sorry but I don't really understand the question. What do you mean by 'discrete points'? Could you provide an example?

@pearl-rabbit

pearl-rabbit commented Apr 11, 2023

I will try training again. Is a cutoff of 3 appropriate (my server may not be able to support larger cutoff values)?
This is the code for reading molecules:

from analysis.metrics import MoleculeProperties
from rdkit import Chem
import os

filePath = 'outdir/raw'
filenames = os.listdir(filePath)
pocket_list = []
for filename in filenames:
    # os.path.join keeps the path separator ('filePath + filename' drops it)
    suppl = Chem.SDMolSupplier(os.path.join(filePath, filename), sanitize=False)
    mols = [mol for mol in suppl if mol]
    pocket_list.append(mols)
mol_metrics = MoleculeProperties()
all_qed, all_sa, all_logp, all_lipinski, per_pocket_diversity = mol_metrics.evaluate(pocket_list)

'filePath' is the path to the test-set sampling results, containing 100 sdf files, each holding the coordinates and atom types of molecules. (Due to some issues with the server where I store the files, I cannot provide an sdf example right now.)

@arneschneuing
Owner

A cutoff of 3.0 still seems rather low, but it might work because information is also propagated through several layers of message passing. However, I haven't tried this value myself, so I cannot say whether the scores will be similar to the ones from the paper.

Your code for reading molecules looks fine to me. Could you please check how many items your pocket_list and each list within pocket_list contain before you pass them to evaluate()?

print(len(pocket_list))
print([len(x) for x in pocket_list])

@pearl-rabbit

pearl-rabbit commented Apr 14, 2023

The output is:

100
[120, 120, 120, ..., 120]  # 100 entries, all equal to 120

If there is no problem, it may be due to the value of 'cutoff'.

@arneschneuing
Owner

It's hard to tell. The cutoff value could cause your molecules to be less realistic but I'm still surprised by the 0.0 diversity value. Have you visually inspected some of the generated molecules? Do they look more or less like molecules (several atoms connected by bonds) or is there some obvious failure mode (e.g. it always outputs disconnected point clouds)?

@pearl-rabbit

I didn't analyze the cause carefully; I just replaced the original diversity calculation with the commented-out code here and obtained a non-zero value:
https://github.com/arneschneuing/DiffSBDD/blob/main/analysis/metrics.py#L196
I checked the sdf file and found that the generated molecules have no edges.
I then retrained the model and obtained reasonable results with the original evaluation code.

By the way, the ca_only model in the paper doesn't set a cutoff, but that is too memory-intensive for me, so I set a cutoff limit and still got reasonable results. Does this value only affect the number of edges when building the graph?

@arneschneuing
Owner

Yes, it determines which atoms (nodes) are connected in the graph that the neural network processes.
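To make this concrete, a minimal sketch of cutoff-based edge construction (plain Python; a hypothetical illustration of the idea, not the repository's actual graph-building code):

```python
# Sketch of how a distance cutoff defines graph edges for message passing:
# nodes within `cutoff` Angstroms of each other are connected. A larger
# cutoff yields more edges (and higher memory use); node features and
# coordinates are unchanged.
import math

def edges_within_cutoff(coords, cutoff):
    """coords: list of (x, y, z); returns undirected edge index pairs."""
    return [(i, j)
            for i in range(len(coords))
            for j in range(i + 1, len(coords))
            if math.dist(coords[i], coords[j]) <= cutoff]

coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (4.0, 0.0, 0.0)]
print(edges_within_cutoff(coords, 3.0))  # [(0, 1), (1, 2)]
print(edges_within_cutoff(coords, 5.0))  # [(0, 1), (0, 2), (1, 2)]
```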
