I am currently trying to train the NNScore and PLECScore models for ligand scoring. So far I have not found a way to train the models "purposefully" and have resorted to running scorer.load() without any arguments, which starts the training of the scoring function. However, I don't know which version of PDBbind this uses (I assume v2016?).
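I have tried the following, for example:
scorer = NNScore.nnscore()
scorer.gen_training_data(pdbbind_dir=$PATH$, pdbbind_versions=[2016])
where $PATH$ is just a directory on my machine, and I get the following error: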
---------------------------------------------------------------------------
Exception Traceback (most recent call last)
Cell In[10], line 23
21 rescorers = {'nnscore':NNScore.nnscore()}
22 scorer = rescorers['nnscore']
---> 23 scorer.gen_training_data(pdbbind_dir='/home/tony/CADD22/software/pdbbind', pdbbind_versions=[2016], use_proteins=False)
File ~/.conda/envs/wocondock/lib/python3.8/site-packages/oddt/scoring/functions/NNScore.py:63, in nnscore.gen_training_data(self, pdbbind_dir, pdbbind_versions, home_dir, use_proteins)
60 home_dir = dirname(__file__) + '/NNScore'
61 filename = path_join(home_dir, 'nnscore_descs.csv')
---> 63 super(nnscore, self)._gen_pdbbind_desc(
64 pdbbind_dir=pdbbind_dir,
65 pdbbind_versions=pdbbind_versions,
66 desc_path=filename,
67 use_proteins=use_proteins
68 )
File ~/.conda/envs/wocondock/lib/python3.8/site-packages/oddt/scoring/__init__.py:94, in scorer._gen_pdbbind_desc(self, pdbbind_dir, pdbbind_versions, desc_path, include_general_set, use_proteins, **kwargs)
92 df = None
93 for pdbbind_version in pdbbind_versions:
---> 94 p = pdbbind('%s/v%i/' % (pdbbind_dir, pdbbind_version),
95 version=pdbbind_version,
96 opt=opt)
97 # Core set
99 for set_name in p.pdbind_sets:
File ~/.conda/envs/wocondock/lib/python3.8/site-packages/oddt/datasets.py:85, in pdbbind.__init__(self, home, version, default_set, opt)
82 self.sets[pdbind_set] = dict(zip(self._set_ids[pdbind_set],
83 self._set_act[pdbind_set]))
84 if len(self.sets) == 0:
---> 85 raise Exception('There is no PDBbind set availabe')
Exception: There is no PDBbind set availabe
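Reading the traceback, frame `oddt/scoring/__init__.py:94` appears to build the dataset location as `'%s/v%i/' % (pdbbind_dir, pdbbind_version)`, so the loader seems to look in a `v2016/` subdirectory of the directory I passed, and `pdbbind_versions` has to be an iterable such as `[2016]`. Below is a minimal sketch for checking that layout before calling `gen_training_data()`. The helper name `expected_pdbbind_homes` is my own and not part of ODDT, and it only checks that the directory exists, not which index files ODDT expects inside it.

```python
import os


def expected_pdbbind_homes(pdbbind_dir, pdbbind_versions):
    """Map each version to (path, exists) using the pattern seen in the traceback.

    Hypothetical helper, not part of ODDT: it only mirrors the
    '%s/v%i/' % (pdbbind_dir, pdbbind_version) expression from
    oddt/scoring/__init__.py line 94.
    """
    homes = {}
    for version in pdbbind_versions:
        home = '%s/v%i/' % (pdbbind_dir, version)
        homes[version] = (home, os.path.isdir(home))
    return homes


# Path and version copied from the failing call in the traceback.
for version, (home, exists) in expected_pdbbind_homes(
        '/home/tony/CADD22/software/pdbbind', [2016]).items():
    print(version, home, 'found' if exists else 'MISSING')
```

If the directory is reported as missing, that would at least be consistent with the "no PDBbind set" exception, although an existing but incomplete directory could presumably raise the same error.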
Additionally, when I then score ligands, these models perform very poorly (enrichment factor at 1% of around 0-2%) compared with other scoring functions (for example, those implemented in GNINA), which achieve ~20% enrichment.
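For reference, this is roughly how the enrichment factor can be computed. It is a minimal sketch of the usual definition (hit rate among the top-ranked fraction divided by the hit rate in the whole library, so 1.0 corresponds to random ranking), not necessarily the exact implementation in any particular package. The function name and the `higher_is_better` flag are my own; the flag matters because predicted affinities are ranked descending while docking energies are ranked ascending.

```python
def enrichment_factor(scores, labels, fraction=0.01, higher_is_better=True):
    """Enrichment factor of the top `fraction` of a ranked library.

    scores: predicted score per compound.
    labels: 1 for an active, 0 for a decoy, in the same order as scores.
    """
    if len(scores) != len(labels) or len(scores) == 0:
        raise ValueError('scores and labels must be non-empty and equal in length')
    n_total = len(scores)
    n_actives = sum(labels)
    if n_actives == 0:
        raise ValueError('at least one active is required')

    # Number of compounds in the top fraction (at least one).
    n_top = max(1, int(round(fraction * n_total)))

    # Rank compounds from best to worst score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0],
                    reverse=higher_is_better)
    hits_in_top = sum(label for _, label in ranked[:n_top])

    # Hit rate in the top fraction relative to the overall hit rate.
    return (hits_in_top / n_top) / (n_actives / n_total)
```

With 100 compounds of which 10 are actives, a ranking that places an active first gives an enrichment factor at 1% of 10, which is also the maximum attainable for that active-to-decoy ratio.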
Therefore I am wondering if there is a tutorial/notebook that explains how to train these models using the gen_training_data() or fit() methods.
I was also wondering whether it is possible to use a more recent version of the PDBbind data, such as v2020, and how hard that would be to implement.
I am happy to provide the dataset I am using for comparison of the performance of these scoring functions (aldr dataset from DUD-E).
Thanks for your answer. When I use the load() method without arguments, it starts training the model. However, I believe this is what I was using previously and getting low enrichment with. I will retrain now and update you. Where would I find the bundled models? I can only find .csv files in oddt/scoring/functions/NNScore/; should I be using the load() method with those?
Update: I've managed to load the pretrained linear PLECScore model bundled with ODDT. However, I would still like to understand how to train the models myself in order to use the MLP or RF version, perhaps on PDBbind v2020, and how to load the model for NNScore.