Spleen tyrosine kinase (Syk) is an intracellular protein expressed in various immune cells, playing a crucial role in inflammatory reactions. Its hyperactivation is associated with numerous autoimmune, allergic, and inflammatory diseases, making Syk an attractive therapeutic target.
Immune thrombocytopenia, a rare autoimmune disorder, is one condition where new Syk inhibitors are particularly needed. Despite the development of several Syk inhibitors, including the approved drug Fostamatinib, challenges persist in achieving optimal efficacy and safety profiles.
Computational and machine learning methods are increasingly used in drug discovery to address such challenges. This study introduces an approach that uses reinforcement-learning-based generative models to design novel Syk inhibitor molecules. It also demonstrates a general methodology for adapting generative algorithms to design inhibitors against specific therapeutic targets.
The dataset was collected from ChEMBL, an open database of bioactive, drug-like molecules; the raw records are in all_mols.csv.
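Raw ChEMBL activity records often contain repeated measurements of the same compound. The sketch below shows one common way to clean them. It is an assumption and not necessarily what Data_processing.ipynb does: it keeps only positive IC50 values in nM, takes the median per molecule, and converts that median to pIC50 = 9 - log10(IC50 in nM). The function names and the (smiles, ic50_nm) record layout are hypothetical. A real pipeline would also canonicalize SMILES (for example with RDKit) before grouping.

```python
import math
from collections import defaultdict
from statistics import median


def to_pic50(ic50_nm):
    """Convert an IC50 given in nanomolar to pIC50 = -log10(IC50 in molar)."""
    return 9.0 - math.log10(ic50_nm)


def aggregate_ic50(records):
    """Hypothetical helper: records is an iterable of (smiles, ic50_nm) pairs, possibly
    with duplicate SMILES. Returns {smiles: pIC50} computed from the median IC50 of each
    molecule. Missing or non-positive values are dropped because log10 is undefined there."""
    by_mol = defaultdict(list)
    for smiles, ic50 in records:
        if ic50 is not None and ic50 > 0:
            by_mol[smiles].append(ic50)
    return {smi: to_pic50(median(vals)) for smi, vals in by_mol.items()}


records = [("CCO", 100.0), ("CCO", 1000.0), ("c1ccccc1", 10.0), ("CCN", 0.0)]
print(aggregate_ic50(records))
```

Aggregating with the median rather than the mean keeps a single outlying assay from dominating a compound's label.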
After the initial processing described in Data_processing.ipynb, the dataset contained
To construct the QSAR model, we evaluated five molecular representation methods with the PyCaret AutoML framework; this comparison is presented in Molecules_representations.ipynb. Extended-connectivity fingerprints (ECFPs) achieved the best performance metrics. The training dataset with the computed descriptors is provided in the Data folder as df_fp.csv.
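To show what an ECFP encodes, the sketch below implements a simplified Morgan/ECFP-style fingerprint on a hand-built molecular graph. It is for illustration only and is not how df_fp.csv was produced; that file was presumably generated with a cheminformatics toolkit such as RDKit. The algorithm works like this. Each heavy atom starts from an identifier hashed from simple invariants. For `radius` iterations, that identifier is re-hashed together with its neighbors' identifiers. Every identifier is then folded into an `n_bits`-long bit vector. Real ECFPs also use richer atom invariants and bond orders, and they remove duplicate environments. All names here are hypothetical.

```python
import hashlib


def _stable_hash(values):
    """Deterministic 32-bit hash of a sequence (Python's built-in str hash is salted per process)."""
    data = ",".join(map(str, values)).encode()
    return int.from_bytes(hashlib.sha1(data).digest()[:4], "little")


def ecfp_like_bits(atoms, bonds, radius=2, n_bits=2048):
    """Simplified ECFP-style fingerprint.
    atoms: element symbols of heavy atoms; bonds: (i, j) atom-index pairs.
    Returns the set of 'on' bit indices after folding identifiers into n_bits."""
    neighbors = [[] for _ in atoms]
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)
    # Iteration 0: initial identifier from (element, heavy-atom degree).
    ids = [_stable_hash((sym, len(neighbors[k]))) for k, sym in enumerate(atoms)]
    bits = {x % n_bits for x in ids}
    for _ in range(radius):
        # New identifier = hash of the old one plus the sorted neighbor identifiers.
        # Sorting makes the result independent of atom numbering.
        ids = [_stable_hash([ids[k]] + sorted(ids[n] for n in neighbors[k]))
               for k in range(len(atoms))]
        bits.update(x % n_bits for x in ids)
    return bits


ethanol = ecfp_like_bits(["C", "C", "O"], [(0, 1), (1, 2)])
propane = ecfp_like_bits(["C", "C", "C"], [(0, 1), (1, 2)])
print(sorted(ethanol), sorted(propane))
```

Because each identifier depends only on the atom's local environment, renumbering the atoms leaves the fingerprint unchanged. This invariance is what makes circular fingerprints suitable as QSAR features.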
The model training process is presented in Predicted_model.ipynb.
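This README does not name the final model selected in Predicted_model.ipynb. The sketch below is therefore a swapped-in stand-in, not the authors' model. It illustrates fingerprint-based QSAR regression with a k-nearest-neighbor predictor that weights neighbors by Tanimoto similarity, and it scores predictions with R². Fingerprints are assumed to be sets of on-bit indices, and all function names are hypothetical.

```python
from statistics import mean


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)


def knn_predict(query, train_fps, train_y, k=3):
    """Predict activity as the similarity-weighted mean of the k most similar training molecules."""
    scored = sorted(((tanimoto(query, fp), y) for fp, y in zip(train_fps, train_y)),
                    key=lambda t: t[0], reverse=True)[:k]
    total = sum(s for s, _ in scored)
    if total == 0:
        # No bit overlap with any neighbor: fall back to their plain mean.
        return mean(y for _, y in scored)
    return sum(s * y for s, y in scored) / total


def r2_score(y_true, y_pred):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot (y_true must not be constant)."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


train_fps = [{1, 2, 3}, {1, 2, 4}, {7, 8, 9}, {7, 8, 10}]
train_y = [8.0, 7.8, 5.0, 5.2]
print(knn_predict({1, 2, 3, 5}, train_fps, train_y, k=2))
```

A kNN model makes the link between fingerprint similarity and predicted activity explicit. In practice, an AutoML comparison such as PyCaret's usually favors tree ensembles or similar learners on the same features.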
The evaluation of the generated molecules and of the generation approaches is presented in Generation_analysis.ipynb.
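Generative models for molecules are commonly compared on validity, uniqueness, and novelty. The sketch below uses the usual definitions. Validity is the fraction of outputs that parse. Uniqueness is the fraction of distinct molecules among the valid outputs. Novelty is the fraction of those distinct molecules that are absent from the training set. Whether Generation_analysis.ipynb uses exactly these metrics is an assumption. The `is_valid` callable is a hypothetical parameter; with RDKit it could be `lambda s: Chem.MolFromSmiles(s) is not None`. SMILES are assumed to be canonicalized beforehand so that string equality means molecular identity.

```python
def generation_metrics(generated, training, is_valid):
    """Validity, uniqueness and novelty of a batch of generated SMILES.
    generated: generated SMILES strings; training: SMILES seen during training;
    is_valid: callable returning True if a SMILES parses into a molecule."""
    generated = list(generated)
    if not generated:
        return {"validity": 0.0, "uniqueness": 0.0, "novelty": 0.0}
    valid = [s for s in generated if is_valid(s)]
    unique = set(valid)
    novel = unique - set(training)
    return {
        "validity": len(valid) / len(generated),
        "uniqueness": len(unique) / len(valid) if valid else 0.0,
        "novelty": len(novel) / len(unique) if unique else 0.0,
    }


# Toy validity check for illustration only: treats strings containing "X" as unparsable.
toy_valid = lambda s: bool(s) and "X" not in s
print(generation_metrics(["CCO", "CCO", "CCN", "XX", "c1ccccc1"], ["CCO"], toy_valid))
```

Each metric is computed on the output of the previous filter. A generator that repeats one valid training molecule would therefore score high on validity but low on uniqueness and novelty.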
The analysis of the properties of the generated molecules, including their comparison with the ChEMBL inhibitors, is presented in Property_analysis.ipynb.
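One simple way to compare generated molecules with known inhibitors is to summarize drug-likeness properties for both sets. The sketch below reports per-set medians and the fraction of molecules that pass Lipinski's rule of five (MW ≤ 500, logP ≤ 5, H-bond donors ≤ 5, H-bond acceptors ≤ 10, with at most one violation allowed). The choice of properties is an assumption and not necessarily what Property_analysis.ipynb computes. The dictionary keys `mw`, `logp`, `hbd`, and `hba` are hypothetical. Their values would normally come from a toolkit, for example RDKit's `Descriptors.MolWt`, `Crippen.MolLogP`, `Lipinski.NumHDonors` and `Lipinski.NumHAcceptors`.

```python
from statistics import median

# Lipinski's rule-of-five upper limits, keyed by hypothetical property names.
RO5_LIMITS = {"mw": 500.0, "logp": 5.0, "hbd": 5, "hba": 10}


def ro5_violations(props):
    """Count how many rule-of-five limits a molecule exceeds, given precomputed properties."""
    return sum(props[key] > limit for key, limit in RO5_LIMITS.items())


def compare_sets(generated, reference):
    """Median of each property plus the fraction passing Ro5 (<= 1 violation) for two non-empty sets."""
    summary = {}
    for name, mols in (("generated", generated), ("reference", reference)):
        summary[name] = {key: median(m[key] for m in mols) for key in RO5_LIMITS}
        summary[name]["ro5_pass"] = sum(ro5_violations(m) <= 1 for m in mols) / len(mols)
    return summary


gen = [{"mw": 300, "logp": 2, "hbd": 1, "hba": 4}, {"mw": 550, "logp": 6, "hbd": 1, "hba": 5}]
ref = [{"mw": 400, "logp": 3, "hbd": 2, "hba": 6}]
print(compare_sets(gen, ref))
```

Medians are used instead of means because property distributions of generated sets are often skewed by a few very large molecules.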
The notebook IC50_predictor.ipynb provides an interactive tool for predicting the