Regression Task with Biological Activity Data Using Pretrained Chemformer #33

Open
wdon021 opened this issue Mar 28, 2024 · 2 comments

wdon021 commented Mar 28, 2024

Dear Chemformer Team,

I am working on a project that uses the pretrained Chemformer model for regression on biological activity data (specifically, pXC50 values). The objective is to predict activity values from SMILES strings.

While setting up my environment and preparing for fine-tuning, I came across closed issue #13 and a fork of the repository, which provided clear examples and scripts for fine-tuning Chemformer on regression tasks. Notably, these resources referenced RegPropDataModule(_AbsDataModule) in finetune_regression_modules.py, suggesting it as a viable option for regression with Chemformer.

However, upon revisiting the Chemformer repository, it appears that the finetune_regression directory and RegPropDataModule class are no longer present in the example_scripts folder, which has left me uncertain about the best approach to undertake my regression task with the latest codebase.

With the above context, I am reaching out to seek your guidance on several points:

  • Current Recommended DataModule: Given the removal of RegPropDataModule and associated fine-tuning examples, could you advise on which DataModule in the current code structure is best suited for handling a dataset of SMILES strings with pXC50 values for regression analysis?

  • Script Selection: Among the scripts present in the repository (e.g., fine_tune.py, inference_score.py, predict.py), which would you recommend for fine-tuning the pretrained model on a regression dataset and for making subsequent predictions?

  • Further Recommendations: If there are any specific recommendations regarding data preprocessing, hyperparameter selection, or other considerations to optimize the use of Chemformer for this regression task, I would be grateful for your insights.

Thank you very much for your time and support!

@anniewesterlund
Collaborator

Hi,
As you noticed, we have removed support for regression in the newer Chemformer version. However, you can have a look at this old release: https://github.com/MolecularAI/Chemformer/releases/tag/1.0

It may contain the scripts and datamodules you refer to.
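
On the data preprocessing part of the question, here is a minimal sketch of one common approach for a SMILES/pXC50 table: a seeded random split followed by standardizing the targets with statistics from the training split only. It is independent of Chemformer; the function name `split_and_scale`, the split fractions, and the toy values are all hypothetical, and the input format expected by the datamodules in the 1.0 release is not shown here and should be checked against that release.

```python
import random
from statistics import mean, pstdev


def split_and_scale(records, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle (smiles, pxc50) pairs, split them, and z-score the targets.

    The mean and standard deviation come from the training split only,
    so no information from validation/test leaks into the scaling.
    """
    rows = list(records)
    random.Random(seed).shuffle(rows)  # seeded for reproducibility

    n_test = int(len(rows) * test_frac)
    n_val = int(len(rows) * val_frac)
    test = rows[:n_test]
    val = rows[n_test:n_test + n_val]
    train = rows[n_test + n_val:]

    mu = mean(y for _, y in train)
    sigma = pstdev([y for _, y in train]) or 1.0  # guard against zero spread

    def scale(part):
        return [(smiles, (y - mu) / sigma) for smiles, y in part]

    return scale(train), scale(val), scale(test), (mu, sigma)


# Toy, made-up activity values purely for illustration (not real measurements).
toy = [
    ("CCO", 5.1), ("CCN", 5.4), ("CCC", 4.8), ("c1ccccc1", 6.3), ("CC(=O)O", 5.9),
    ("CCCl", 6.0), ("CCBr", 6.2), ("CO", 4.5), ("CN", 4.7), ("CC", 4.2),
]
train, val, test, (mu, sigma) = split_and_scale(toy)
```

Predictions made on the scaled targets can be mapped back to pXC50 units with `y * sigma + mu`. A scaffold-based split may be more appropriate than a random one for activity data, but that is a separate design choice not covered by this sketch.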

@wdon021
Author

wdon021 commented Apr 29, 2024

Thank you for your response, @anniewesterlund. May I ask why the new version removed support for regression?
