About the Training Process #3

Open
AmingWu opened this issue Apr 21, 2024 · 6 comments

@AmingWu

AmingWu commented Apr 21, 2024

Dear Authors,

Have you released the training code? How long does this method take to train?

@AlphaZYL

Dear Authors,
I ran the code on an NVIDIA RTX 5000 Ada for a long time, but training has not finished. Could you please tell me whether I made a mistake, or whether the training process itself takes a long time?

@LirongWu
Owner

The training code has been released. Training time depends on the dataset and the hardware platform. I suggest training first on SHS27k, the smallest (and fastest) dataset, which can be done in a couple of hours on an NVIDIA A100. The largest dataset, STRING, unfortunately still requires a very long training time, despite the many speedups we implemented.

@AlphaZYL

AlphaZYL commented Apr 27, 2024 via email

@LirongWu
Owner

The topic studied in this work is interaction category prediction, i.e., multi-label classification. The reported numerical values are results on the test sets. This work can easily be extended to predict whether two proteins interact (0/1), which is a binary classification problem. One possible approach is to change the output dimension to 2 and train with correspondingly labeled data. Furthermore, as a general method, we believe it has the potential to be extended to other species, but only if the data for those species are preprocessed and used to re-train the model.
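To make the "corresponding labeled data" for the binary setting more concrete, here is a minimal sketch of how multi-label interaction vectors might be collapsed into 0/1 labels. It is not taken from the repository: the 7-entry multi-hot vectors, the protein identifiers, and the helper names `to_binary_label` and `one_hot` are all assumptions made for illustration. Truly non-interacting (negative) pairs would also have to be supplied separately, since a list of known interactions only provides positives.

```python
from typing import Dict, List, Sequence, Tuple


def to_binary_label(multi_hot: Sequence[int]) -> int:
    """Return 1 if any interaction type is set in the multi-hot vector, else 0."""
    return 1 if any(v != 0 for v in multi_hot) else 0


def one_hot(label: int, num_classes: int = 2) -> List[int]:
    """Encode a class index as a one-hot target for a 2-way output layer."""
    vec = [0] * num_classes
    vec[label] = 1
    return vec


# Hypothetical protein pairs with hypothetical 7-type multi-hot labels.
pairs: Dict[Tuple[str, str], List[int]] = {
    ("P1", "P2"): [1, 0, 0, 1, 0, 0, 0],  # two interaction types -> interacts
    ("P1", "P3"): [0, 0, 0, 0, 0, 0, 0],  # assumed negative pair -> no interaction
}

binary = {pair: to_binary_label(vec) for pair, vec in pairs.items()}
targets = {pair: one_hot(lbl) for pair, lbl in binary.items()}

for pair, target in targets.items():
    print(pair, binary[pair], target)
```

With targets in this form, the model's final layer would output 2 values per pair instead of one value per interaction type.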

@AlphaZYL

AlphaZYL commented Apr 28, 2024 via email

@LirongWu
Owner

“all_assign.txt” is a file describing the physicochemical features of each amino acid; these features were validated in a previous work (https://github.com/zqgao22/HIGH-PPI).

“{}_ppi.pkl” and “{}_ppi_label.pkl” are extracted from “protein.actions.STRING” and “protein.STRING.sequences.dictionary” by running “dataloader.py”. The PDB files only characterize each individual protein and are not directly related to PPIs.
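For anyone who wants to check what these generated files contain, the following is a minimal sketch that loads a pickle file and reports its top-level type and length. The internal layout of the files is not described in this thread, so the sketch makes no assumption about it. The helper name `describe_pickle` is hypothetical, as is filling the “{}” placeholder with “SHS27k” and looking in the current directory.

```python
import pickle
from pathlib import Path
from typing import Any


def describe_pickle(path: Path) -> str:
    """Load a pickle file and summarize its top-level type and length."""
    # Only unpickle files you generated yourself or otherwise trust:
    # pickle.load can execute arbitrary code from a malicious file.
    with path.open("rb") as fh:
        obj: Any = pickle.load(fh)
    size = len(obj) if hasattr(obj, "__len__") else "n/a"
    return f"{path.name}: type={type(obj).__name__}, len={size}"


if __name__ == "__main__":
    # Assumed file names; adjust the prefix and directory to your setup.
    for name in ("SHS27k_ppi.pkl", "SHS27k_ppi_label.pkl"):
        candidate = Path(name)
        if candidate.exists():
            print(describe_pickle(candidate))
        else:
            print(f"{name}: not found (run dataloader.py first)")
```

If the two files are parallel lists, as their names suggest, their reported lengths should match; that is an expectation worth verifying rather than a documented guarantee.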
