
training very slow on GPU #35

Open
jiayeguo opened this issue Nov 19, 2020 · 4 comments
@jiayeguo

Hi, I am trying to reproduce your results in Alanine_dipeptide_multiple_files on a single NVIDIA GeForce GTX 1080 Ti GPU, and it took ~5 h to finish all 10 attempts. I was using tensorflow-gpu v1.9.0, cuda/9.0, and cudnn/7.0. For comparison, I also ran the Jupyter notebook on my laptop CPU, and it was faster than the GPU (~3 h, but still very slow!). In the Nature Comm. paper, you mention that, depending on the system, each run takes between 20 s and 180 s. Since I didn't change the code, I am wondering why there is such a big discrepancy in speed compared to the paper. Do you have any insight into why my training is so slow? Thanks!

@amardt

amardt commented Nov 19, 2020

Hi,
the reason for the slow speed is that in this notebook we don't load the data into memory before training. Instead, it is read from the hard drive for every batch. This is meant to simulate the situation where the whole dataset does not fit into memory. However, reading from the hard drive is slow, and when you use the GPU the data also has to be transferred to the device, which I suspect is why it is even slower than on your CPU. The time is consumed by loading and transferring data.
For the paper we used only a single trajectory and loaded it into memory before training (see the notebook without multiple files).
Anyhow, a colleague of mine is developing a new library with an implementation of VAMPnets in PyTorch, which will be more up to date. I will post a link here as soon as it is released.
I hope this answers your question!
Best
Andreas
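The difference Andreas describes, re-reading from disk for every batch versus one up-front load followed by in-memory slicing, can be sketched as follows. This is a hypothetical NumPy illustration (not code from the repository); the file, shapes, and batch size are made up for the example:

```python
import os
import tempfile
import numpy as np

# Create a small synthetic "trajectory" on disk, a stand-in for the real data files.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "traj.npy")
np.save(path, np.random.rand(10_000, 30).astype(np.float32))

def batches_from_disk(path, batch_size):
    """Slow variant: re-read the file for every batch, as the multi-file notebook does."""
    n_frames = np.load(path, mmap_mode="r").shape[0]
    for start in range(0, n_frames, batch_size):
        data = np.load(path)  # hits the disk once per batch
        yield data[start:start + batch_size]

def batches_in_memory(path, batch_size):
    """Fast variant: one disk read up front, then pure in-memory slicing."""
    data = np.load(path)  # single disk read
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]

n_disk = sum(len(b) for b in batches_from_disk(path, 256))
n_mem = sum(len(b) for b in batches_in_memory(path, 256))
assert n_disk == n_mem == 10_000  # both pipelines yield the same frames
```

Both variants produce identical batches; the slow one simply pays the disk (and, on GPU, the host-to-device transfer) cost once per batch instead of once per epoch.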

@jiayeguo
Author

Thanks for the clarification! That makes sense. Looking forward to trying out the PyTorch version.
Best,
Jiaye

@clonker
Member

clonker commented Nov 20, 2020

Hi Jiaye,

colleague developing the new library here. Coincidentally, it is also called deeptime. If you are feeling adventurous and want to play around with it, you can find it here: https://github.com/deeptime-ml/deeptime (along with documentation for VAMPnets in the new deeptime).
I have set up a small notebook for you demonstrating how to use it to train VAMPnets. Training takes 2:00-2:30 min on my machine for 60 epochs. There are two training routines: the 2:30 one is higher-level and easier to use, while the 2:00 min one is optimized for data that can be held in memory in its entirety.

Cheers,
Moritz

ala2-vampnets.zip

@jiayeguo
Copy link
Author

Hi Moritz! Thanks for pointing me to this new repo. I will take a look and play around with it.
Best,
Jiaye
