
WIP: Added support for timm in unet #3717

Open · wants to merge 1 commit into base: master
Conversation

madhavajay (Contributor)

This PR attempts to add timm models to the unet_learner, as per conversations during Live Coding 17: https://forums.fast.ai/t/live-coding-17/97166
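Roughly, the usage I'm aiming for looks something like this (a hypothetical sketch on the standard CAMVID_TINY example; the exact signature in this WIP may still change):

```python
from fastai.vision.all import *

# Hypothetical target usage: pass a timm architecture name straight to
# unet_learner. 'resnet18' stands in for any timm backbone.
path = untar_data(URLs.CAMVID_TINY)
dls = SegmentationDataLoaders.from_label_func(
    path, bs=1,  # bs=1 mirrors the memory ceiling described below
    fnames=get_image_files(path/'images'),
    label_func=lambda o: path/'labels'/f'{o.stem}_P{o.suffix}',
    codes=np.loadtxt(path/'codes.txt', dtype=str))
learn = unet_learner(dls, 'resnet18')  # timm model name as a string (hypothetical)
```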

I have several issues so far:

  1. I don't know if there's a way to get the cut preferences from timm. I have added some code which tries to match them to fastai model types as a backup, but they don't seem to match up, so perhaps this is pointless and instead we either get them somewhere else or rely on manual user input (see the feature_info sketch after this list).

  2. I tried training with them, but it seems like they use up a tonne of memory, meaning my batch size can only be 1. I'm not sure what's going on, but something seems wrong, especially considering I tried a smaller timm model (resnet18) than my default fastai unet model (resnet34).

I am sure I have done something wrong, and would appreciate some direction on what to do next.
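For point 1, here's the per-stage metadata timm does expose via feature_info; whether it maps cleanly onto fastai's notion of a cut is exactly what I'm unsure about:

```python
import timm

# Inspect the per-stage feature metadata timm exposes for a backbone.
# These are real timm APIs; mapping them onto fastai cut points is the
# open question in point 1.
m = timm.create_model('resnet18', features_only=True, pretrained=False)
print(m.feature_info.module_name())  # ['act1', 'layer1', 'layer2', 'layer3', 'layer4']
print(m.feature_info.reduction())    # stride at each stage: [2, 4, 8, 16, 32]
print(m.feature_info.channels())     # channels per stage: [64, 64, 128, 256, 512]
```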

@madhavajay madhavajay requested a review from jph00 as a code owner June 30, 2022 04:43
@review-notebook-app
Check out this pull request on ReviewNB to see visual diffs and provide feedback on the Jupyter Notebooks.
@jph00 jph00 marked this pull request as draft June 30, 2022 04:45
madhavajay (Contributor, Author)

So, I tried training it just to make sure it's still working.

I got this far before my Paperspace machine shut down:
[Screenshot: partial training output, 2022-07-02 7:56 am]

So I guess it's definitely training, but extremely slowly. For comparison, on the same dataset with resnet34 and batch size 4, I get epochs of about 6 minutes on the free Paperspace GPU. So assuming the architectures were equally complex, a batch size 4x smaller should only take about 24 minutes per epoch.

It seems like when the model is allocated there's only 7% of GPU memory in use, and then once training with batch size 1 starts it goes to 87%+:
[Screenshot: GPU memory usage, 2022-07-01 6:06 pm]

I guess I don't understand the model-cutting code and how to use it with timm models. Any advice on how to debug this?
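One way to narrow it down might be to snapshot the CUDA allocator around model creation and the first batch (the training-step lines are placeholders, not code from this PR):

```python
import torch

# Print current and peak CUDA memory so we can see where the jump happens.
def report(tag):
    alloc = torch.cuda.memory_allocated() / 2**30
    peak = torch.cuda.max_memory_allocated() / 2**30
    print(f'{tag}: {alloc:.2f} GiB allocated, {peak:.2f} GiB peak')

report('after model init')            # ~7% of the card in this case
# xb, yb = dls.one_batch()            # placeholder: grab one batch
# learn.model(xb).mean().backward()   # placeholder: one forward/backward pass
report('after one fwd/bwd pass')      # where the 87%+ shows up
```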

madhavajay (Contributor, Author)

Okay, I have changed the code to use timm.create_model with features_only=True.
So far it seems to be training a convnext_tiny:
[Screenshot: convnext_tiny training progress, 2022-08-08 11:36 am]
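For reference, features_only=True gives back a backbone that returns a pyramid of feature maps, which is the shape a U-Net decoder needs to hook into:

```python
import timm, torch

# With features_only=True the model returns one feature map per stage
# instead of a classification head's logits.
m = timm.create_model('convnext_tiny', pretrained=False, features_only=True)
x = torch.randn(1, 3, 224, 224)
for f in m(x):
    print(f.shape)
# torch.Size([1, 96, 56, 56])
# torch.Size([1, 192, 28, 28])
# torch.Size([1, 384, 14, 14])
# torch.Size([1, 768, 7, 7])
```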

@madhavajay madhavajay marked this pull request as ready for review August 8, 2022 22:45