
What should I do after train_ofa_net #57

Open
detectRecog opened this issue Jul 20, 2021 · 5 comments


detectRecog commented Jul 20, 2021

I ran train_ofa_net.py and there are three folders under 'exp/': 'kernel2kernel_depth', 'kernel_depth2kernel_depth_width', 'normal2kernel'. What should I do next? After training, each exp subfolder contains 'checkpoint', 'logs', 'net.config', 'net_info.txt' and 'run.config'. Does anybody know how I should deal with these?

I cannot find any relation between the training results under 'exp/' and 'eval_ofa_net.py'. Please help this poor kid. \doge


Bixiii commented Jul 22, 2021

As far as I can tell, the folders are the different stages of the progressive shrinking algorithm; kernel2kernel_depth is the training step that goes from elastic kernel to elastic kernel + elastic depth. In the checkpoint folder you can find the trained models; model_best.pth.tar should be the final model for that step. When you want to evaluate a model you trained yourself, you have to load it in the eval_ofa_net.py script.
For that you can just replace

ofa_network = ofa_net(args.net, pretrained=True)

with something that loads your own network. Maybe something like this would work:

# import paths as in the public once-for-all repo (may differ by version)
import torch
from ofa.imagenet_classification.elastic_nn.networks import OFAMobileNetV3

# the same supernet configuration that was used for training
ofa_network = OFAMobileNetV3(
    ks_list=[3, 5, 7],
    expand_ratio_list=[3, 4, 6],
    depth_list=[2, 3, 4],
)
# weights from the last progressive-shrinking stage
ckpt = 'exp/kernel_depth2kernel_depth_width/phase2/checkpoint/model_best.pth.tar'
init = torch.load(ckpt, map_location='cpu')['state_dict']
ofa_network.load_state_dict(init)
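
Once the weights are loaded, eval_ofa_net.py evaluates one specific sub-network at a time. Roughly (assuming the set_active_subnet / get_active_subnet API of the public once-for-all repo), that looks like:

# choose one sub-network configuration (the values must come from the
# ks_list / expand_ratio_list / depth_list used during training)
ofa_network.set_active_subnet(ks=7, e=6, d=4)

# extract it as a standalone PyTorch module and run the usual
# validation loop from eval_ofa_net.py on it
subnet = ofa_network.get_active_subnet(preserve_weight=True)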


detectRecog commented Jul 23, 2021


You're so kind. Thank you very much for your reply; I've been waiting every day for someone to save me. Does this mean I should train the different stages sequentially, resuming from the best checkpoint of the previous stage? Currently I train the stages in parallel, which is why I struggled to find the relation between the checkpoints of the different stages.

@Bixiii

@Jon-drugstore


Do you have any idea about the details of the latency predictor model, i.e. how to build that network? Thanks for your reply!


pyjhzwh commented Sep 7, 2021


In my understanding, once-for-all/ofa/nas/efficiency_predictor/latency_lookup_table.py describes how they estimate the latency. For ResNet50, they just count FLOPs as a stand-in for latency.
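
Just to illustrate the FLOPs-as-latency idea (a toy sketch, not the repo's actual lookup-table code), counting multiply-accumulates per conv layer looks roughly like this:

# toy MAC counter for a 2D convolution, used as a stand-in latency proxy
def conv2d_macs(in_ch, out_ch, kernel_size, out_h, out_w, groups=1):
    # each output element needs (in_ch / groups) * kernel_size^2 MACs
    return out_h * out_w * out_ch * (in_ch // groups) * kernel_size ** 2

# example: a 3x3 conv, 64 -> 128 channels, 56x56 output feature map
print(conv2d_macs(64, 128, 3, 56, 56) / 1e6, 'MMACs')  # ~231.2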


pyjhzwh commented Sep 7, 2021


I guess so, the stages should be run sequentially. From task 'kernel' to 'depth', the depth_list gets more choices, and from 'depth' to 'expand', the expand_ratio_list gets more choices. So I guess we should run task 'kernel' first, then 'depth', and finally 'expand'.
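
For reference, my understanding of the stage order and the checkpoint handoff, based only on the folder names above (a rough sketch; the phase subfolders and exact resume paths are assumptions):

# progressive shrinking stages, in the order they build on each other
STAGES = [
    ('normal2kernel', 'full network / pretrained teacher'),
    ('kernel2kernel_depth', 'exp/normal2kernel/checkpoint/model_best.pth.tar'),
    ('kernel_depth2kernel_depth_width',
     'exp/kernel2kernel_depth/phase2/checkpoint/model_best.pth.tar'),
]
for stage, resume_from in STAGES:
    print(stage, '<- resumes from:', resume_from)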
