
Replicate the pretrained model #8

Open
tzt101 opened this issue May 6, 2021 · 2 comments

@tzt101 commented May 6, 2021

Hi, great work!

I tried to train the model with the default settings on the ADE20K dataset, but found that its performance is lower than that of the provided pretrained model (FID/mIoU/acc: 29/37/80 vs. 27/45/82). Since the random seed is fixed, I'm not sure why the trained model's performance differs from the pretrained one. The following are the losses of my experiment and of the pretrained model:
[Screenshots: training loss curves of my experiment and of the pretrained model]
Any idea why?

@SushkoVadim (Contributor)

Hi,

First of all, a couple of questions:

  1. Did you change the code after downloading the repo, or do you use exactly the version we released?
  2. Do you use an exact copy of our pip/conda environment, with the package versions for which our code was tested?
  3. What batch size do you use, and on how many GPUs do you train?
  4. What was the command you used to launch this experiment? Could you please post your opt.txt for this experiment?

Regarding ideas as to "why":
In the losses you posted there is a flat region at the beginning (up to 30k iterations). We observed such behavior in some of our ablations, particularly the ones trained without the 3D noise. This happens due to the large Adam momentum we used (beta2=0.999). This value leads to better convergence in the end, but without some parameter tuning it may cause a slowdown at the beginning. For our final model on ADE20K, we tuned the parameters so that the optimizer setting does not cause this slowdown. Since the seed is fixed, this should not happen for a model with the default settings.

That being said, could you please verify that you did not change the way the 3D noise is injected into the model, or any of the hyperparameters?

@tzt101 (Author) commented May 7, 2021

Sure. The version of PyTorch I used is 1.6.0, not 1.0.0, and I train on 8 GPUs (but the batch size is still 32). I also changed the EMA operation to speed up training by avoiding the additional inference pass (I think this change should not affect the loss).
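For context, a parameter-space EMA update of this kind needs no extra forward pass; a generic PyTorch sketch is shown below. The variable names, the stand-in network, and the decay value are illustrative assumptions, not the actual OASIS implementation.

```python
import copy
import torch
import torch.nn as nn

# Stand-in generator; the real OASIS generator is of course different.
netG = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(), nn.Conv2d(8, 3, 3, padding=1))
netG_ema = copy.deepcopy(netG)  # exponential-moving-average copy used for evaluation

@torch.no_grad()
def update_ema(model, ema_model, decay=0.9999):
    # Blend the EMA parameters toward the current training parameters.
    for p_ema, p in zip(ema_model.parameters(), model.parameters()):
        p_ema.mul_(decay).add_(p, alpha=1.0 - decay)
    # Buffers (e.g. batch-norm running statistics) are usually copied directly.
    for b_ema, b in zip(ema_model.buffers(), model.buffers()):
        b_ema.copy_(b)

# Typically called once per training iteration, right after the generator's optimizer step:
update_ema(netG, netG_ema)
```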

So, it seems that the environment may cause this difference. I will try it again. Thank you!

This is the opt.txt of my experiment:
[Screenshot: opt.txt of the experiment]
