Should we prevent over-regularization? #5

Open
walkerning opened this issue May 18, 2019 · 1 comment

walkerning (Owner) commented May 18, 2019

In every one-shot parameter training step, only a subset of the parameters is active, especially when mepa_sample_size is small. By default we apply weight decay to all of the super net's parameters in every training step. Is this "over-regularization", or a desired behavior (which I will refer to as "auto-regularization")? When some parameters are not active in any of the sampled architectures, maybe they should not be regularized, at least at the very beginning of training: decaying them would leave the unsampled paths under-trained, while the architectures that are sampled more often get trained even better. Could this lead to insufficient exploration?

However, once the controller is reasonably well trained, a path being sampled less often means it simply does not work well in the architectures, and the reduced training and stronger regularization these paths receive then act as an "auto-regularization" of the super network. (But do we really need this auto-regularization in the super network, given that its only use is to be a performance indicator for its sub-networks?)
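To make the alternative concrete, here is a minimal sketch (assuming a PyTorch super net) of decaying only the parameters that were active in the sampled sub-network(s) of the current step, using "received a non-zero gradient" as a proxy for "active". The function name and arguments are hypothetical, not existing code in this repo:

```python
import torch


def decay_active_params(super_net: torch.nn.Module, weight_decay: float = 1e-4) -> None:
    """Hypothetical sketch: decay only parameters active in the sampled sub-network(s).

    Intended to be called after loss.backward() and before optimizer.step(),
    with the optimizer's own weight_decay set to 0. The decay coefficient is
    applied directly here (decoupled-style), without scaling by the learning rate.
    """
    with torch.no_grad():
        for param in super_net.parameters():
            # A parameter that was not used by any sampled architecture in this
            # step has no gradient (None or all zeros), so it is left un-decayed.
            if param.grad is not None and param.grad.abs().sum() > 0:
                param.mul_(1.0 - weight_decay)
```

With something like this, inactive paths would be neither trained nor decayed in a step; whether that actually improves exploration early in training is exactly the question above.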

@walkerning walkerning self-assigned this May 18, 2019
walkerning (Owner, Author) commented:

Not so important for now...

@walkerning walkerning added the hold Hold for now label May 24, 2019