Should we prevent over-regularization? #5

Open
walkerning opened this issue May 18, 2019 · 1 comment

walkerning (Owner) commented May 18, 2019

In every one-shot parameter training step, only a subset of the parameters is active, especially when mepa_sample_size is small. By default we apply weight decay to all of the super net's parameters in every training step. Is this "over-regularization", or a desired behavior (which I will refer to as "auto-regularization")? When some parameters are not active in any of the sampled architectures, maybe they should not be regularized, at least at the very beginning of training: decaying them would leave the unsampled paths under-trained, while the architectures that are sampled more often get trained even better. Could this lead to insufficient exploration?

However, once the controller is reasonably well trained, a path being sampled less often means it simply does not work well in the architectures, and the reduced training and stronger regularization these paths receive then act as an "auto-regularization" of the super network. (But do we really need this auto-regularization in the super network, given that its only use is to be a performance indicator for its sub-networks?)
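To make the alternative concrete, here is a minimal sketch (assuming a PyTorch super net) of decaying only the parameters that were active in the sampled sub-network(s) of the current step, using "received a non-zero gradient" as a proxy for "active". The function name and arguments are hypothetical, not existing code in this repo:

```python
import torch


def decay_active_params(super_net: torch.nn.Module, weight_decay: float = 1e-4) -> None:
    """Hypothetical sketch: decay only parameters active in the sampled sub-network(s).

    Intended to be called after loss.backward() and before optimizer.step(),
    with the optimizer's own weight_decay set to 0. The decay coefficient is
    applied directly here (decoupled-style), without scaling by the learning rate.
    """
    with torch.no_grad():
        for param in super_net.parameters():
            # A parameter that was not used by any sampled architecture in this
            # step has no gradient (None or all zeros), so it is left un-decayed.
            if param.grad is not None and param.grad.abs().sum() > 0:
                param.mul_(1.0 - weight_decay)
```

With something like this, inactive paths would be neither trained nor decayed in a step; whether that actually improves exploration early in training is exactly the question above.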

@walkerning walkerning self-assigned this May 18, 2019
walkerning (Owner, Author) commented:

Not so important for now...

@walkerning walkerning added the hold Hold for now label May 24, 2019