Issues about optimizing other parameters besides learning rate #7

Open
nicozorza opened this issue May 16, 2018 · 1 comment

@nicozorza

I have emailed Luca Franceschi about some issues with this library, and he asked me to share them here.
I've been working on an MLP and wanted to optimize the following hyperparameters, but ran into some problems:

  • Keep probability of a dropout layer: Luca explained to me that this is not possible, since it has some non-differentiable points.
  • Regularization beta: we are using tf.nn.l2_loss, but can't optimize the beta coefficient.
  • AdamOptimizer: when we tried to use far.AdamOptimizer() as the inner optimizer, the code crashed. Apparently some variables are undefined: _beta1_power and _beta2_power. I think this is a bug in the library.

So far we have only been able to optimize the learning rate. It would be great if there were a list of the things you can and can't do with this library.

Best regards,
Nicolás Zorzano.

@lucfra
Owner

lucfra commented May 17, 2018

Ciao Nicolas,

I've pushed an update that fixes the problem with AdamOptimizer. In newer versions of TensorFlow, the protected variables _beta1_power and _beta2_power changed names, hence the error!
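For reference, after the update something like the following should be possible (just a minimal sketch: the learning_rate keyword is my assumption, mirroring tf.train.AdamOptimizer, and treating the learning rate as the hyperparameter follows what you already do):

import far_ho as far

# learning rate as a hyperparameter, to be tuned in the outer problem
lr = far.get_hyperparameter('lr', 1e-3)
# Adam as the inner optimizer (this is the part that previously crashed)
inner_optimizer = far.AdamOptimizer(learning_rate=lr)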

For the regularization parameter, could you please be more specific about the problem? In any case, something like this:

import tensorflow as tf
import far_ho as far

w = ...  # your variable
rho = far.get_hyperparameter('rho', -3.)
l2_loss = tf.exp(rho) * tf.nn.l2_loss(w)

should allow you to optimize rho (the exp is there to ensure positive values of the regularization coefficient).
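For concreteness, here is a minimal sketch of how rho could enter the two objectives; the model, placeholder shapes, and names below are illustrative, not taken from your code. The penalized loss is the inner (training) objective, while the outer (validation) objective stays unpenalized:

import tensorflow as tf
import far_ho as far

x = tf.placeholder(tf.float32, [None, 10])   # illustrative input shape
y = tf.placeholder(tf.float32, [None, 1])
w = tf.get_variable('w', shape=[10, 1])

rho = far.get_hyperparameter('rho', -3.)
mse = tf.reduce_mean(tf.square(tf.matmul(x, w) - y))

inner_loss = mse + tf.exp(rho) * tf.nn.l2_loss(w)   # training objective, with penalty
outer_loss = mse                                    # validation objective, no penalty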

For dropout, it is not a totally trivial problem and may be a topic of research. Anyway (I did not mention it in the email), under some assumptions dropout can be approximately replaced by multiplicative Gaussian noise: see http://proceedings.mlr.press/v28/wang13a.pdf. This suggests treating the variance of the noise as a hyperparameter, which could be optimized by gradient descent with this package.
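As a rough illustration of that idea (only a sketch under the assumptions of the paper above; the names and the log-variance parametrization are mine, not part of the library):

import tensorflow as tf
import far_ho as far

h = tf.placeholder(tf.float32, [None, 128])  # stand-in for hidden-layer activations

# log-variance of the multiplicative noise, treated as a hyperparameter
log_sigma2 = far.get_hyperparameter('log_sigma2', -2.)
sigma = tf.sqrt(tf.exp(log_sigma2))

# mean-1 multiplicative Gaussian noise, replacing dropout at training time
noise = 1. + sigma * tf.random_normal(tf.shape(h))
h_noisy = h * noise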

As soon as I have time, I will add an IPython notebook with a list of things that you can and cannot do, as you suggest!

Cheers,

Luca
