Implementing Leaky Relu, parametric and other forms of Relu #322

Open
naveenjafer opened this issue Mar 10, 2020 · 4 comments
Labels: enhancement (Feature requests and improvements), feat / layers (Weights layers, transforms, combinators, wrappers)

Comments

@naveenjafer
Contributor

I am working on an implementation of LeakyRelu and would like some input on how to approach it. There are two options:

  1. A separate layer (LeakyRelu, ParamRelu, etc.) for each of the Relu variations.
  2. A single Relu layer that takes optional parameters and implements them. (This would greatly reduce duplicated code, but it also reduces the visibility of the variants to end users unless they spend some time in the documentation.)

Keras and PyTorch seem to have separate layers for each of the Relu variations, but I am more inclined towards a single Relu with the right parameters. What would you guys suggest?
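
For concreteness, here is what the variations look like as plain functions. This is just an illustration in numpy (the function names are mine, not proposed layer names), not how a Thinc layer would actually be written:

```python
import numpy as np

def relu(X):
    # Standard ReLU: zero out negative inputs.
    return np.maximum(0.0, X)

def leaky_relu(X, leak=0.01):
    # Leaky ReLU: a small fixed slope `leak` for negative inputs.
    return np.where(X > 0, X, leak * X)

def parametric_relu(X, alpha):
    # Parametric ReLU (PReLU): the negative slope `alpha` is a learned
    # parameter rather than a fixed constant (just passed in here).
    return np.where(X > 0, X, alpha * X)
```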

@svlandeg added the enhancement and feat / layers labels on Mar 10, 2020
@honnibal
Member

Thanks for the question, I think it's definitely something to think about.

Currently in Thinc we use a single layer definition for both the weights and the activation. This lets us set smarter initialization defaults, because the choice of activation typically affects the best initialization strategy. It does make it awkward to keep accumulating activation variants, though.

  • How many variants would we want?
  • Are they all important to have, or are some strictly inferior?
  • Do people mostly think of them as the same activation (relu), or do they think of them as different things?

Another awkward problem with putting it all in the Relu layer is defaults. Presumably if people do use LeakyRelu they mostly use the same leak parameter, right? We can't have a helpful default for that if we instead default that parameter to 0. And I don't want to have both a flag and a separate parameter for the leak.
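
To make the initialization point concrete, here is a rough sketch of how the activation choice changes the usual weight initialization. This is plain numpy for illustration, not Thinc's actual initializer code:

```python
import numpy as np

def init_weights(fan_in, fan_out, activation="relu", rng=np.random):
    # He initialization (variance 2 / fan_in) is the usual choice for
    # ReLU-like activations; Glorot/Xavier (variance 2 / (fan_in + fan_out))
    # is the usual choice for tanh- or sigmoid-like activations.
    if activation in ("relu", "leaky_relu"):
        scale = np.sqrt(2.0 / fan_in)
    else:
        scale = np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(0.0, scale, size=(fan_out, fan_in))
```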

@naveenjafer
Contributor Author

naveenjafer commented Mar 11, 2020

@honnibal The regular Relu is just a special case of the Leaky Relu where the alpha parameter is 0. So for now I have kept the default at 0. When users do need a leaky Relu, they write:
Relu(alphaLeaky=0.1)

But again, this might bloat the layer or cause conflicts if someone later implements the other Relu variants, or whatever future variations come along: https://keras.io/layers/advanced-activations/

Yeah, keeping both the flag and the parameter is a terrible idea. In some cases we can do away with an explicit flag and infer the behaviour from the parameters, but again, things might conflict in the future.
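
For illustration, here is a minimal sketch of the combined forward/backward pass under this proposal. It is plain numpy; the alpha_leaky name just mirrors the alphaLeaky parameter above and is not Thinc's actual API:

```python
import numpy as np

def relu_forward(X, alpha_leaky=0.0):
    # With the default alpha_leaky=0 this is exactly the standard ReLU;
    # any nonzero value makes it a leaky ReLU.
    Y = np.where(X > 0, X, alpha_leaky * X)

    def backprop(dY):
        # Gradient is 1 where the input was positive, alpha_leaky elsewhere.
        return dY * np.where(X > 0, 1.0, alpha_leaky)

    return Y, backprop
```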

@naveenjafer
Contributor Author

Hi @honnibal, any update on this? I would love to complete this with all the extra time the lockdowns are giving us.

@kadarakos
Contributor

Hey @naveenjafer,

We have not implemented parametric ReLU functions, but we have added a number of activations since then (see the usage sketch after the list):

  1. Swish
  2. Gelu
  3. Dish (our custom, more efficient Swish-like activation that uses sqrt instead of exp)
  4. HardSwish
  5. HardSwishMobilenet
  6. HardSigmoid
  7. HardTanh
