Implementing Leaky Relu, parametric and other forms of Relu #322

Open
naveenjafer opened this issue Mar 10, 2020 · 4 comments
Labels: enhancement (Feature requests and improvements), feat / layers (Weights layers, transforms, combinators, wrappers)

Comments

@naveenjafer
Contributor

I am working on an implementation of LeakyRelu and would like some input on how to approach it. There are two options:

  1. A separate layer (LeakyRelu, ParamRelu, etc.) for each of the Relu variations.
  2. A single Relu layer that takes optional parameters and implements them. (This would greatly reduce duplicated code, but it also reduces the visibility of the variants to end users unless they spend some time in the documentation.)

Keras and PyTorch seem to have separate layers for each of the Relu variations, but I am more inclined towards a single Relu with the right parameters. What would you guys suggest?
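
For concreteness, here is what the variations look like as plain functions. This is just an illustration in numpy (the function names are mine, not proposed layer names), not how a Thinc layer would actually be written:

```python
import numpy as np

def relu(X):
    # Standard ReLU: zero out negative inputs.
    return np.maximum(0.0, X)

def leaky_relu(X, leak=0.01):
    # Leaky ReLU: a small fixed slope `leak` for negative inputs.
    return np.where(X > 0, X, leak * X)

def parametric_relu(X, alpha):
    # Parametric ReLU (PReLU): the negative slope `alpha` is a learned
    # parameter rather than a fixed constant (just passed in here).
    return np.where(X > 0, X, alpha * X)
```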

@svlandeg added the enhancement and feat / layers labels on Mar 10, 2020
@honnibal
Member

Thanks for the question, I think it's definitely something to think about.

Currently in Thinc we use a single layer definition for both the weights and the activation. This lets us set smarter initialization defaults, because the choice of activation typically affects the best initialization strategy. It does make it awkward to keep accumulating activation variants, though.

  • How many variants would we want?
  • Are they all important to have, or are some strictly inferior?
  • Do people mostly think of them as the same activation (relu), or do they think of them as different things?

Another awkward problem with putting it all in the Relu layer is defaults. Presumably if people do use LeakyRelu they mostly use the same leak parameter, right? We can't have a helpful default for that if we instead default that parameter to 0. And I don't want to have both a flag and a separate parameter for the leak.
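
To make the initialization point concrete, here is a rough sketch of how the activation choice changes the usual weight initialization. This is plain numpy for illustration, not Thinc's actual initializer code:

```python
import numpy as np

def init_weights(fan_in, fan_out, activation="relu", rng=np.random):
    # He initialization (variance 2 / fan_in) is the usual choice for
    # ReLU-like activations; Glorot/Xavier (variance 2 / (fan_in + fan_out))
    # is the usual choice for tanh- or sigmoid-like activations.
    if activation in ("relu", "leaky_relu"):
        scale = np.sqrt(2.0 / fan_in)
    else:
        scale = np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(0.0, scale, size=(fan_out, fan_in))
```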

@naveenjafer
Contributor Author

naveenjafer commented Mar 11, 2020

@honnibal The regular Relu is just a special case of the Leaky Relu where the alpha parameter is 0. So for now I have kept the default at 0. When users do need a leaky Relu, they write:
Relu(alphaLeaky=0.1)

But again, this might bloat the layer or cause conflicts if someone later implements the other Relu variants, or whatever future variations come along: https://keras.io/layers/advanced-activations/

Yeah, keeping both the flag and the parameter is a terrible idea. In some cases we can do away with an explicit flag and infer the behaviour from the parameters, but again, things might conflict in the future.
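
For illustration, here is a minimal sketch of the combined forward/backward pass under this proposal. It is plain numpy; the alpha_leaky name just mirrors the alphaLeaky parameter above and is not Thinc's actual API:

```python
import numpy as np

def relu_forward(X, alpha_leaky=0.0):
    # With the default alpha_leaky=0 this is exactly the standard ReLU;
    # any nonzero value makes it a leaky ReLU.
    Y = np.where(X > 0, X, alpha_leaky * X)

    def backprop(dY):
        # Gradient is 1 where the input was positive, alpha_leaky elsewhere.
        return dY * np.where(X > 0, 1.0, alpha_leaky)

    return Y, backprop
```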

@naveenjafer
Contributor Author

Hi @honnibal, any update on this? I would love to complete this with all the extra time the lockdowns are giving us.

@kadarakos
Contributor

Hey @naveenjafer,

We have not implemented parametric ReLU functions, but we have added a number of activations since then (see the usage sketch after the list):

  1. Swish
  2. Gelu
  3. Dish (our custom, more efficient Swish-like activation that uses sqrt instead of exp)
  4. HardSwish
  5. HardSwishMobilenet
  6. HardSigmoid
  7. HardTanh
