Test and fix Backward/Deriv #3478
Conversation
This issue has been automatically marked as stale because it has not had any recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions! 👍
mlpack-bot closed this one for inactivity, but if this does eventually get finished (be it in this PR or another one) I think it would be a valuable addition.
Sorry, I have been pretty busy the past month. Anyway, I have time now to complete this. Here are my initial findings. (See mlpack/src/mlpack/methods/ann/activation_functions/gelu_function.hpp, lines 54 to 55, at 97fa8af.)
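Roughly, the problem looks like the sketch below (paraphrased, not the exact mlpack source): both `Fn` and its derivative are naturally written in terms of the input `x`, but the convention described in this issue hands `Deriv` the *output* `y = Fn(x)`, and GELU is not invertible, so `x` cannot be recovered from `y`.

```cpp
#include <cmath>

// Paraphrased sketch of the tanh approximation of GELU:
// y = 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
double GeluFn(const double x)
{
  return 0.5 * x * (1.0 + std::tanh(std::sqrt(2.0 / M_PI) *
      (x + 0.044715 * std::pow(x, 3))));
}

// This derivative is only correct if its argument is the input x; it gives
// the wrong gradient when the caller passes the output y = GeluFn(x), which
// is what the Backward/Deriv convention requires.
double GeluDeriv(const double x)
{
  const double inner = std::sqrt(2.0 / M_PI) * (x + 0.044715 * std::pow(x, 3));
  const double sech2 = 1.0 / std::pow(std::cosh(inner), 2);
  return 0.5 * (1.0 + std::tanh(inner)) +
      0.5 * x * sech2 * std::sqrt(2.0 / M_PI) * (1.0 + 3.0 * 0.044715 * x * x);
}
```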
I think the best way to move forward with fixing the derivative of these functions would be to change BaseLayer so that the derivative can be computed from the layer's input rather than from its output.
To me this solution feels like going against the flow, but it results in a more coherent interface in terms of how the activation functions are implemented. There is another solution, which would involve not changing BaseLayer but implementing another class like BaseLayer that calculates the derivative in the forward pass and stores it in the class. The functions which are not invertible could then be implemented using this class instead of BaseLayer. This would result in activation layers being implemented using two types of layers: (1) layers whose derivative is calculated using the output, and (2) layers whose derivative is calculated using the input. A rough sketch of the second option follows.
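This is a minimal sketch of that alternative, assuming hypothetical names and simplified `Forward`/`Backward` signatures (not mlpack's actual API):

```cpp
#include <armadillo>

// Hypothetical BaseLayer-like class that computes the derivative from the
// *input* during Forward() and caches it, so that Backward() never needs to
// reconstruct the input from the output.
template<typename ActivationFunction>
class CachedDerivLayer
{
 public:
  void Forward(const arma::mat& input, arma::mat& output)
  {
    ActivationFunction::Fn(input, output);
    // The input is available here, so compute the element-wise derivative
    // now and keep it for the backward pass.
    ActivationFunction::Deriv(input, derivative);
  }

  void Backward(const arma::mat& /* output */,
                const arma::mat& gy,
                arma::mat& g)
  {
    // Element-wise product of the upstream gradient with the cached
    // derivative.
    g = gy % derivative;
  }

 private:
  arma::mat derivative; // Cached during the forward pass.
};
```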
Regardless of whichever solution is chosen, there will be a clear difference in how the activation layers are implemented. The solutions are non-trivial (another one of the reasons this PR got pushed for so long), so I thought I should get your opinion on this. What do you think, @rcurtin?
@mrdaybird I'm sorry this sat for so long; things have been busy for me over the summer and fall. This also came up in #3551, so I want to say thank you so much for the clear explanation here, and for the effort of digging into each layer; it helped me get a complete handle on the issue quickly. I'm not sure yet what the best solution is, but if you're still interested and have time, let's discuss in #3551 and figure out how we want to solve it. 👍
Background
The Backward/Deriv methods have several times been misused or incorrectly implemented (see #3471, #3469). So, to make it clear:
The `Backward` method of a layer (except loss functions) expects the first argument to be the output of the `Forward` method. The same is true for activation functions, in which the `Deriv` method expects the first argument to be the output of `Fn`.
To take an example of a correctly implemented function, let's look at `LogisticFunction` (logistic_function.hpp). Let's first look at the `Fn` method.
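Simplified, it computes the standard logistic (mlpack's real implementation also guards against overflow for extreme inputs):

```cpp
#include <cmath>

// Simplified sketch of LogisticFunction::Fn.
double Fn(const double x)
{
  return 1.0 / (1.0 + std::exp(-x));
}
```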
Now look at the `Deriv` method.
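Simplified, it takes the *output* `y` as its argument:

```cpp
// Simplified sketch of LogisticFunction::Deriv; the argument is the output
// y = Fn(x), not the input x.
double Deriv(const double y)
{
  return y * (1.0 - y);
}
```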
Observe that the derivative in the `Deriv` method is written in terms of the output, not in terms of the input. To put it mathematically:
If $y = f(x)$ is the given function, then the derivative should be a function of $y$, i.e.

$$\frac{dy}{dx} = g(y)$$

In the example, our function $y = f(x)$ is defined as follows:

$$f(x) = \frac{1}{1 + e^{-x}}$$

Now, if we simply differentiate it w.r.t. $x$,

$$f'(x) = \frac{e^{-x}}{(1 + e^{-x})^2}$$

If we rearrange the terms, we can write the derivative as

$$f'(x) = f(x)(1 - f(x))$$

and, substituting $f(x)$ with $y$,

$$\frac{dy}{dx} = y(1 - y)$$

which is our required form of the derivative.
Add tests and fix issues
Since JacobianTest is fixed (#3471), all layers need to be tested against JacobianTest (and/or its siblings). After adding JacobianTest for the activation functions, many of the tests failed. Other layers may potentially be buggy as well.
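For context, the idea behind JacobianTest is to compare the Jacobian implied by a layer's `Backward` method against a numerical one. A paraphrased sketch (not mlpack's actual test code, and assuming the simplified `Forward(input, output)` / `Backward(output, gy, g)` signatures described above):

```cpp
#include <armadillo>

// Compare a numerical (central-difference) Jacobian against the analytic
// Jacobian obtained by backpropagating unit vectors through Backward().
// Returns the largest absolute discrepancy.
template<typename LayerType>
double JacobianError(LayerType& layer, arma::mat& input, const double eps = 1e-6)
{
  arma::mat output;
  layer.Forward(input, output);

  // Numerical Jacobian: perturb each input element and record the change in
  // every output element.
  arma::mat jacobianNum(input.n_elem, output.n_elem);
  for (size_t i = 0; i < input.n_elem; ++i)
  {
    arma::mat outA, outB;
    input(i) += eps;      layer.Forward(input, outA);
    input(i) -= 2 * eps;  layer.Forward(input, outB);
    input(i) += eps;      // Restore the original value.
    jacobianNum.row(i) = arma::vectorise((outA - outB) / (2 * eps)).t();
  }

  // Analytic Jacobian: backpropagate each unit vector through Backward().
  layer.Forward(input, output);
  arma::mat jacobianAna(input.n_elem, output.n_elem);
  for (size_t j = 0; j < output.n_elem; ++j)
  {
    arma::mat gy(arma::size(output), arma::fill::zeros), g;
    gy(j) = 1.0;
    layer.Backward(output, gy, g);
    jacobianAna.col(j) = arma::vectorise(g);
  }

  return arma::abs(jacobianNum - jacobianAna).max();
}
```

A buggy `Deriv` like the GELU one sketched above makes this discrepancy large, because the analytic derivative gets evaluated at the wrong point.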
The aim of this PR is to add JacobianTest coverage for the layers/activation functions and to fix the Backward/Deriv methods that fail it.
The following layers/functions fail the JacobianTest and need their Backward/Deriv methods fixed:
(The list is WIP and may be updated)
- Softplus function
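As an illustration of the kind of fix needed here (my own working, not taken from the PR): softplus is invertible, so its derivative can be rewritten in terms of the output,

$$y = \ln(1 + e^{x}) \implies \frac{dy}{dx} = \frac{e^{x}}{1 + e^{x}} = 1 - \frac{1}{1 + e^{x}} = 1 - e^{-y},$$

so a correct `Deriv` can return $1 - e^{-y}$ when it is handed the output $y$.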