
[General Question on NN] Problem with Loss function and its influence on backpropagation #86

Open
Memnarch opened this issue Mar 9, 2022 · 4 comments

@Memnarch

Memnarch commented Mar 9, 2022

Hi,
This ticket isn't about your project directly. However, I started looking into writing a (C)NN and got a bit stuck on something.
I started on this topic with this article collection (for context): https://victorzhou.com/series/neural-networks-from-scratch/

For backpropagation, one needs to calculate the initial gradient that is fed into the output node and then proceed from there back to the first layer. If I understood this correctly, that initial gradient is calculated by taking the output of a feedforward pass and running it through the derivative of the loss function.

As I have a bit of a hard time deriving the loss functions myself, like binary cross-entropy (I am really rusty on this topic), I thought I should look at how other frameworks solve this. And right now I am confused: frameworks like Keras allow you to write custom loss functions, but have no interface for supplying the derivative?
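
For reference, a minimal Free Pascal sketch (program name and values are made up, not taken from any framework) of binary cross-entropy and its derivative for a single sigmoid output; the last printed value is the initial gradient that would be fed back into the output node:

program BCEGradientSketch;
{$mode objfpc}

// Binary cross-entropy for one sample: L = -(y*ln(p) + (1-y)*ln(1-p))
function BCELoss(y, p: Double): Double;
begin
  Result := -(y * Ln(p) + (1 - y) * Ln(1 - p));
end;

// Its derivative with respect to the prediction p: dL/dp = (p - y) / (p*(1-p))
function BCEDerivative(y, p: Double): Double;
begin
  Result := (p - y) / (p * (1 - p));
end;

var
  y, p: Double;
begin
  y := 1.0;  // desired output
  p := 0.8;  // network output (after a sigmoid)
  WriteLn('Loss:  ', BCELoss(y, p):0:4);
  WriteLn('dL/dp: ', BCEDerivative(y, p):0:4);
  // When p comes from a sigmoid, the chain rule collapses dL/dz to (p - y),
  // which is the value fed back into the output node as the initial gradient.
  WriteLn('dL/dz: ', (p - y):0:4);
end.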

In your repository, I found the LossFN property on your network, but again, no sign of the derivative.
So either this means I got some fundamentals wrong, or I am blind.

I hope you have some time for this. Sorry in advance for opening a ticket here and adding to the noise.

@joaopauloschuler
Owner

@Memnarch, your questions are very good. In Keras, this is how the gradient is calculated (you won't find layer-specific derivative calculations as in CAI and convnetjs):
https://www.tensorflow.org/guide/autodiff

In CAI, you'll find explicit code for derivatives. I coded CAI with one screen on Wikipedia (https://en.wikipedia.org/wiki/Backpropagation) and another on Lazarus.

In convnetjs, you'll find the softmax forward and backward code at:
https://github.com/karpathy/convnetjs/blob/master/src/convnet_layers_loss.js

I haven't checked for a while, but the implementation in CAI should be equivalent to the convnetjs one:

procedure TNNetLayer.ComputeOutputErrorWith(pOutput: TNNetVolume);
  {$IFDEF CheckRange}var MaxError: TNeuralFloat; {$ENDIF}
begin
  if
    (pOutput.Size = FOutput.Size) and
    (pOutput.Size = FOutputError.Size) then
  begin
    // Output error (delta) = current output - expected output.
    FOutputError.CopyNoChecks(FOutput);
    FOutputError.Sub(pOutput);

    {$IFDEF CheckRange}
    // Optional check (compiled only when CheckRange is defined): divide the
    // error by the previous layer's maximum error when it exceeds 1.
    MaxError := FPrevLayer.OutputError.GetMax();
    if MaxError > 1 then
    begin
      FOutputError.Divi(MaxError);
    end;
    {$ENDIF}
  end else
  begin
    FErrorProc
    (
      'ComputeOutputErrorWith should have same sizes.' +
      ' Neurons:' + IntToStr(FNeurons.Count) +
      ' Output:' + IntToStr(FOutput.Size) +
      ' Expected output:' + IntToStr(pOutput.Size) +
      ' Error:' + IntToStr(FOutputError.Size) +
      ' Error times Deriv:' + IntToStr(FOutputErrorDeriv.Size)
    );
  end;
end;

Sometimes, in the literature, the error is called DELTA. In CAI, when you find a variable named OutputErrorDeriv, it means the error multiplied by the derivative of the output. In other APIs, you may find that the variable used for the ERROR starts with the character "D", standing for "delta".

In short, for calculating the SOFTMAX error, it's just the difference between the current output and the desired output. It's easy to spend hours of meditation on the fact that what drives the learning is simply an error.
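
To make that concrete, here is a tiny Free Pascal sketch (made-up values, not CAI code) of the softmax plus cross-entropy case, mirroring the CopyNoChecks/Sub pair above:

program SoftmaxErrorSketch;
{$mode objfpc}

var
  Predicted:   array[0..2] of Double = (0.7, 0.2, 0.1); // softmax output
  Expected:    array[0..2] of Double = (1.0, 0.0, 0.0); // one-hot target
  OutputError: array[0..2] of Double;
  i: Integer;
begin
  // With softmax + cross-entropy, the delta pushed into the output layer
  // reduces to (current output - expected output).
  for i := 0 to 2 do
  begin
    OutputError[i] := Predicted[i] - Expected[i];
    WriteLn('Class ', i, ' error: ', OutputError[i]:0:2);
  end;
end.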

I hope it helps and may the source be with you.

@joaopauloschuler joaopauloschuler added the question Further information is requested label Mar 9, 2022
@joaopauloschuler joaopauloschuler self-assigned this Mar 9, 2022
@Memnarch
Author

Wow, thanks for the response. I'll read through the docs once I've got some sleep (1:40 am 😅).
But I just had to get my first try at convolution with backpropagation working (and pull some feature maps out of it).
I'll come back if I have more questions, if that's OK? (Btw, do you know of any communities for writing these, so I don't have to flood your inbox? ^^")

[attached screenshot: feature maps and classification output]

@joaopauloschuler
Owner

joaopauloschuler commented Mar 10, 2022

I can see that you classified a cat with 97% probability. Please feel free to ask. If it's within the scope of the CAI API or neural networks in general, I can try to reply.

@Memnarch
Author

I read the doc regarding the gradient tape. Really interesting, but I'd say not in scope for implementing in Delphi, sadly (for now).
So I'll keep the manual approach, as it's simpler for me (and it makes me refresh some old math. I read some more about deriving a function and it's clicking again).

One topic I read about the other day was the different methods of optimizing weights. SGD is what I do right now. Adam, however, seems to be a popular one. Looking into it, it is said to keep track of past gradients to influence future updates. Something that did not seem to be explained explicitly was the scope of storing this state. My current guess is that within a dense layer it is per neuron, and within a convolutional layer it is per kernel (not per filter). Do you have any clue if I am right?
And does Adam need to get the bias as one element of the list of weights in one go? (A neuron having 3 weights and a bias would give it w1, w2, w3 and b1 in one list, and gradients gw1, gw2, gw3 and gb1 as the second list.)

While weights and biases have different names, it seems that when it comes to value changes, they run through the pipeline at the same time. Which means Adam would take all 4 into its calculation for later use.
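
For reference, a minimal Free Pascal sketch of a single textbook Adam update (all names, constants and values below are made up, not CAI code): Adam keeps one pair of moment estimates (M, V) per trainable parameter, and the bias is simply the fourth entry in the same parameter list, updated exactly like the weights.

program AdamStepSketch;
{$mode objfpc}
uses
  Math;

const
  LearningRate = 0.001;
  Beta1   = 0.9;
  Beta2   = 0.999;
  Epsilon = 1e-8;
  ParamCount = 4; // w1, w2, w3 and the bias as the fourth entry

var
  Params, Grads, M, V: array[0..ParamCount - 1] of Double;
  MHat, VHat: Double;
  i, Step: Integer;
begin
  // Made-up weights and gradients for one neuron: w1..w3 plus the bias.
  Params[0] := 0.5;  Params[1] := -0.3;  Params[2] := 0.8;  Params[3] := 0.1;
  Grads[0]  := 0.02; Grads[1]  := -0.01; Grads[2]  := 0.03; Grads[3]  := 0.005;
  for i := 0 to ParamCount - 1 do
  begin
    M[i] := 0; // first moment: running mean of past gradients
    V[i] := 0; // second moment: running mean of past squared gradients
  end;

  Step := 1; // time step t, normally incremented once per update
  for i := 0 to ParamCount - 1 do
  begin
    M[i] := Beta1 * M[i] + (1 - Beta1) * Grads[i];
    V[i] := Beta2 * V[i] + (1 - Beta2) * Sqr(Grads[i]);
    MHat := M[i] / (1 - Power(Beta1, Step)); // bias-corrected moments
    VHat := V[i] / (1 - Power(Beta2, Step));
    Params[i] := Params[i] - LearningRate * MHat / (Sqrt(VHat) + Epsilon);
    WriteLn('Param ', i, ': ', Params[i]:0:6);
  end;
end.

In the standard formulation, the optimizer does not distinguish weights from biases; it only sees a flat list of parameters and their gradients.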
