
Implement e-prop #226

Open
Jegp opened this issue Jul 20, 2021 · 9 comments
Labels
codejam Issues to be addressed during Code Jam · enhancement New feature or request

Comments

@Jegp
Member

Jegp commented Jul 20, 2021

Implement e-prop as in https://arxiv.org/pdf/1901.09049.pdf, see also https://github.com/IGITUGraz/eligibility_propagation for an implementation

@Jegp Jegp added enhancement New feature or request and codejam Issues to be addressed during Code Jam and removed hackathon labels Jul 20, 2021
@Huizerd
Contributor

Huizerd commented Aug 8, 2021

I think this can be implemented rather easily by adding a detach item to e.g. LIFParameters, where detach=False would mean BPTT, and detach=True would mean e-prop or local learning. As demonstrated here, inside the LIF function we would then have a detached variant of the previous spikes that is then used for the reset (and for the recurrent connections if there are any). It would mean an if statement in each neuron function though; possibly this could be moved to the SNNCell etc. parents.

Let me know what you think, I can then work on a draft.
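A minimal sketch of the idea, assuming hypothetical names (`LIFParams`, `lif_step`) rather than Norse's actual API: with `detach=True`, the previous spikes are cut out of the autograd graph before they enter the reset term, so gradients no longer flow across timesteps through the spike train (e-prop / local learning); with `detach=False`, the full BPTT graph is kept. The surrogate gradient for the threshold is omitted here for brevity.

```python
from typing import NamedTuple

import torch


class LIFParams(NamedTuple):
    tau_mem_inv: float = 100.0
    v_th: float = 1.0
    v_reset: float = 0.0
    detach: bool = False  # True -> e-prop-style local learning


def lif_step(x, v, z_prev, p: LIFParams, dt: float = 1e-3):
    # Optionally block gradients through the previous spikes.
    z_for_reset = z_prev.detach() if p.detach else z_prev
    # Membrane update; the reset is driven by the (possibly detached) spikes.
    v = v + dt * p.tau_mem_inv * (x - v) - (p.v_th - p.v_reset) * z_for_reset
    # Hard Heaviside threshold (surrogate gradient omitted in this sketch).
    z = (v > p.v_th).float()
    return z, v


x = torch.ones(4, requires_grad=True)
z, v = lif_step(x, torch.zeros(4), torch.zeros(4), LIFParams(detach=True))
```

Gradients still reach the input (and hence any preceding layer) through the membrane update; only the path through `z_prev` is cut.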

@skiwan

skiwan commented Aug 17, 2021

I think the general idea could work.
I'm not sure if we would need the detach for anything but the 'if' statement,

but with the current Norse architecture we could probably follow the SuperSpike approach:

https://github.com/norse/norse/blob/master/norse/torch/functional/threshold.py

adding e-prop as a new method for the threshold function.

Then, in the forward we just save the input tensor for the backward pass,
and in the backward of the e-prop method we could compute the pseudo-derivative and the eligibility vector, and then combine the vector with the learning signal, which can be passed into the backward function.

Does that sound feasible to you, @Huizerd?

With that approach we would also avoid having to add something to all SNNCell types, and instead let the autograd function take care of it.
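A minimal sketch of the autograd-function mechanics described above, assuming a SuperSpike-shaped pseudo-derivative; the name `SurrogateThreshold` and the steepness constant are illustrative, and the eligibility-vector/learning-signal combination is not implemented here, only the forward/backward split:

```python
import torch


class SurrogateThreshold(torch.autograd.Function):
    @staticmethod
    def forward(ctx, v_minus_th, alpha):
        # Save the input tensor for the backward pass.
        ctx.save_for_backward(v_minus_th)
        ctx.alpha = alpha
        # Hard Heaviside spike in the forward pass.
        return (v_minus_th > 0).float()

    @staticmethod
    def backward(ctx, grad_output):
        (v_minus_th,) = ctx.saved_tensors
        # SuperSpike-style pseudo-derivative: 1 / (1 + alpha * |v|)^2.
        surrogate = 1.0 / (1.0 + ctx.alpha * v_minus_th.abs()) ** 2
        # No gradient for the alpha argument.
        return grad_output * surrogate, None


v = torch.tensor([-0.5, 0.0, 0.5], requires_grad=True)
z = SurrogateThreshold.apply(v, 100.0)
```

All gradient handling lives in `backward`, which is where the e-prop computations would then be added.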

@Huizerd
Contributor

Huizerd commented Aug 17, 2021

@skiwan I don't follow you, could you elaborate/give an example? :) How could you specify when to block/not block gradients with the same call to the threshold?

@skiwan

skiwan commented Aug 17, 2021

I'm also quite new to this (e-prop as well as how PyTorch and autograd work), so I might not be 100% correct in how I am thinking,

but right now, for example with the LIF layer, within the LIF parameters we could say
LIFParameter(..., method='eprop')

Then, similar to the superspike method https://github.com/norse/norse/blob/master/norse/torch/functional/superspike.py

we can do all the calculations needed for e-prop there. Does that make more sense? Or could you explain in a bit more detail how you would approach this from your side?

@Jegp
Member Author

Jegp commented Aug 20, 2021

If we're keen on stopping gradients, I agree this could/should be handled on a module level. And we're actually right now forcing the voltage tensor to retain the gradient graph on a module level. What I'm not 100% clear on is how it composes with other layers. If you detach the gradient, won't it kill any contribution going forward? Meaning, you won't be able to update, say, a Linear layer that precedes a spiking layer.

If that's the case, could it be solved by storing the previous tensor, cloning it in an E-Prop module, and then reusing the previous tensor in a future backwards step, like so? Is that close to what you meant @Huizerd?

             Linear                     LIF                   Output
Forward:        x ---------------------> y -----------------> z
                      |
                      \--------> a = x.detach()

                           /-- e = lif_grad_output   <------- z.backward(loss)
                           |
                     a.backward(e)
Backward:     <--------/

@Huizerd
Contributor

Huizerd commented Aug 20, 2021

e-prop only blocks gradients between timesteps (for the reset for instance), not between layers. Think of it as a 1-step truncated BPTT. So @Jegp I think this shouldn't be a problem for other layers in the network.

@Jegp I was always wondering about that state.v.requires_grad = True actually. Could you explain why it's there?

@skiwan The SuperSpike in Norse is just the shape of their surrogate gradient, not the actual learning method they implement in that paper. You're right that the learning methods of SuperSpike and e-prop are similar. I didn't look into their implementation, though; I only know that e-prop can be done by blocking gradients in certain places.

@Jegp
Member Author

Jegp commented Aug 21, 2021

I see, that makes a lot of sense, actually. And if it's only related to the previous spikes, wouldn't it be possible to even create a wrapper module that "just" detaches the output spikes and caches them in the state?

The line exists purely for convenience. The state tensors are, theoretically, leaf tensors in that they are not strictly associated with the module and their gradients won't accumulate. This is basically one way of forcing Torch to include the state tensors in the gradient computations (https://pytorch.org/docs/stable/notes/autograd.html). I'm not ecstatic about the solution, so I'm curious to know whether there are better ways of achieving the same.

@Huizerd
Contributor

Huizerd commented Aug 24, 2021

> I see, that makes a lot of sense, actually. And if it's only related to the previous spikes, wouldn't it be possible to even create a wrapper module that "just" detaches the output spikes and caches them in the state?

I guess there could be an extended state that also includes detached spikes, because the problem is that you need both attached and detached spikes (following the original e-prop implementation). Not sure if this would be nice, and you would still need some check in the neuron function on which spikes to use where I think. Another solution could be to have an entirely different function for e-prop, say lif_step_eprop, but that would mean adding such a function for each neuron type, which seems a bit ugly.
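A minimal sketch of what such an extended state could look like; the names (`EPropState`, `update_state`) are illustrative, not an actual Norse API. The state carries both the attached spikes and a detached copy, so the reset (and recurrent input) can use the detached copy while the rest of the graph keeps the attached one:

```python
from typing import NamedTuple

import torch


class EPropState(NamedTuple):
    v: torch.Tensor           # membrane voltage (on the graph)
    z: torch.Tensor           # previous spikes, attached
    z_detached: torch.Tensor  # previous spikes, cut from the graph


def update_state(v_new: torch.Tensor, z_new: torch.Tensor) -> EPropState:
    # Store both variants so the neuron function can pick per use-site.
    return EPropState(v=v_new, z=z_new, z_detached=z_new.detach())


z_new = torch.zeros(3, requires_grad=True)
s = update_state(torch.zeros(3), z_new)
```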

> The line exists purely for convenience. The state tensors are, theoretically, leaf tensors in that they are not strictly associated with the module and their gradients won't accumulate. This is basically one way of forcing Torch to include the state tensors in the gradient computations (https://pytorch.org/docs/stable/notes/autograd.html). I'm not ecstatic about the solution, so I'm curious to know whether there are better ways of achieving the same.

I noticed that, for e.g. _lif_feed_forward_step_jit, if I compute current first and then use the new current in the voltage computation (instead of voltage first with old current, and then update current), the line is not needed for gradients to be tracked. I guess this is because the input_tensor is the result of a multiplication with an nn.Parameter in the nn.Linear before, and if we immediately use this in current and voltage, everything is propagated correctly, whereas doing voltage first with a leaf current, this won't happen.
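A minimal sketch of this observation, with illustrative names rather than Norse's actual `_lif_feed_forward_step_jit`: updating the current first means the fresh current (which carries the `grad_fn` from the preceding `nn.Linear` output) drives the voltage update, so the voltage stays on the autograd graph without any manual `requires_grad` forcing.

```python
import torch


def step_current_first(x, v, i, dt=1e-3, tau_syn_inv=200.0, tau_mem_inv=100.0):
    # Current first: i now has a grad_fn coming from x.
    i = i + dt * tau_syn_inv * (x - i)
    # Voltage uses the fresh current, so it inherits the graph.
    v = v + dt * tau_mem_inv * (i - v)
    return v, i


x = torch.ones(3, requires_grad=True)  # stand-in for an nn.Linear output
v, i = step_current_first(x, torch.zeros(3), torch.zeros(3))
```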

@cpehle
Member

cpehle commented Feb 23, 2022

This https://github.com/ChFrenkel/eprop-PyTorch might be a good starting point.

@cpehle cpehle added this to the Release 0.1.1 milestone Jan 6, 2023