
Implement e-prop #226

Open
Jegp opened this issue Jul 20, 2021 · 9 comments
Labels
codejam Issues to be addressed during Code Jam · enhancement New feature or request

Comments

@Jegp
Member

Jegp commented Jul 20, 2021

Implement e-prop as in https://arxiv.org/pdf/1901.09049.pdf, see also https://github.com/IGITUGraz/eligibility_propagation for an implementation

@Jegp Jegp added enhancement New feature or request and codejam Issues to be addressed during Code Jam and removed hackathon labels Jul 20, 2021
@Huizerd
Contributor

Huizerd commented Aug 8, 2021

I think this can be implemented rather easily by adding a detach item to e.g. LIFParameters, where detach=False would mean BPTT, and detach=True would mean e-prop or local learning. As demonstrated here, inside the LIF function we would then have a detached variant of the previous spikes that is then used for the reset (and for the recurrent connections if there are any). It would mean an if statement in each neuron function though; possibly this could be moved to the SNNCell etc. parents.

Let me know what you think, I can then work on a draft.
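A minimal sketch of the idea, assuming hypothetical names (`LIFParams`, `lif_step`) rather than Norse's actual API: with `detach=True`, the previous spikes are cut out of the autograd graph before they enter the reset term, so gradients no longer flow across timesteps through the spike train (e-prop / local learning); with `detach=False`, the full BPTT graph is kept. The surrogate gradient for the threshold is omitted here for brevity.

```python
from typing import NamedTuple

import torch


class LIFParams(NamedTuple):
    tau_mem_inv: float = 100.0
    v_th: float = 1.0
    v_reset: float = 0.0
    detach: bool = False  # True -> e-prop-style local learning


def lif_step(x, v, z_prev, p: LIFParams, dt: float = 1e-3):
    # Optionally block gradients through the previous spikes.
    z_for_reset = z_prev.detach() if p.detach else z_prev
    # Membrane update; the reset is driven by the (possibly detached) spikes.
    v = v + dt * p.tau_mem_inv * (x - v) - (p.v_th - p.v_reset) * z_for_reset
    # Hard Heaviside threshold (surrogate gradient omitted in this sketch).
    z = (v > p.v_th).float()
    return z, v


x = torch.ones(4, requires_grad=True)
z, v = lif_step(x, torch.zeros(4), torch.zeros(4), LIFParams(detach=True))
```

Gradients still reach the input (and hence any preceding layer) through the membrane update; only the path through `z_prev` is cut.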

@skiwan

skiwan commented Aug 17, 2021

I think the general idea could work.
I'm not sure if we would need the detach for anything but the 'if' statement,

but with the current Norse architecture we could probably follow the SuperSpike approach:

https://github.com/norse/norse/blob/master/norse/torch/functional/threshold.py

adding e-prop as a new method for the threshold function.

Then, in the forward we just save the input tensor for the backward pass,
and in the backward of the e-prop method we could compute the pseudo-derivative and the eligibility vector, and then combine the vector with the learning signal, which can be passed into the backward function.

Does that sound feasible to you, @Huizerd?

With that approach we would also avoid having to add something to all SNNCell types, and instead let the autograd function take care of it.
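A minimal sketch of the autograd-function mechanics described above, assuming a SuperSpike-shaped pseudo-derivative; the name `SurrogateThreshold` and the steepness constant are illustrative, and the eligibility-vector/learning-signal combination is not implemented here, only the forward/backward split:

```python
import torch


class SurrogateThreshold(torch.autograd.Function):
    @staticmethod
    def forward(ctx, v_minus_th, alpha):
        # Save the input tensor for the backward pass.
        ctx.save_for_backward(v_minus_th)
        ctx.alpha = alpha
        # Hard Heaviside spike in the forward pass.
        return (v_minus_th > 0).float()

    @staticmethod
    def backward(ctx, grad_output):
        (v_minus_th,) = ctx.saved_tensors
        # SuperSpike-style pseudo-derivative: 1 / (1 + alpha * |v|)^2.
        surrogate = 1.0 / (1.0 + ctx.alpha * v_minus_th.abs()) ** 2
        # No gradient for the alpha argument.
        return grad_output * surrogate, None


v = torch.tensor([-0.5, 0.0, 0.5], requires_grad=True)
z = SurrogateThreshold.apply(v, 100.0)
```

All gradient handling lives in `backward`, which is where the e-prop computations would then be added.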

@Huizerd
Contributor

Huizerd commented Aug 17, 2021

@skiwan I don't follow you, could you elaborate/give an example? :) How could you specify when to block/not block gradients with the same call to the threshold?

@skiwan

skiwan commented Aug 17, 2021

I'm also quite new to this (e-prop as well as how PyTorch and autograd work), so I might not be 100% correct in how I am thinking,

but right now, for example with the LIF layer, within the LIF parameters we could say
LIFParameter(..., method='eprop')

Then, similar to the superspike method https://github.com/norse/norse/blob/master/norse/torch/functional/superspike.py

we can do all the calculations needed for e-prop there. Does that make more sense? Or could you explain in a bit more detail how you would approach this from your side?

@Jegp
Member Author

Jegp commented Aug 20, 2021

If we're keen on stopping gradients, I agree this could/should be handled on a module level. And we're actually right now forcing the voltage tensor to retain the gradient graph on a module level. What I'm not 100% clear on is how it composes with other layers. If you detach the gradient, won't it kill any contribution going forward? Meaning, you won't be able to update, say, a Linear layer that precedes a spiking layer.

If that's the case, could it be solved by storing the previous tensor, cloning it in an E-Prop module, and then reusing the previous tensor in a future backwards step, like so? Is that close to what you meant @Huizerd?

             Linear                     LIF                   Output
Forward:        x ---------------------> y -----------------> z
                      |
                      \--------> a = x.detach()

                           /-- e = lif_grad_output   <------- z.backward(loss)
                           |
                     a.backward(e)
Backward:     <--------/

@Huizerd
Contributor

Huizerd commented Aug 20, 2021

e-prop only blocks gradients between timesteps (for the reset for instance), not between layers. Think of it as a 1-step truncated BPTT. So @Jegp I think this shouldn't be a problem for other layers in the network.

@Jegp I was always wondering about that state.v.requires_grad = True actually. Could you explain why it's there?

@skiwan The SuperSpike in Norse is just the shape of their surrogate gradient, not the actual learning method they implement in that paper. You're right that the learning methods of SuperSpike and e-prop are similar. I didn't look into their implementation, though; I only know that e-prop can be done by blocking gradients in certain places.

@Jegp
Member Author

Jegp commented Aug 21, 2021

I see, that makes a lot of sense, actually. And if it's only related to the previous spikes, wouldn't it be possible to even create a wrapper module that "just" detaches the output spikes and caches them in the state?

The line exists purely for convenience. The state tensors are, theoretically, leaf tensors in that they are not strictly associated with the module and their gradients won't accumulate. This is basically one way of forcing Torch to include the state tensors in the gradient computations (https://pytorch.org/docs/stable/notes/autograd.html). I'm not ecstatic about the solution, so I'm curious to know whether there are better ways of achieving the same.

@Huizerd
Contributor

Huizerd commented Aug 24, 2021

> I see, that makes a lot of sense, actually. And if it's only related to the previous spikes, wouldn't it be possible to even create a wrapper module that "just" detaches the output spikes and caches them in the state?

I guess there could be an extended state that also includes detached spikes, because the problem is that you need both attached and detached spikes (following the original e-prop implementation). Not sure if this would be nice, and you would still need some check in the neuron function on which spikes to use where I think. Another solution could be to have an entirely different function for e-prop, say lif_step_eprop, but that would mean adding such a function for each neuron type, which seems a bit ugly.
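A minimal sketch of what such an extended state could look like; the names (`EPropState`, `update_state`) are illustrative, not an actual Norse API. The state carries both the attached spikes and a detached copy, so the reset (and recurrent input) can use the detached copy while the rest of the graph keeps the attached one:

```python
from typing import NamedTuple

import torch


class EPropState(NamedTuple):
    v: torch.Tensor           # membrane voltage (on the graph)
    z: torch.Tensor           # previous spikes, attached
    z_detached: torch.Tensor  # previous spikes, cut from the graph


def update_state(v_new: torch.Tensor, z_new: torch.Tensor) -> EPropState:
    # Store both variants so the neuron function can pick per use-site.
    return EPropState(v=v_new, z=z_new, z_detached=z_new.detach())


z_new = torch.zeros(3, requires_grad=True)
s = update_state(torch.zeros(3), z_new)
```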

> The line exists purely for convenience. The state tensors are, theoretically, leaf tensors in that they are not strictly associated with the module and their gradients won't accumulate. This is basically one way of forcing Torch to include the state tensors in the gradient computations (https://pytorch.org/docs/stable/notes/autograd.html). I'm not ecstatic about the solution, so I'm curious to know whether there are better ways of achieving the same.

I noticed that, for e.g. _lif_feed_forward_step_jit, if I compute current first and then use the new current in the voltage computation (instead of voltage first with old current, and then update current), the line is not needed for gradients to be tracked. I guess this is because the input_tensor is the result of a multiplication with an nn.Parameter in the nn.Linear before, and if we immediately use this in current and voltage, everything is propagated correctly, whereas doing voltage first with a leaf current, this won't happen.
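A minimal sketch of this observation, with illustrative names rather than Norse's actual `_lif_feed_forward_step_jit`: updating the current first means the fresh current (which carries the `grad_fn` from the preceding `nn.Linear` output) drives the voltage update, so the voltage stays on the autograd graph without any manual `requires_grad` forcing.

```python
import torch


def step_current_first(x, v, i, dt=1e-3, tau_syn_inv=200.0, tau_mem_inv=100.0):
    # Current first: i now has a grad_fn coming from x.
    i = i + dt * tau_syn_inv * (x - i)
    # Voltage uses the fresh current, so it inherits the graph.
    v = v + dt * tau_mem_inv * (i - v)
    return v, i


x = torch.ones(3, requires_grad=True)  # stand-in for an nn.Linear output
v, i = step_current_first(x, torch.zeros(3), torch.zeros(3))
```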

@cpehle
Member

cpehle commented Feb 23, 2022

This https://github.com/ChFrenkel/eprop-PyTorch might be a good starting point.

@cpehle cpehle added this to the Release 0.1.1 milestone Jan 6, 2023