Implementing an adjoint calculation for backprop-ing through time #1

ianwilliamson opened this issue Apr 24, 2019 · 8 comments


ianwilliamson commented Apr 24, 2019

We should consider the performance benefit of implementing an adjoint calculation for the backward pass through the forward() method in WaveCell. This could save us memory during gradient computation because PyTorch wouldn't need to construct as large a graph.

The approach is described here: https://pytorch.org/docs/stable/notes/extending.html
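For illustration, a hand-coded time-step primitive could look something like the sketch below. This is not the WaveCell update itself; the 1D damped update, the periodic-boundary `laplacian` helper, and the names `u_cur`, `u_prev`, `c`, `b`, `dt`, `dx` are all assumptions made for the example.

```python
import torch


def laplacian(u, dx):
    # 1D Laplacian with periodic boundaries; the kernel is symmetric, so the
    # operator is self-adjoint and reappears unchanged in backward().
    return (torch.roll(u, 1, dims=-1) - 2 * u + torch.roll(u, -1, dims=-1)) / dx ** 2


class WaveStep(torch.autograd.Function):
    """One step of a damped scalar wave update with a hand-coded backward pass."""

    @staticmethod
    def forward(ctx, u_cur, u_prev, c, b, dt, dx):
        # (1 + b*dt/2) u_next = 2 u_cur - (1 - b*dt/2) u_prev + (c*dt)^2 lap(u_cur)
        a1 = 1.0 / (1.0 + 0.5 * dt * b)
        a2 = 1.0 - 0.5 * dt * b
        u_next = a1 * (2 * u_cur - a2 * u_prev + (c * dt) ** 2 * laplacian(u_cur, dx))
        # Only what backward() needs is saved; no graph is recorded for the
        # intermediate operations above.
        ctx.save_for_backward(u_cur, c)
        ctx.b, ctx.dt, ctx.dx = b, dt, dx
        return u_next

    @staticmethod
    def backward(ctx, grad_out):
        u_cur, c = ctx.saved_tensors
        b, dt, dx = ctx.b, ctx.dt, ctx.dx
        a1 = 1.0 / (1.0 + 0.5 * dt * b)
        a2 = 1.0 - 0.5 * dt * b
        g = a1 * grad_out
        # Transposed Jacobians of the update above (c is a spatial speed field
        # with the same shape as u).
        grad_u_cur = 2 * g + laplacian((c * dt) ** 2 * g, dx)
        grad_u_prev = -a2 * g
        grad_c = 2 * c * dt ** 2 * laplacian(u_cur, dx) * g
        # b, dt, dx are treated as non-differentiable constants here.
        return grad_u_cur, grad_u_prev, grad_c, None, None, None


# usage: u_next = WaveStep.apply(u_cur, u_prev, c, b, dt, dx)
```

The hand-coded gradients could then be checked against autograd with torch.autograd.gradcheck on small double-precision inputs.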

ianwilliamson added the enhancement label on Apr 24, 2019
twhughes self-assigned this on Apr 30, 2019
@parenthetical-e

Sorry to pop in, but on the off chance you folks haven't seen this library/paper:

https://github.com/rtqichen/torchdiffeq
https://arxiv.org/pdf/1806.07366.pdf

It implements an ODE solver and uses adjoint methods for the backward pass. Is this what you need?

I was already thinking about porting WaveCell to it for my own use. Collaborate?

@ianwilliamson

Thanks for your interest in this! We are aware of that paper, but unfortunately we can't apply the scheme they propose here because the wave equation with loss (from the absorbing layer) is not reversible.

The "adjoint calculation" I'm referring to here is basically just hard coding the gradient for the time step using the pytorch API documented here: https://pytorch.org/docs/stable/notes/extending.html The motivation for this is that we can potentially save a bunch of memory because pytorch doesn't need to store the fields at every sub-operation of each time step. However, it still needs to store the fields at each time step (there's no getting around this when the differential equation isn't reversible.) In contrast, the neural ODE paper reconstructs these fields by reversing the forward equation during backpropagation, thus, avoiding the need to store the fields from the forward pass.

We actually have this adjoint approach implemented; I just need to push the commits to this repository.

@ianwilliamson

I'm definitely interested in learning about your project and what you hope to do. We would certainly be open to collaboration if there's an opportunity.

@parenthetical-e


Ah. I understand. Thanks for the explanation.

@parenthetical-e

I sent you an email about the project I'm pondering. :)


twhughes commented Aug 26, 2019 via email

@parenthetical-e

Done, @twhughes

@ianwilliamson

This is now partially implemented: the individual time step is a primitive. This seems to help with memory utilization during training, especially when the nonlinearity is enabled. Perhaps we could investigate whether there would be significant performance benefits from applying the adjoint treatment to the whole time loop as well.
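Purely as a sketch of what treating the time loop itself as a primitive could look like (hypothetical code reusing the illustrative update and `laplacian` helper from the earlier sketch, not the repository's implementation):

```python
class WaveLoop(torch.autograd.Function):
    """Hypothetical: the whole time loop as one primitive with an explicit adjoint sweep."""

    @staticmethod
    def forward(ctx, u0, u1, c, b, dt, dx, n_steps):
        a1 = 1.0 / (1.0 + 0.5 * dt * b)
        a2 = 1.0 - 0.5 * dt * b
        history = [u0, u1]
        for _ in range(n_steps):
            u_prev, u_cur = history[-2], history[-1]
            history.append(a1 * (2 * u_cur - a2 * u_prev
                                 + (c * dt) ** 2 * laplacian(u_cur, dx)))
        # The fields at every step still have to be stored (the lossy equation
        # can't be stably run in reverse), but nothing else is.
        ctx.save_for_backward(c, *history)
        ctx.b, ctx.dt, ctx.dx = b, dt, dx
        return history[-1]

    @staticmethod
    def backward(ctx, grad_out):
        c, *history = ctx.saved_tensors
        b, dt, dx = ctx.b, ctx.dt, ctx.dx
        a1 = 1.0 / (1.0 + 0.5 * dt * b)
        a2 = 1.0 - 0.5 * dt * b
        grad_c = torch.zeros_like(c)
        lam = grad_out                      # adjoint of history[n+1]
        carry = torch.zeros_like(grad_out)  # deferred contribution to the adjoint of history[n]
        # Reverse sweep over the stored history, applying the transposed per-step Jacobians.
        for n in range(len(history) - 2, 0, -1):
            g = a1 * lam
            grad_c += 2 * c * dt ** 2 * laplacian(history[n], dx) * g
            lam, carry = 2 * g + laplacian((c * dt) ** 2 * g, dx) + carry, -a2 * g
        # After the sweep, lam is the adjoint of u1 and carry is the adjoint of u0.
        return carry, lam, grad_c, None, None, None, None
```

The field history still has to be stored, so memory grows with the number of steps either way, but the backward pass becomes a single explicit reverse sweep instead of an autograd graph recorded over the whole loop.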
