Nice to see there's already an implementation of this!

I just stumbled across TensorFlow's `stop_gradient` function. Among the examples of where the function might be needed, the docs mention "the EM algorithm where the M-step should not involve backpropagation through the output of the E-step."

Does this also apply when using the EM algorithm for routing? I don't think the paper says anything about this, but then again the paper is very sparse on details about backpropagation...

Not computing gradients for the E-step might speed up training considerably, I believe; see the sketch below for roughly what I have in mind.
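Something like this minimal, simplified sketch (not this repo's actual code; the shapes, the plain-Gaussian E-step, and the omission of the paper's beta/activation terms are my own assumptions):

```python
import tensorflow as tf

def em_routing(votes, num_iters=3):
    """Illustrative EM-routing loop with the E-step cut out of the backward pass.

    votes: [batch, in_caps, out_caps, pose_dim] -- shapes are assumptions.
    """
    # Uniform initial assignments of input capsules to output capsules.
    r = tf.ones(tf.shape(votes)[:3]) / tf.cast(tf.shape(votes)[2], tf.float32)
    for _ in range(num_iters):
        # M-step: fit a Gaussian per output capsule from r-weighted votes.
        r_ = r[..., None]
        denom = tf.reduce_sum(r_, axis=1, keepdims=True) + 1e-9
        mu = tf.reduce_sum(r_ * votes, axis=1, keepdims=True) / denom
        var = tf.reduce_sum(r_ * (votes - mu) ** 2, axis=1, keepdims=True) / denom + 1e-9

        # E-step: re-estimate assignments from Gaussian log-likelihoods.
        log_p = -0.5 * tf.reduce_sum(
            tf.math.log(2.0 * 3.14159265 * var) + (votes - mu) ** 2 / var,
            axis=-1)
        r = tf.nn.softmax(log_p, axis=2)

        # The docs' EM example: treat the E-step output as a constant so the
        # next M-step does not backpropagate through it.
        r = tf.stop_gradient(r)
    return tf.squeeze(mu, axis=1)  # final poses, [batch, out_caps, pose_dim]

# e.g.: poses = em_routing(tf.random.normal([8, 32, 10, 16]))
```

With the `stop_gradient` in place, the loss only differentiates through the final M-step's weighted sums (the assignments are constants), which should shrink the graph autodiff has to traverse.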
Thoughts?
@Germanunkol @gyang274 may I ask, why would not calculating gradients for the E-step considerably speed up training? If we're unrolling multiple EM iterations, wouldn't blocking gradients at the E-step prevent gradients from flowing to earlier EM iterations?
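To make the question concrete, here's a toy check (the two-iteration loop and all names are mine, purely illustrative): with `stop_gradient` on the assignments, the weights still receive a gradient through the votes in the final M-step, but every path running through earlier iterations' E-steps is cut.

```python
import tensorflow as tf

w = tf.Variable(tf.random.normal([4, 3, 2]))   # hypothetical vote weights
x = tf.random.normal([1, 4, 1, 2])             # hypothetical input poses

with tf.GradientTape() as tape:
    votes = x * w                               # [1, 4, 3, 2]
    r = tf.ones([1, 4, 3]) / 3.0                # uniform initial assignments
    for _ in range(2):                          # two unrolled EM iterations
        # M-step: r-weighted mean of the votes per output capsule.
        r_ = r[..., None]
        mu = (tf.reduce_sum(r_ * votes, axis=1, keepdims=True)
              / tf.reduce_sum(r_, axis=1, keepdims=True))
        # E-step: reassign inputs to the nearest Gaussian mean.
        r = tf.nn.softmax(-tf.reduce_sum((votes - mu) ** 2, axis=-1), axis=2)
        r = tf.stop_gradient(r)                 # remove this line to compare
    loss = tf.reduce_sum(mu ** 2)

# Still a real gradient: the final M-step differentiates through `votes`.
# What is lost is the path loss -> mu -> r (iter 2) -> mu (iter 1).
print(tape.gradient(loss, w))
```

So it looks like a trade-off rather than a free speedup: a cheaper backward pass versus gradients that only see the last M-step.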