Initial augmented state on time #218

Open
rkqzw opened this issue Feb 10, 2023 · 2 comments
Comments

@rkqzw

rkqzw commented Feb 10, 2023

Hi,

Thanks for your great work!
I am trying to understand why the initial augmented state for time is $-\frac{\partial L}{\partial t_1}$ rather than $\frac{\partial L}{\partial t_1}$.
(To me, $\frac{\partial L}{\partial t_1}$ seems like the natural initial value for the time component of the augmented state.)

I have checked Algorithm 2 in the original paper, the code in this repository, and code and documents written by other people, but I can't find an explanation for the $-\frac{\partial L}{\partial t_1}$.

Could you explain the reason?

Thanks!

similar issue: #199

@rtqichen
Owner

rtqichen commented Mar 6, 2023

Does the conversation from #166 help?

The quantity you're seeing is used to compute dL/dt0, which, intuitively, has the opposite gradient direction to dL/dt1, because increasing t0 shortens the integration interval. The initial value for this gradient, when t0 = t1, is the negative of dL/dt1.
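
The sign argument above can be checked numerically on a toy problem. The sketch below is not taken from this repository: it assumes the scalar ODE dz/dt = t·z with the loss L = z(t1), chosen only because it has a closed-form solution, and every name in it (`Z0`, `z_at`, `loss`, `central_diff`, `backward_time_grad`) is hypothetical. It compares finite-difference estimates of dL/dt1 and dL/dt0 with a backward accumulation that starts from -dL/dt1 and integrates -a·∂f/∂t from t1 down to t0, in the spirit of Algorithm 2.

```python
import math

Z0 = 1.5  # arbitrary initial state, held fixed when t0 changes (assumption)


def z_at(t, t0):
    """Closed-form solution of the toy ODE dz/dt = t * z with z(t0) = Z0."""
    return Z0 * math.exp((t**2 - t0**2) / 2.0)


def loss(t0, t1):
    """Toy loss L = z(t1)."""
    return z_at(t1, t0)


def central_diff(fn, x, eps=1e-6):
    """Central finite-difference estimate of d fn / d x."""
    return (fn(x + eps) - fn(x - eps)) / (2.0 * eps)


def backward_time_grad(t0, t1, steps=1000):
    """Accumulate dL/dt0 backwards in time, starting from -dL/dt1.

    For this toy problem the adjoint a(t) = dL/dz(t) and the state z(t)
    are known in closed form, so no ODE solver is needed.
    """
    a1 = 1.0                   # dL/dz(t1) for L = z(t1)
    f1 = t1 * z_at(t1, t0)     # f(z(t1), t1)
    grad = -a1 * f1            # initial value: -dL/dt1
    h = (t0 - t1) / steps      # negative step: integrate from t1 down to t0
    for k in range(steps):
        t = t1 + (k + 0.5) * h                  # midpoint rule
        a = math.exp((t1**2 - t**2) / 2.0)      # adjoint a(t)
        df_dt = z_at(t, t0)                     # partial f / partial t = z
        grad += h * (-a * df_dt)                # dynamics: -a * df/dt
    return grad


t0, t1 = 0.3, 1.2
dL_dt1 = central_diff(lambda t: loss(t0, t), t1)
dL_dt0 = central_diff(lambda t: loss(t, t1), t0)
dL_dt0_adjoint = backward_time_grad(t0, t1)

print("dL/dt1 (finite difference):", dL_dt1)
print("dL/dt0 (finite difference):", dL_dt0)
print("dL/dt0 (backward accumulation):", dL_dt0_adjoint)
```

In this example the two estimates of dL/dt0 agree and have the opposite sign to dL/dt1, which is consistent with starting the time component at -dL/dt1. Note that `Z0` is held fixed while t0 is perturbed; that is an assumption of the sketch.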

@rkqzw
Author

rkqzw commented Mar 10, 2023

Thank you so much for your reply!

> Does the conversation from #166 help?
>
> The quantity you're seeing is used to compute dL/dt0, which, intuitively, has the opposite gradient direction to dL/dt1, because increasing t0 shortens the integration interval. The initial value for this gradient, when t0 = t1, is the negative of dL/dt1.

This helps me understand that the initial value for the dL/dt0 calculation includes the negative of dL/dt1, but it raises another question. Let me explain.

The loss L in Eq. (3) of the original paper depends on z(t0) and on the integral of f:

$L({\bf z}(t_1)) = L \left( {\bf z}(t_0) + \int_{t_0}^{t_1} f({\bf z}(t), t, \theta) dt \right)$

I think t1 affects only the integral, whereas t0 affects both z(t0) and the integral, so the gradient through z(t0) should also be considered in the initial value for the dL/dt0 calculation.
As I understand it, the conversation in #166 and your reply refer only to the gradient through the integral.

Could you explain why the gradient through z(t0) can be ignored?

Thanks!
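
One way to make this question concrete is to write out the total derivative with respect to $t_0$. This is only a sketch, not the maintainers' answer. The flow-map notation $\phi$ and the symbol ${\bf z}_0$ are introduced here for illustration and do not appear in the paper: write ${\bf z}(t_1) = \phi(t_1; t_0, {\bf z}_0)$ with ${\bf z}_0 = {\bf z}(t_0)$, and let ${\bf a}(t) = \partial L / \partial {\bf z}(t)$ be the adjoint. The chain rule, together with the standard flow identity for the dependence on the start time, gives

$$\frac{dL}{dt_0} = {\bf a}(t_1)^\top \left( \frac{\partial \phi}{\partial {\bf z}_0} \frac{d{\bf z}_0}{dt_0} + \frac{\partial \phi}{\partial t_0} \right), \qquad \frac{\partial \phi}{\partial t_0} = -\frac{\partial \phi}{\partial {\bf z}_0} f({\bf z}_0, t_0, \theta)$$

Using ${\bf a}(t_0)^\top = {\bf a}(t_1)^\top \frac{\partial \phi}{\partial {\bf z}_0}$, this becomes

$$\frac{dL}{dt_0} = {\bf a}(t_0)^\top \left( \frac{d{\bf z}_0}{dt_0} - f({\bf z}_0, t_0, \theta) \right)$$

If ${\bf z}_0$ is treated as an input that does not depend on $t_0$ (an assumption about the setup, not something stated in this thread), the first term vanishes and only the path through the integral remains, $\frac{dL}{dt_0} = -{\bf a}(t_0)^\top f({\bf z}_0, t_0, \theta)$. If ${\bf z}_0$ were instead moved along the same trajectory, so that $\frac{d{\bf z}_0}{dt_0} = f({\bf z}_0, t_0, \theta)$, the two terms would cancel and $\frac{dL}{dt_0}$ would be zero.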
