
Examples of using differentiable least squares #254

Open
zeroAska opened this issue Jul 3, 2023 · 16 comments
Labels
question Further information is requested

Comments

@zeroAska

zeroAska commented Jul 3, 2023

📚 The doc issue

In the provided examples, the least-squares problem optimizes over all the parameters. However, in some applications, some of the parameters come from a neural network and should be optimized with SGD, while the others can be optimized directly by the least-squares solver. In Theseus, this is specified by the "inner loop" and "outer loop". Does the current version of PyPose support this?

Suggest a potential alternative/fix

Provide an example in which the state space is a neural network to be learned and the pose is optimized by a least-squares solver.

@wang-chen
Member

@zeroAska Yes, it is supported. You may do something like

from torch.optim import SGD
from pypose.optim import LM

opt1 = SGD(net1.parameters())        # outer loop: gradient descent on net1
opt2 = LM(net2, strategy=strategy)   # inner loop: Levenberg-Marquardt on net2
for i in range(epochs):
    opt1.step()
    for j in range(iterations):
        opt2.step(input)             # solve the least-squares problem on net2

Bi-level optimization like this will be directly supported in a future release.

@zeroAska
Author

zeroAska commented Jul 3, 2023

Thanks for the quick response. If net1 and net2 are the same nn.Module with different subsets of parameters, is there a way to specify which subset of parameters is used for LM and which for SGD, respectively?

@wang-chen
Member

You can use net.module1.parameters() and net.module2 to achieve this.
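Concretely, splitting one module between the two optimizers might look like the following sketch in plain PyTorch (the submodule names module1/module2 and their shapes are illustrative, not from PyPose):

```python
import torch
import torch.nn as nn

# A hypothetical network with two submodules, following the naming in this thread.
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.module1 = nn.Linear(4, 8)   # e.g. trained by SGD in the outer loop
        self.module2 = nn.Linear(8, 2)   # e.g. solved by LM in the inner loop

net = Net()
# SGD receives only module1's parameters; LM would receive module2 itself,
# since pypose.optim.LM takes a module rather than a parameter list.
sgd_params = list(net.module1.parameters())
lm_module = net.module2
```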

@zeroAska
Author

zeroAska commented Jul 5, 2023

Thanks!! In the above net.module2's least-squares problem, there is a pose LieTensor in nn.Module.parameters() whose initial value might need manual assignment for each problem and for each training example. An example of such an application is visual odometry, where we need to train the image encoder and perform least squares over the poses. How can we specify the parameter's initial value each time, considering that it is a parameter of the nn.Module?

Another question: if a batch of training data has different poses, are we able to multiply each pose with its corresponding data as a batch and launch different least-squares problems within the batch? For example, in a batch of 2, we have the pose batch [pose1, pose2] and we want it to act on the batch [image1, image2] to obtain [pose1 @ image1, pose2 @ image2].

@wang-chen
Member

For initialization, it is no different from a neural network: you may perform in-place value assignment for module parameters, e.g., net.module.weight1.data.fill_(value), before solving the problem. More information is here.
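As a sketch of such per-problem re-initialization (the pose parameter here is a plain 7-vector stand-in for a LieTensor, and the class name is hypothetical):

```python
import torch
import torch.nn as nn

class PoseModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Stand-in for a pose LieTensor parameter (tx, ty, tz, qx, qy, qz, qw).
        self.pose = nn.Parameter(torch.zeros(7))

model = PoseModel()
# Re-initialize in place before solving each new problem / training example:
init = torch.tensor([0., 0., 0., 0., 0., 0., 1.])  # identity pose
with torch.no_grad():
    model.pose.copy_(init)
```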

For the second question, if you mean that each time you want to activate different parameters for an LM problem to solve, PyPose currently doesn't directly support this. LM and GN don't work with stochastic inputs: since they don't use gradient descent, they will not converge, as solutions will jump far away from the last iteration. However, you can technically do it by defining different optimizers for different parameters.
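For intuition on the batching itself (separate from the solver question), applying each transform only to its own sample follows the usual batched-matmul convention; here is a plain-tensor sketch with rotation-like matrices standing in for the poses:

```python
import torch

torch.manual_seed(0)
poses = torch.randn(2, 3, 3)    # batch of two transforms (stand-ins for pose1, pose2)
points = torch.randn(2, 3, 5)   # batch of two point sets (stand-ins for image1, image2)

# out[k] == poses[k] @ points[k]: each pose acts only on its own batch element.
out = torch.bmm(poses, points)
```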

@zeroAska
Author

zeroAska commented Jul 5, 2023

Many thanks!

@zeroAska zeroAska closed this as completed Jul 5, 2023
@zeroAska zeroAska reopened this Jul 13, 2023
@zeroAska
Author

As a follow-up question for the above outer-inner loop setup: since the prediction comes from the least squares, how is its gradient w.r.t. the ground truth propagated through the least-squares layer?

@wang-chen
Member

We suggest only retaining the gradients from the last iteration of the inner optimization, as it is more efficient and equivalent to back-propagating through the inner iterative optimization. For more details, you may refer to Sec. 3.4 of this paper.

@wang-chen
Member

wang-chen commented Jul 13, 2023

We suggest only retaining the gradients from the last iteration of the inner optimization, as it is more efficient and equivalent to back-propagating through the inner iterative optimization. For more details, you may refer to Sec. 3.4 of this paper.

An easy way to do this is to perform one more model forward after inner optimization, then do outer optimization.
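A minimal sketch of this pattern in plain PyTorch (the inner update is a toy stand-in for an LM/GN step, and all names are illustrative):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
encoder = nn.Linear(2, 2)                    # outer-level network
pose = nn.Parameter(torch.zeros(2))          # inner-level variable
target = torch.tensor([1.0, -1.0])
outer_opt = torch.optim.SGD(encoder.parameters(), lr=0.1)

# Inner optimization without tracking gradients (PyPose's solver steps
# similarly run under no_grad):
with torch.no_grad():
    for _ in range(10):
        residual = encoder(pose) - target
        pose -= 0.5 * residual               # toy update, not a real LM step

# One extra differentiable forward after the inner optimization, so the outer
# loss backpropagates through a single inner iteration instead of all ten:
outer_loss = (encoder(pose) - target).pow(2).sum()
outer_opt.zero_grad()
outer_loss.backward()
outer_opt.step()
```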

@zeroAska
Author

Thanks for the paper link. I will check it out.

@zeroAska
Author

In the paper provided above, does the bi-level optimization (i.e., the inner/outer loop) share the same loss? If the two stages have different losses to optimize, can we still use the trick of keeping only the last iteration's gradients? For example, the inner loop that optimizes the pose might have a label-free loss, while the outer loop that optimizes the network parameters might have a supervised loss.

@wang-chen
Member

wang-chen commented Jul 19, 2023

They don't have to have the same loss. Another example with different loss functions is this paper.

@Neutronpanp

We suggest only retaining the gradients from the last iteration of the inner optimization, as it is more efficient and equivalent to back-propagating through the inner iterative optimization. For more details, you may refer to Sec. 3.4 of this paper.

I noticed that the optimizers in PyPose are decorated with @torch.no_grad() (e.g., in optim.GN.step and optim.LM.step), so how can I back-propagate the gradient through the optimizers to the front-end neural network?

@wang-chen
Member

We suggest only retaining the gradients from the last iteration of the inner optimization, as it is more efficient and equivalent to back-propagating through the inner iterative optimization. For more details, you may refer to Sec. 3.4 of this paper.

I noticed that the optimizers in PyPose are decorated with @torch.no_grad() (e.g., in optim.GN.step and optim.LM.step), so how can I back-propagate the gradient through the optimizers to the front-end neural network?

After optimization, we suggest performing another forward pass for the loss, so that it can be backpropagated through the inner-level optimization with only one iteration.
For example, in the MPC example: in Line 231, we don't retain the gradient, but then in Line 293, we perform another round of LQR, which bypasses the multiple iterations and saves computing time.

@zeroAska
Author

zeroAska commented Aug 16, 2023

If the outer level loss is a supervised loss, does the outer level's gradient propagation method in the paper still hold?

@pyposebot

pyposebot commented Aug 16, 2023 via email

@wang-chen wang-chen added the question Further information is requested label Aug 25, 2023