
Examples of using differentiable least squares #254

Open
zeroAska opened this issue Jul 3, 2023 · 16 comments
Labels
question Further information is requested

Comments

@zeroAska

zeroAska commented Jul 3, 2023

📚 The doc issue

In the provided examples, the least-squares problem optimizes over all the parameters. However, in some applications, some of the parameters come from a neural network and should be optimized with SGD, while the others can be optimized directly by the least-squares solver. In Theseus, this is specified by the "inner loop" and "outer loop". Does the current version of PyPose support this?

Suggest a potential alternative/fix

Provide an example in which the state space is a neural network to be learned and the pose is optimized by a least-squares solver.

@wang-chen
Member

@zeroAska Yes, it is supported. You may do something like

from torch.optim import SGD
from pypose.optim import LM

opt1 = SGD(net1.parameters())        # outer loop: gradient descent on net1
opt2 = LM(net2, strategy=strategy)   # inner loop: Levenberg-Marquardt on net2
for i in range(epochs):
    opt1.step()
    for j in range(iterations):
        opt2.step(input)             # solve the least-squares problem on net2

Bi-level optimization like this will be directly supported in a future release.

@zeroAska
Author

zeroAska commented Jul 3, 2023

Thanks for the quick response. If net1 and net2 are the same nn.Module with different subsets of parameters, is there a way to specify which subset of parameters is used for LM and which for SGD, respectively?

@wang-chen
Member

You can use net.module1.parameters() and net.module2 to achieve this.
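Concretely, splitting one module between the two optimizers might look like the following sketch in plain PyTorch (the submodule names module1/module2 and their shapes are illustrative, not from PyPose):

```python
import torch
import torch.nn as nn

# A hypothetical network with two submodules, following the naming in this thread.
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.module1 = nn.Linear(4, 8)   # e.g. trained by SGD in the outer loop
        self.module2 = nn.Linear(8, 2)   # e.g. solved by LM in the inner loop

net = Net()
# SGD receives only module1's parameters; LM would receive module2 itself,
# since pypose.optim.LM takes a module rather than a parameter list.
sgd_params = list(net.module1.parameters())
lm_module = net.module2
```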

@zeroAska
Author

zeroAska commented Jul 5, 2023

Thanks!! In the above net.module2's least-squares problem, there is a pose LieTensor in nn.Module.parameters() whose initial value might need manual assignment for each problem and for each training example. An example of such an application is visual odometry, where we need to train the image encoder and perform least squares over the poses. How can we specify the parameter's initial value each time, considering that it is a parameter of the nn.Module?

Another question: if a batch of training data has different poses, are we able to multiply each pose with its corresponding data as a batch and launch different least-squares problems within the batch? For example, in a batch of 2, we have the pose batch [pose1, pose2] and we want it to act on the batch [image1, image2] to obtain [pose1 @ image1, pose2 @ image2].

@wang-chen
Member

For initialization, it is no different from a neural network: you may perform in-place value assignment for module parameters, e.g., net.module.weight1.data.fill_(value), before solving the problem. More information is here.
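As a sketch of such per-problem re-initialization (the pose parameter here is a plain 7-vector stand-in for a LieTensor, and the class name is hypothetical):

```python
import torch
import torch.nn as nn

class PoseModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Stand-in for a pose LieTensor parameter (tx, ty, tz, qx, qy, qz, qw).
        self.pose = nn.Parameter(torch.zeros(7))

model = PoseModel()
# Re-initialize in place before solving each new problem / training example:
init = torch.tensor([0., 0., 0., 0., 0., 0., 1.])  # identity pose
with torch.no_grad():
    model.pose.copy_(init)
```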

For the second question, if you mean that each time you want to activate different parameters for an LM problem to solve, PyPose currently doesn't directly support this. LM and GN don't work with stochastic inputs: since they don't use gradient descent, they will not converge, as solutions will jump far away from the last iteration. However, you can technically do it by defining different optimizers for different parameters.
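For intuition on the batching itself (separate from the solver question), applying each transform only to its own sample follows the usual batched-matmul convention; here is a plain-tensor sketch with rotation-like matrices standing in for the poses:

```python
import torch

torch.manual_seed(0)
poses = torch.randn(2, 3, 3)    # batch of two transforms (stand-ins for pose1, pose2)
points = torch.randn(2, 3, 5)   # batch of two point sets (stand-ins for image1, image2)

# out[k] == poses[k] @ points[k]: each pose acts only on its own batch element.
out = torch.bmm(poses, points)
```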

@zeroAska
Author

zeroAska commented Jul 5, 2023

Many thanks!

@zeroAska zeroAska closed this as completed Jul 5, 2023
@zeroAska zeroAska reopened this Jul 13, 2023
@zeroAska
Author

As a follow-up question for the above outer-inner loop setup: since the prediction comes from the least squares, how is its gradient w.r.t. the ground truth propagated through the least-squares layer?

@wang-chen
Member

We suggest only retaining the gradients from the last iteration of the inner optimization, as it is more efficient and equivalent to back-propagating through the inner iterative optimization. For more details, you may refer to Sec. 3.4 of this paper.

@wang-chen
Member

wang-chen commented Jul 13, 2023

We suggest only retaining the gradients from the last iteration of the inner optimization, as it is more efficient and equivalent to back-propagating through the inner iterative optimization. For more details, you may refer to Sec. 3.4 of this paper.

An easy way to do this is to perform one more model forward after inner optimization, then do outer optimization.
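A minimal sketch of this pattern in plain PyTorch (the inner update is a toy stand-in for an LM/GN step, and all names are illustrative):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
encoder = nn.Linear(2, 2)                    # outer-level network
pose = nn.Parameter(torch.zeros(2))          # inner-level variable
target = torch.tensor([1.0, -1.0])
outer_opt = torch.optim.SGD(encoder.parameters(), lr=0.1)

# Inner optimization without tracking gradients (PyPose's solver steps
# similarly run under no_grad):
with torch.no_grad():
    for _ in range(10):
        residual = encoder(pose) - target
        pose -= 0.5 * residual               # toy update, not a real LM step

# One extra differentiable forward after the inner optimization, so the outer
# loss backpropagates through a single inner iteration instead of all ten:
outer_loss = (encoder(pose) - target).pow(2).sum()
outer_opt.zero_grad()
outer_loss.backward()
outer_opt.step()
```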

@zeroAska
Author

Thanks for the paper link. I will check it out.

@zeroAska
Author

In the paper provided above, does the bi-level optimization (i.e., the inner/outer loop) share the same loss? If the two stages have different losses to optimize, can we still use the trick of keeping only the last iteration's gradients? For example, the inner loop that optimizes the pose might have a label-free loss, while the outer loop that optimizes the network parameters might have a supervised loss.

@wang-chen
Member

wang-chen commented Jul 19, 2023

They don't have to have the same loss. Another example with different loss functions is this paper.

@Neutronpanp

We suggest only retaining the gradients from the last iteration of the inner optimization, as it is more efficient and equivalent to back-propagating through the inner iterative optimization. For more details, you may refer to Sec. 3.4 of this paper.

I noticed that the optimizers in PyPose are decorated with @torch.no_grad() (e.g., in optim.GN.step and optim.LM.step), so how can I back-propagate the gradient through the optimizers to the front-end neural network?

@wang-chen
Member

We suggest only retaining the gradients from the last iteration of the inner optimization, as it is more efficient and equivalent to back-propagating through the inner iterative optimization. For more details, you may refer to Sec. 3.4 of this paper.

I noticed that the optimizers in PyPose are decorated with @torch.no_grad() (e.g., in optim.GN.step and optim.LM.step), so how can I back-propagate the gradient through the optimizers to the front-end neural network?

After optimization, we suggest performing another forward pass for the loss, so that it can be backpropagated through the inner-level optimization with only one iteration.
For example, in the MPC example: in Line 231, we don't retain the gradient, but then in Line 293, we perform another round of LQR, which bypasses the multiple iterations and saves computing time.

@zeroAska
Author

zeroAska commented Aug 16, 2023

If the outer level loss is a supervised loss, does the outer level's gradient propagation method in the paper still hold?

@pyposebot

pyposebot commented Aug 16, 2023 via email

@wang-chen wang-chen added the question Further information is requested label Aug 25, 2023