
About ft_optim grad from ft_loss #13

Open
remiMZ opened this issue Apr 29, 2020 · 6 comments

@remiMZ

remiMZ commented Apr 29, 2020

Hi, I reproduced your code and found that ft_loss does not produce gradients for the FiLM (feature-wise transformation) layers, so how does your learning-to-learn scheme update them through ft_optim?

@hytseng0509
Owner

You can refer to the implementation in methods/backbone.py and methods/LFTNet.py.

@dori2063

Hi, thank you for sharing your work.
I tried to understand Eq. (6) and (7) through your code, but I think I failed.
As I understand it, "ft_loss" is calculated without the FT layers in methods/LFTNet.py, line 137.
Line 138 then tries to compute and apply the gradients of the FT layers.
However, as far as I know, that should not be possible, because the FT layers were never used in that forward pass.
How can the gradients of the FT layers be calculated?
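For reference, this is how I currently read the two updates (my own paraphrase, which may not match the paper's exact notation; $\alpha$ and $\eta$ are the inner and outer learning rates):

$$\theta' = \theta - \alpha \nabla_{\theta} \mathcal{L}^{p^s}(\theta, \theta_f) \qquad \text{(Eq. (6): update the model on the pseudo-seen task with the FT layers)}$$

$$\theta_f \leftarrow \theta_f - \eta \nabla_{\theta_f} \mathcal{L}^{p^u}(\theta') \qquad \text{(Eq. (7): update the FT parameters with the pseudo-unseen loss of the updated model)}$$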

@dori2063

dori2063 commented Sep 2, 2020

Thanks to your response on OpenReview, I think I understand now. Thank you!

@lianglele185

> Thanks to your response on OpenReview, I think I understand now. Thank you!

Can you explain why the FT layers can be updated without being used?

@dori2063

The FT layers are not applied in the forward pass on $p^u$, but the model evaluated on $p^u$ was just updated on $p^s$ with the FT layers inserted, so the pseudo-unseen loss still depends on the FT parameters through that update. (In the end, the FT layers do affect $p^u$.)
In https://openreview.net/forum?id=SJl5Np4tPr, the author's sentence "the updated model used to calculate~" is what made it click for me. You can also look at how "self.ft_optim" is defined and used in methods/LFTNet.py.
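Here is a minimal PyTorch sketch of that gradient flow as I understand it. All names (learn_to_learn_step, pseudo_seen_loss, pseudo_unseen_loss) are hypothetical placeholders, not the repository's actual API; the real logic lives in methods/LFTNet.py.

```python
import torch

def learn_to_learn_step(theta, theta_f, pseudo_seen_loss, pseudo_unseen_loss,
                        lr_inner=0.01):
    # Inner step (Eq. (6)): update the model parameters theta on the
    # pseudo-seen task WITH the FT layers (theta_f) inserted.
    # create_graph=True keeps theta_prime differentiable w.r.t. theta_f.
    loss_ps = pseudo_seen_loss(theta, theta_f)
    grads = torch.autograd.grad(loss_ps, theta, create_graph=True)
    theta_prime = [w - lr_inner * g for w, g in zip(theta, grads)]

    # Outer step (Eq. (7)): evaluate the UPDATED model on the pseudo-unseen
    # task WITHOUT the FT layers. Because theta_prime is a function of
    # theta_f, autograd can still back-propagate ft_loss into theta_f even
    # though no FT layer appears in this forward pass.
    ft_loss = pseudo_unseen_loss(theta_prime)
    return torch.autograd.grad(ft_loss, theta_f)
```

The key detail is `create_graph=True` in the inner step: it is what allows the second-order gradient to reach $\theta_f$.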

@LavieLuo

Hi, thank you for sharing this information. This point has been confusing me a lot.

> The FT layers are not applied in the forward pass on $p^u$, but the model evaluated on $p^u$ was just updated on $p^s$ with the FT layers inserted, so the pseudo-unseen loss still depends on the FT parameters through that update. (In the end, the FT layers do affect $p^u$.)

After reading your reply, I guess it can be understood as: $p^s$ and $p^u$ share a single FT layer (i.e., the same parameters $\theta_f$), which is only optimized with the loss on $p^u$?
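If it helps, writing out the chain rule (my own notation, assuming the inner update $\theta' = \theta - \alpha \nabla_{\theta} \mathcal{L}^{p^s}(\theta, \theta_f)$) makes the dependence explicit:

$$\nabla_{\theta_f} \mathcal{L}^{p^u}(\theta') = \left(\frac{\partial \theta'}{\partial \theta_f}\right)^{\top} \nabla_{\theta'} \mathcal{L}^{p^u}(\theta') = -\alpha \left(\nabla_{\theta_f} \nabla_{\theta} \mathcal{L}^{p^s}(\theta, \theta_f)\right)^{\top} \nabla_{\theta'} \mathcal{L}^{p^u}(\theta')$$

So $\theta_f$ receives a (second-order) gradient even though no FT layer appears in the $p^u$ forward pass; it enters only through $\theta'$.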
