[QUESTION] How did P(τ;θ) disappear when estimating the gradient from sampled trajectories? #495

Open
ritwikmishra opened this issue Feb 29, 2024 · 2 comments

@ritwikmishra

I am referring to the gradient derivation here.

In the paragraph where the instructor says "we can approximate the likelihood ratio policy gradient with sample-based estimate", the term P(τ;θ) (the probability of trajectory τ given the parameters θ) disappears from the subsequent summation. Why?
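One way to see where P(τ;θ) goes, sketched under the assumption that the linked page writes the likelihood-ratio gradient as a sum over all trajectories τ with return R(τ) and then uses m sampled trajectories τ^(i): the sum weighted by P(τ;θ) is, by definition, an expectation under P(τ;θ), and the last step replaces that expectation with a Monte Carlo average over trajectories collected by running the policy.

```math
\begin{aligned}
\nabla_\theta J(\theta)
  &= \sum_{\tau} P(\tau;\theta)\, \nabla_\theta \log P(\tau;\theta)\, R(\tau) \\
  &= \mathbb{E}_{\tau \sim P(\cdot;\theta)}\!\left[ \nabla_\theta \log P(\tau;\theta)\, R(\tau) \right] \\
  &\approx \frac{1}{m} \sum_{i=1}^{m} \nabla_\theta \log P(\tau^{(i)};\theta)\, R(\tau^{(i)}),
  \qquad \tau^{(1)}, \dots, \tau^{(m)} \sim P(\cdot;\theta)
\end{aligned}
```

The weight does not vanish; it is absorbed into the sampling. A trajectory with high P(τ;θ) shows up in the batch more often, so each sample enters the average with the plain factor 1/m.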

I asked the same question in the Discord study group (here) but got no response.

@simoninithomas
Member

Hey there 👋
[image]

So P(τ;θ) is the probability of a trajectory, but we can't compute it, since that would require knowing the environment dynamics (the state distribution).
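To spell out the point about the environment dynamics, here is the standard factorization of a trajectory's probability, assuming a Markov decision process with initial-state distribution μ(s_0), transition probabilities P(s_{t+1} | s_t, a_t), and horizon H (this notation is chosen for illustration): only the policy factors depend on θ, so the unknown dynamics drop out of ∇θ log P(τ;θ), even though P(τ;θ) itself cannot be evaluated.

```math
\begin{aligned}
P(\tau;\theta) &= \mu(s_0) \prod_{t=0}^{H} P(s_{t+1} \mid s_t, a_t)\, \pi_\theta(a_t \mid s_t) \\
\nabla_\theta \log P(\tau;\theta)
  &= \underbrace{\nabla_\theta \log \mu(s_0)}_{=\,0}
   + \sum_{t=0}^{H} \underbrace{\nabla_\theta \log P(s_{t+1} \mid s_t, a_t)}_{=\,0}
   + \sum_{t=0}^{H} \nabla_\theta \log \pi_\theta(a_t \mid s_t) \\
  &= \sum_{t=0}^{H} \nabla_\theta \log \pi_\theta(a_t \mid s_t)
\end{aligned}
```

Combined with sampling, this gives an estimator that needs only the policy's log-probabilities of the actions actually taken in each sampled trajectory τ^(i).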

If you look at the formulas that follow, what we do is:

  • Replace P(τ;θ) (impossible to calculate)
  • With
    [image]
    where τ^(i) is a sampled trajectory
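The replacement described in the list above can be checked numerically. This is a minimal sketch on a hypothetical toy problem, not code from the course: each trajectory is a single action drawn from a softmax policy with logits θ, so P(τ;θ) = π_θ(a) and there are no dynamics, and each action has a fixed reward. All function names are illustrative. The exact likelihood-ratio gradient enumerates every trajectory and weights it by P(τ;θ); the sample-based estimate averages ∇θ log P(τ^(i);θ) R(τ^(i)) over trajectories drawn from the policy, with no P factor.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def grad_log_pi(pi, action):
    """Gradient of log softmax(theta)[action] w.r.t. theta: one_hot(action) - pi."""
    return [(1.0 if k == action else 0.0) - p for k, p in enumerate(pi)]


def exact_gradient(theta, rewards):
    """Sum over every trajectory tau of P(tau;theta) * grad log P(tau;theta) * R(tau)."""
    pi = softmax(theta)
    grad = [0.0] * len(theta)
    for a, p in enumerate(pi):
        g = grad_log_pi(pi, a)
        for k in range(len(theta)):
            grad[k] += p * g[k] * rewards[a]
    return grad


def sampled_gradient(theta, rewards, m, rng):
    """(1/m) * sum_i grad log P(tau_i;theta) * R(tau_i), with tau_i ~ P(.;theta).

    There is no P(tau_i;theta) factor here: drawing tau_i from P already
    applies that weighting, because likely trajectories are sampled more often.
    """
    pi = softmax(theta)
    actions = rng.choices(range(len(pi)), weights=pi, k=m)
    grad = [0.0] * len(theta)
    for a in actions:
        g = grad_log_pi(pi, a)
        for k in range(len(theta)):
            grad[k] += g[k] * rewards[a] / m
    return grad


# Hypothetical toy problem: 3 one-step trajectories (actions) with fixed rewards.
theta = [0.0, 0.5, -0.5]
rewards = [1.0, 2.0, 5.0]
print("exact:  ", exact_gradient(theta, rewards))
print("sampled:", sampled_gradient(theta, rewards, m=200_000, rng=random.Random(0)))
```

With enough samples the two printed vectors should agree up to Monte Carlo noise. The frequency with which each trajectory is sampled plays the role that the explicit P(τ;θ) weight plays in the exact sum.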

Don't hesitate to take a piece of paper and write out each part step by step to understand it better. It's how I did it.

@ritwikmishra
Author

ritwikmishra commented Mar 8, 2024

@simoninithomas
I'm sorry, but it is still unclear to me. My question is: how did we jump from this
[image]

to this
[image]

Shouldn't it be as follows?

[image]
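A small numeric check may help with the question of whether P(τ^(i);θ) belongs inside the sample sum. This sketch uses made-up trajectory probabilities and values; `probs`, `values`, and the helper names are hypothetical, not from the course. When trajectories are already drawn from P, multiplying each sample by P(τ^(i)) again applies the weighting twice, so the average drifts away from the true expectation.

```python
import random


def exact_expectation(probs, values):
    """E_{tau ~ P}[f(tau)] = sum over every trajectory of P(tau) * f(tau)."""
    return sum(p * v for p, v in zip(probs, values))


def mc_estimate(probs, values, m, rng, reweight=False):
    """Average f(tau_i) over m trajectories sampled from P.

    reweight=True additionally multiplies each sample by P(tau_i), i.e. it keeps
    the P factor inside the sample sum, which counts the weighting twice.
    """
    indices = rng.choices(range(len(probs)), weights=probs, k=m)
    total = 0.0
    for i in indices:
        weight = probs[i] if reweight else 1.0
        total += weight * values[i]
    return total / m


# Hypothetical trajectory probabilities P(tau) and per-trajectory values f(tau).
probs = [0.7, 0.2, 0.1]
values = [1.0, 10.0, 100.0]
rng = random.Random(0)
print("exact:     ", exact_expectation(probs, values))
print("plain MC:  ", mc_estimate(probs, values, 200_000, rng))
print("P-weighted:", mc_estimate(probs, values, 200_000, rng, reweight=True))
```

With these numbers the exact expectation is 0.7·1 + 0.2·10 + 0.1·100 = 12.7. The plain sample mean should land close to that, while the reweighted mean tends toward Σ P(τ)² f(τ) = 1.89 instead.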
