[QUESTION] How P(τ;θ) disappeared while estimating the gradients using trajectory samples? #495

ritwikmishra · 2024-02-29T15:24:21Z

I am referring to the gradient derivation here.

In the paragraph where the instructor claims "we can approximate the likelihood ratio policy gradient with sample-based estimate", the term P(τ;θ) (the probability of trajectory τ given the parameters θ) disappears from the subsequent summation. Why?

I asked the same question in the Discord study group (here) but got no response.

simoninithomas · 2024-03-05T09:55:10Z

Hey there 👋

So P(τ;θ) is the probability of a trajectory, but we can't compute it directly, since that would require knowing the environment dynamics (the state distribution).
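For reference, here is a sketch of why the dynamics get in the way, written in the usual MDP notation. The symbols μ (initial state distribution), P(s_{t+1} | s_t, a_t) (environment dynamics), π_θ (policy) and H (horizon) are assumptions of this sketch and are not defined in this thread. The dynamics terms do not depend on θ, so they drop out of the gradient of the log-probability even though they are needed to evaluate P(τ;θ) itself:

```latex
P(\tau;\theta) = \mu(s_0) \prod_{t=0}^{H} P(s_{t+1} \mid s_t, a_t)\, \pi_\theta(a_t \mid s_t)

\nabla_\theta \log P(\tau;\theta) = \sum_{t=0}^{H} \nabla_\theta \log \pi_\theta(a_t \mid s_t)
```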

If you look at the formulas that follow, what we do is replace P(τ;θ) (impossible to calculate) with

where τ(i) is a sampled trajectory.
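Written out, the step probably looks like the following. This is a sketch of the standard likelihood ratio argument, assuming that is the derivation being discussed; m (the number of sampled trajectories) and R(τ) (the return of a trajectory) are notation assumed here. The weight P(τ;θ) is not dropped: a sum weighted by P(τ;θ) is an expectation, and an expectation can be estimated by averaging over trajectories drawn from P(·;θ). Likely trajectories simply show up more often among the samples.

```latex
\nabla_\theta J(\theta)
  = \sum_{\tau} P(\tau;\theta)\, \nabla_\theta \log P(\tau;\theta)\, R(\tau)
  = \mathbb{E}_{\tau \sim P(\cdot;\theta)}\big[ \nabla_\theta \log P(\tau;\theta)\, R(\tau) \big]
  \approx \frac{1}{m} \sum_{i=1}^{m} \nabla_\theta \log P(\tau^{(i)};\theta)\, R(\tau^{(i)}),
  \qquad \tau^{(i)} \sim P(\cdot;\theta)
```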

Don't hesitate to take a piece of paper and write out each part step by step to understand it better. That's how I did it.
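If it helps, the same step can be checked numerically on a toy problem. The sketch below is not the course's code: it assumes a made-up setting with only three possible "trajectories" whose probabilities come from a softmax over θ, and the names `softmax`, `grad_log_prob`, `exact_gradient`, `sampled_gradient` and the values of `theta` and `returns` are all hypothetical. It compares the exact sum weighted by P(τ;θ) with the plain average over sampled trajectories.

```python
import math
import random


def softmax(theta):
    """Toy trajectory distribution P(tau; theta) over len(theta) trajectories."""
    shift = max(theta)  # subtract the max for numerical stability
    exps = [math.exp(t - shift) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]


def grad_log_prob(probs, k):
    """Gradient of log softmax(theta)[k] w.r.t. theta: one_hot(k) - probs."""
    return [(1.0 if j == k else 0.0) - probs[j] for j in range(len(probs))]


def exact_gradient(theta, returns):
    """Sum over all trajectories, each weighted explicitly by P(tau; theta)."""
    probs = softmax(theta)
    grad = [0.0] * len(theta)
    for k, p_k in enumerate(probs):
        g = grad_log_prob(probs, k)
        for j in range(len(theta)):
            grad[j] += p_k * g[j] * returns[k]
    return grad


def sampled_gradient(theta, returns, m, rng):
    """Plain average over m sampled trajectories; no explicit P(tau; theta) weight."""
    probs = softmax(theta)
    samples = rng.choices(range(len(theta)), weights=probs, k=m)
    grad = [0.0] * len(theta)
    for k in samples:
        g = grad_log_prob(probs, k)
        for j in range(len(theta)):
            grad[j] += g[j] * returns[k] / m
    return grad


theta = [0.5, -0.2, 0.1]    # hypothetical parameters
returns = [1.0, 0.0, 2.0]   # hypothetical return R(tau) of each trajectory
rng = random.Random(0)

exact = exact_gradient(theta, returns)
estimate = sampled_gradient(theta, returns, m=100_000, rng=rng)
print("exact   :", exact)
print("estimate:", estimate)
```

The two printed vectors should agree up to sampling noise. The sampling step (`rng.choices` with `weights=probs`) is where P(τ;θ) enters the estimate, which is why it no longer appears as an explicit factor inside the average.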

ritwikmishra · 2024-03-08T06:02:58Z

@simoninithomas
I am sorry, but it is still unclear to me. My doubt is: how did we jump from this

to this

Shouldn't it be as follows:
