We exposed the class member td_error in the TDControlAgent in an attempt to log algorithm progress, but the values seem noisy and do not converge even when the agent finds an excellent representation. Is td_error hidden for a reason, or are we missing something by trying to track it?
The variable is the TD error with respect to the last observed transition only. This quantity does not converge except in MDPs with deterministic transitions (and rewards). The expectation of the TD error, however, does converge once the policy / Q-function has converged, so you can use it as an indicator of convergence. For example, you could compute a (rolling) average of this variable and check that instead.
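A minimal sketch of the rolling-average idea, assuming only that the agent exposes a float `td_error` after each step (the `agent.td_error` usage below is hypothetical and not checked against the library). The demo uses a made-up one-step task with a Bernoulli(0.5) reward and a terminal next state: with the Q-value already at its converged value of 0.5, every individual TD error is +0.5 or -0.5 and never settles, while the rolling mean of the *signed* error stays close to zero. Note that averaging the absolute TD error would not approach zero in a stochastic MDP.

```python
from collections import deque
import random


class RollingMean:
    """Mean of the most recent `window` values, maintained with a running sum."""

    def __init__(self, window: int = 1000):
        if window <= 0:
            raise ValueError("window must be positive")
        self.values = deque(maxlen=window)
        self.total = 0.0

    def update(self, x: float) -> float:
        # When the window is full, the deque drops its oldest value on append,
        # so remove that value from the running sum first.
        if len(self.values) == self.values.maxlen:
            self.total -= self.values[0]
        self.values.append(x)
        self.total += x
        return self.total / len(self.values)


def demo(steps: int = 10_000, window: int = 1000, seed: int = 0):
    """Hypothetical one-step task: reward is 1 or 0 with probability 0.5 each."""
    rng = random.Random(seed)
    q = 0.5  # already-converged Q-value: E[r] = 0.5
    tracker = RollingMean(window)
    raw = []
    avg = 0.0
    for _ in range(steps):
        r = 1.0 if rng.random() < 0.5 else 0.0
        # The next state is terminal, so the gamma * Q(s', a') term is 0.
        td_error = r - q
        raw.append(td_error)
        # With a real agent this would be, hypothetically:
        # avg = tracker.update(agent.td_error)
        avg = tracker.update(td_error)
    return raw, avg


if __name__ == "__main__":
    raw, avg = demo()
    print("last raw TD errors:", raw[-5:])
    print("rolling mean of TD error:", avg)
```

The window size trades responsiveness against noise: the standard error of the rolling mean shrinks roughly with the square root of the window length, so a larger window gives a smoother signal but reacts more slowly when learning is still changing the Q-function.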