
Average-Reward-TD-Q-Learning

This repository contains the source code to reproduce all the numerical experiments described in the paper "Finite Sample Analysis of Average-Reward TD Learning and Q-Learning".

Here's a BibTeX entry that you can use to cite it in a publication:

@inproceedings{zhang2021finite,
  title={Finite Sample Analysis of Average-Reward {TD} Learning and $Q$-Learning},
  author={Sheng Zhang and Zhe Zhang and Siva Theja Maguluri},
  booktitle={Thirty-Fifth Conference on Neural Information Processing Systems},
  year={2021},
  url={https://openreview.net/forum?id=1Rxp-demAH0}
}

Requirements

  • Python (>= 3.7)
  • NumPy (>= 1.19.1)

Usage

Different TD Fixed Points

Shows that the average-reward TD(λ) algorithm with linear function approximation converges to different TD fixed points when started from different initial points.

python different_TD_fixed_points.py
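For reference, below is a minimal sketch of an average-reward TD(λ) update with linear function approximation, run from two different initial points. The toy Markov chain, tabular features, step size, and run length are illustrative assumptions, not the exact setup used in the script. Because differential value functions are only determined up to an additive constant, the set of TD fixed points is not a singleton, so runs from different initial parameters can settle at different members of that set.

```python
# A minimal sketch of average-reward TD(lambda) with linear function
# approximation on a toy 3-state Markov reward process (all quantities
# below are illustrative assumptions, not the paper's experimental setup).
import numpy as np

rng = np.random.default_rng(0)

P = np.array([[0.1, 0.6, 0.3],     # transition matrix of the toy chain
              [0.4, 0.2, 0.4],
              [0.3, 0.3, 0.4]])
r = np.array([1.0, 0.0, 2.0])      # per-state rewards
Phi = np.eye(3)                    # tabular features (identity feature matrix)


def avg_reward_td_lambda(theta0, lam=0.5, alpha=0.01, steps=200_000):
    theta = np.array(theta0, dtype=float)
    mu = 0.0                       # running estimate of the average reward
    z = np.zeros_like(theta)       # eligibility trace
    s = 0
    for _ in range(steps):
        s_next = rng.choice(3, p=P[s])
        z = lam * z + Phi[s]
        # differential TD error: r - mu + V(s') - V(s)
        delta = r[s] - mu + Phi[s_next] @ theta - Phi[s] @ theta
        theta += alpha * delta * z
        mu += alpha * (r[s] - mu)
        s = s_next
    return theta, mu


# Runs started from different initial points can end near different
# TD fixed points, while the average-reward estimates roughly agree.
for theta0 in (np.zeros(3), 10.0 * np.ones(3)):
    theta, mu = avg_reward_td_lambda(theta0)
    print("theta0 =", theta0, "-> theta =", np.round(theta, 2),
          ", mu =", round(mu, 2))
```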

Rate of Convergence

Shows the rate of convergence of the average-reward TD(λ) algorithm with linear function approximation using diminishing step sizes.

python rate_of_convergence.py
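As a companion sketch, the same update can be run with a diminishing step-size schedule while tracking how far the average-reward estimate is from its true value. The schedule α_k = α0 / (k + 1), the toy chain, and the error metric below are assumptions for illustration, not the script's actual configuration.

```python
# A minimal sketch of a rate-of-convergence style run: average-reward
# TD(lambda) with a diminishing step size alpha_k = alpha0 / (k + 1)
# (an assumed schedule), tracking the error of the average-reward estimate.
import numpy as np

rng = np.random.default_rng(1)

P = np.array([[0.1, 0.6, 0.3],
              [0.4, 0.2, 0.4],
              [0.3, 0.3, 0.4]])
r = np.array([1.0, 0.0, 2.0])
Phi = np.eye(3)                    # tabular features
lam, alpha0 = 0.5, 1.0

# True average reward mu* = pi @ r, where pi is the stationary distribution
# (left eigenvector of P for eigenvalue 1, normalized to sum to one).
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi /= pi.sum()
mu_star = pi @ r

theta, mu, z, s = np.zeros(3), 0.0, np.zeros(3), 0
for k in range(100_000):
    alpha = alpha0 / (k + 1)       # diminishing step size
    s_next = rng.choice(3, p=P[s])
    z = lam * z + Phi[s]
    delta = r[s] - mu + Phi[s_next] @ theta - Phi[s] @ theta
    theta += alpha * delta * z
    mu += alpha * (r[s] - mu)
    s = s_next
    if (k + 1) % 20_000 == 0:
        print(f"k = {k + 1:6d}, |mu - mu*| = {abs(mu - mu_star):.4f}")
```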

Maintainer
