This repository contains the source code to reproduce all numerical experiments described in the paper "Finite Sample Analysis of Average-Reward TD Learning and Q-Learning".
Here's a BibTeX entry that you can use to cite it in a publication:
@inproceedings{
zhang2021finite,
title={Finite Sample Analysis of Average-Reward {TD} Learning and $Q$-Learning},
author={Sheng Zhang and Zhe Zhang and Siva Theja Maguluri},
booktitle={Thirty-Fifth Conference on Neural Information Processing Systems},
year={2021},
url={https://openreview.net/forum?id=1Rxp-demAH0}
}
- Python (>= 3.7)
- NumPy (>= 1.19.1)
Show that the average-reward TD(λ) algorithm with linear function approximation converges to different TD fixed points when started from different initial points.
python different_TD_fixed_points.py
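
Why the limit can depend on the starting point is easy to see in a small simulation. The block below is a minimal sketch, not the code in `different_TD_fixed_points.py`: the function name `average_reward_td_lambda`, the random 5-state chain, the features, and all parameter values are assumptions made for illustration. It assumes the all-ones vector lies in the span of the features (here through a constant first column), so that a parameter offset along `theta_e` leaves every TD error unchanged and is therefore never corrected by the updates.

```python
import numpy as np


def average_reward_td_lambda(P, r, Phi, theta0, lam=0.5, num_steps=5000,
                             step_size=0.01, c_alpha=1.0, seed=0):
    """Average-reward TD(lambda) with linear function approximation,
    run along one simulated trajectory with a constant step size.

    Returns the final parameter vector and the reward-rate estimate.
    """
    rng = np.random.default_rng(seed)
    n_states = P.shape[0]
    theta = np.array(theta0, dtype=float)
    z = np.zeros_like(theta)   # eligibility trace
    r_bar = 0.0                # running estimate of the average reward
    s = 0
    for _ in range(num_steps):
        s_next = rng.choice(n_states, p=P[s])
        z = lam * z + Phi[s]
        # TD error uses the current reward-rate estimate instead of a discount
        delta = r[s] - r_bar + Phi[s_next] @ theta - Phi[s] @ theta
        r_bar += c_alpha * step_size * (r[s] - r_bar)
        theta = theta + step_size * delta * z
        s = s_next
    return theta, r_bar


# Hypothetical test problem: random ergodic chain, rewards, and features.
rng = np.random.default_rng(1)
n_states = 5
P = rng.random((n_states, n_states))
P /= P.sum(axis=1, keepdims=True)
r = rng.random(n_states)
# The first feature is constant, so Phi @ theta_e is the all-ones vector.
Phi = np.column_stack([np.ones(n_states),
                       rng.uniform(-1.0, 1.0, size=(n_states, 2))])
theta_e = np.array([1.0, 0.0, 0.0])

# Same sampled trajectory (same seed), two different initial points.
theta_a, _ = average_reward_td_lambda(P, r, Phi, theta0=np.zeros(3))
theta_b, _ = average_reward_td_lambda(P, r, Phi, theta0=2.0 * theta_e)

# The initial offset along theta_e survives all updates.
print(theta_b - theta_a)
```

The two runs produce value estimates `Phi @ theta_a` and `Phi @ theta_b` that differ by a constant in every state, which is the sense in which the fixed points differ.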
Show the rate of convergence of the average-reward TD(λ) algorithm with linear function approximation using diminishing step sizes.
python rate_of_convergence.py
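
As a rough illustration of what such an experiment measures, the sketch below runs average-reward TD(λ) with step sizes of the form `c / (k + h)` and records the squared distance to the TD fixed point. It is not the code in `rate_of_convergence.py`: the random 6-state chain, the features, and the values of `lam`, `c`, `h`, and `c_alpha` are all assumptions chosen for illustration. To keep the fixed point unique, it also assumes the all-ones vector is not in the span of the features, which holds generically for the random features used here.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, dim, lam = 6, 2, 0.5

# Hypothetical test problem: random ergodic chain, rewards, and features.
P = rng.random((n_states, n_states))
P /= P.sum(axis=1, keepdims=True)
r = rng.random(n_states)
Phi = rng.uniform(-1.0, 1.0, size=(n_states, dim))

# Stationary distribution: solve mu P = mu together with sum(mu) = 1.
M = np.vstack([P.T - np.eye(n_states), np.ones(n_states)])
rhs = np.append(np.zeros(n_states), 1.0)
mu = np.linalg.lstsq(M, rhs, rcond=None)[0]
r_avg = mu @ r  # true average reward

# TD(lambda) fixed point: solve A theta + b = 0, where
#   A = Phi^T D (I - lam P)^{-1} (P - I) Phi
#   b = Phi^T D (I - lam P)^{-1} (r - r_avg)
D = np.diag(mu)
K = np.linalg.inv(np.eye(n_states) - lam * P)
A = Phi.T @ D @ K @ (P - np.eye(n_states)) @ Phi
b = Phi.T @ D @ K @ (r - r_avg)
theta_star = np.linalg.solve(A, -b)

# Diminishing step sizes alpha_k = c / (k + h); c and h are illustrative.
c, h, c_alpha = 5.0, 100.0, 1.0
num_steps = 20000
theta = np.zeros(dim)
z = np.zeros(dim)   # eligibility trace
r_bar = 0.0         # running estimate of the average reward
s = 0
errors = np.empty(num_steps)
for k in range(num_steps):
    alpha = c / (k + h)
    s_next = rng.choice(n_states, p=P[s])
    z = lam * z + Phi[s]
    delta = r[s] - r_bar + Phi[s_next] @ theta - Phi[s] @ theta
    r_bar += c_alpha * alpha * (r[s] - r_bar)
    theta = theta + alpha * delta * z
    errors[k] = np.sum((theta - theta_star) ** 2)
    s = s_next

# Squared error at a few checkpoints along the trajectory.
for k in (0, 99, 999, 9999, num_steps - 1):
    print(k + 1, errors[k])
```

Plotting `errors` against the iteration count on a log-log scale, averaged over several seeds, gives a curve whose slope can be compared with the rate predicted for a given step-size schedule.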