Average-Reward-TD-Q-Learning

This repository contains the source code to reproduce all numerical experiments described in the paper "Finite Sample Analysis of Average-Reward TD Learning and Q-Learning".

Here's a BibTeX entry that you can use to cite it in a publication:

@inproceedings{
zhang2021finite,
title={Finite Sample Analysis of Average-Reward {TD} Learning and \$Q\$-Learning},
author={Sheng Zhang and Zhe Zhang and Siva Theja Maguluri},
booktitle={Thirty-Fifth Conference on Neural Information Processing Systems},
year={2021},
url={https://openreview.net/forum?id=1Rxp-demAH0}
}

Requirements

Python (>= 3.7)
NumPy (>= 1.19.1)

Usage

Different TD Fixed Points

Show that the average-reward TD($\lambda$) algorithm with linear function approximation converges to different TD fixed points when started from different initial points.

python different_TD_fixed_points.py
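
The dependence on the initial point can be illustrated with a minimal sketch. This is not the code in different_TD_fixed_points.py: the helper names `random_mrp` and `average_reward_td_lambda`, the constant step size `alpha`, the ratio `c`, and the feature matrix are all assumptions made for illustration. The sketch assumes the first feature column is all ones, so that shifting `theta` along `theta_e` changes every value estimate by the same constant and leaves each temporal difference unchanged.

```python
import numpy as np


def random_mrp(n_states, rng):
    """Hypothetical helper: a random ergodic Markov reward process."""
    P = rng.random((n_states, n_states)) + 0.1  # strictly positive, hence ergodic
    P /= P.sum(axis=1, keepdims=True)           # normalize rows to sum to 1
    r = rng.random(n_states)                    # one reward per state
    return P, r


def average_reward_td_lambda(P, r, Phi, theta0, lam=0.5, alpha=0.01,
                             c=1.0, n_steps=20_000, seed=0):
    """Run average-reward TD(lambda) with linear features along one sample path."""
    rng = np.random.default_rng(seed)   # same seed => same sample path
    n_states, d = Phi.shape
    theta = np.array(theta0, dtype=float)
    r_bar = 0.0          # running estimate of the average reward
    z = np.zeros(d)      # eligibility trace
    s = 0
    for _ in range(n_steps):
        s_next = rng.choice(n_states, p=P[s])
        z = lam * z + Phi[s]
        # Temporal difference with the average-reward estimate subtracted.
        delta = r[s] - r_bar + Phi[s_next] @ theta - Phi[s] @ theta
        theta = theta + alpha * delta * z
        r_bar = r_bar + c * alpha * (r[s] - r_bar)
        s = s_next
    return theta, r_bar


rng = np.random.default_rng(1)
P, r = random_mrp(n_states=5, rng=rng)
# Assumption: first feature column is all ones, so Phi @ theta_e is the ones vector.
Phi = np.column_stack([np.ones(5), rng.random((5, 2))])
theta_e = np.array([1.0, 0.0, 0.0])

theta_a, r_bar_a = average_reward_td_lambda(P, r, Phi, theta0=np.zeros(3))
theta_b, r_bar_b = average_reward_td_lambda(P, r, Phi, theta0=2.0 * theta_e)

print("theta from first start: ", theta_a)
print("theta from second start:", theta_b)
print("difference:             ", theta_b - theta_a)
```

Because both runs share the same sample path and the two initial points differ by a multiple of `theta_e`, the printed difference stays at that multiple up to floating-point error. This is one simple way to see that the limit depends on where the iteration starts.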

Rate of Convergence

Show the rate of convergence of the average-reward TD($\lambda$) algorithm with linear function approximation under diminishing step sizes, for $\lambda \in \{0, 0.2, 0.4, 0.8\}$.

python rate_of_convergence.py
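
As a rough illustration of how such a convergence curve might be measured, the sketch below tracks the squared distance to the TD($\lambda$) fixed point under step sizes of the form a / (t + h). It is not the code in rate_of_convergence.py. The function names, the step-size constants `a` and `h`, the ratio `c`, and the random problem instance are assumptions. It also assumes the all-ones vector lies outside the column span of the features, so the fixed point is unique and can be obtained from a linear solve.

```python
import numpy as np


def stationary_distribution(P):
    """Solve mu^T P = mu^T with sum(mu) = 1 by least squares."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.append(np.zeros(n), 1.0)
    return np.linalg.lstsq(A, b, rcond=None)[0]


def td_fixed_point(P, r, Phi, lam):
    """Solve A theta + b = 0, the steady-state mean of the TD(lambda) update."""
    n = P.shape[0]
    mu = stationary_distribution(P)
    gain = mu @ r                                # exact average reward
    D = np.diag(mu)
    M = np.linalg.inv(np.eye(n) - lam * P)       # sum over m of (lam * P)^m
    A = Phi.T @ D @ M @ (P - np.eye(n)) @ Phi
    b = Phi.T @ D @ M @ (r - gain)
    return np.linalg.solve(A, -b), gain


def squared_errors(P, r, Phi, lam, n_steps=20_000, a=1.0, h=100.0, c=1.0, seed=0):
    """Track ||theta_t - theta*||^2 with step sizes alpha_t = a / (t + h)."""
    rng = np.random.default_rng(seed)
    theta_star, _ = td_fixed_point(P, r, Phi, lam)
    n, d = Phi.shape
    theta, z, r_bar, s = np.zeros(d), np.zeros(d), 0.0, 0
    errors = np.empty(n_steps)
    for t in range(n_steps):
        alpha = a / (t + h)                      # diminishing step size
        s_next = rng.choice(n, p=P[s])
        z = lam * z + Phi[s]
        delta = r[s] - r_bar + Phi[s_next] @ theta - Phi[s] @ theta
        theta = theta + alpha * delta * z
        r_bar = r_bar + c * alpha * (r[s] - r_bar)
        s = s_next
        errors[t] = np.sum((theta - theta_star) ** 2)
    return errors


rng = np.random.default_rng(1)
P = rng.random((5, 5)) + 0.1
P /= P.sum(axis=1, keepdims=True)
r = rng.random(5)
Phi = rng.random((5, 2))   # generic features; ones vector assumed not in their span

for lam in (0.0, 0.2, 0.4, 0.8):
    err = squared_errors(P, r, Phi, lam)
    print(f"lambda = {lam}: final squared error = {err[-1]:.3e}")
```

A single sample path is noisy; averaging the error curves over several seeds would give a smoother picture of the decay for each value of $\lambda$.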

Maintainer

Sheng Zhang - shengzhang@gatech.edu

