# DQN Experience

A simple Double DQN implementation that can learn solely from a list of trajectories, without requiring an environment. This is useful in offline reinforcement learning settings where trajectories are generated by a lagging policy, decoupling training from experience collection.

Note that the policy used to generate the trajectories should be relatively up-to-date.
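To illustrate the core idea, here is a minimal sketch of a Double DQN update computed purely from stored transitions, with no environment interaction. It assumes a PyTorch-style setup with an online and a target network; these names and the transition format are assumptions for illustration, not taken from this repository, whose actual implementation may differ:

```python
import torch
import torch.nn as nn

def double_dqn_loss(online_net, target_net, batch, gamma=0.99):
    # batch: tensors of states, actions, rewards, next_states, done flags,
    # drawn from stored trajectories rather than a live environment.
    states, actions, rewards, next_states, dones = batch
    # Q-values of the actions that were actually taken, from the online network.
    q_taken = online_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Double DQN: the online network selects the next action...
        next_actions = online_net(next_states).argmax(dim=1, keepdim=True)
        # ...while the target network evaluates it, reducing overestimation bias.
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
        targets = rewards + gamma * (1.0 - dones) * next_q
    return nn.functional.mse_loss(q_taken, targets)
```

Because the targets depend only on stored `(state, action, reward, next_state, done)` tuples, the same update applies whether the transitions arrived seconds or minutes after the policy that produced them.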

Since this is a private project not intended for public use, the code is not the cleanest and performance may be lacking. The implementation is CPU-only.

## Usage

An example is provided here, which converges to the optimal Q-values for the described scenario.
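The linked example is not reproduced here, but the same effect can be demonstrated with a self-contained toy: collect trajectories once with a random policy on a small chain MDP, then learn offline with tabular double Q-learning. Everything below is illustrative and independent of this repository's API:

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma, alpha = 3, 2, 0.9, 0.1

def step(state, action):
    # Deterministic chain MDP: action 1 moves right, action 0 stays put.
    # Reaching the final state yields reward 1 and ends the episode.
    if action == 1:
        next_state = state + 1
        done = next_state == n_states - 1
        return next_state, float(done), done
    return state, 0.0, False

# Collect a fixed batch of trajectories with a uniform random policy.
trajectories = []
for _ in range(200):
    state, trajectory = 0, []
    for _ in range(20):
        action = int(rng.integers(n_actions))
        next_state, reward, done = step(state, action)
        trajectory.append((state, action, reward, next_state, done))
        state = next_state
        if done:
            break
    trajectories.append(trajectory)

# Offline tabular double Q-learning: no environment interaction below this line.
q_a = np.zeros((n_states, n_actions))
q_b = np.zeros((n_states, n_actions))
for _ in range(500):
    for trajectory in trajectories:
        for s, a, r, s2, done in trajectory:
            # Randomly pick one table to update; the other evaluates the target.
            q_update, q_eval = (q_a, q_b) if rng.random() < 0.5 else (q_b, q_a)
            target = r if done else r + gamma * q_eval[s2, q_update[s2].argmax()]
            q_update[s, a] += alpha * (target - q_update[s, a])

print(np.round((q_a + q_b) / 2, 3))  # Q(0,1) -> ~0.9, Q(1,1) -> ~1.0
```

The learned values match the analytic optimum (Q(1,1) = 1, Q(0,1) = γ · 1 = 0.9), even though the tables never touch the environment during training, only the stored trajectories.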

## Installation

```bash
pip install git+https://github.com/webertim/dqn_experience.git@master
```
