# DQN Experience

A simple Double DQN implementation that can learn solely from a list of trajectories, without requiring an environment. This is useful in offline reinforcement learning settings where trajectories are generated by a lagging policy, decoupling training from experience collection.

Note that the policy used to generate the trajectories should be relatively up-to-date.
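To illustrate the core idea, here is a minimal sketch of a Double DQN update computed purely from stored transitions, with no environment interaction. It assumes a PyTorch-style setup with an online and a target network; these names and the transition format are assumptions for illustration, not taken from this repository, whose actual implementation may differ:

```python
import torch
import torch.nn as nn

def double_dqn_loss(online_net, target_net, batch, gamma=0.99):
    # batch: tensors of states, actions, rewards, next_states, done flags,
    # drawn from stored trajectories rather than a live environment.
    states, actions, rewards, next_states, dones = batch
    # Q-values of the actions that were actually taken, from the online network.
    q_taken = online_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Double DQN: the online network selects the next action...
        next_actions = online_net(next_states).argmax(dim=1, keepdim=True)
        # ...while the target network evaluates it, reducing overestimation bias.
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
        targets = rewards + gamma * (1.0 - dones) * next_q
    return nn.functional.mse_loss(q_taken, targets)
```

Because the targets depend only on stored `(state, action, reward, next_state, done)` tuples, the same update applies whether the transitions arrived seconds or minutes after the policy that produced them.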

Since this is a private project not intended for public use, the code is not the cleanest and performance may be lacking. The implementation is CPU-only.

## Usage

An example is provided here, which converges to the optimal Q-values for the described scenario.
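The linked example is not reproduced here, but the same effect can be demonstrated with a self-contained toy: collect trajectories once with a random policy on a small chain MDP, then learn offline with tabular double Q-learning. Everything below is illustrative and independent of this repository's API:

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma, alpha = 3, 2, 0.9, 0.1

def step(state, action):
    # Deterministic chain MDP: action 1 moves right, action 0 stays put.
    # Reaching the final state yields reward 1 and ends the episode.
    if action == 1:
        next_state = state + 1
        done = next_state == n_states - 1
        return next_state, float(done), done
    return state, 0.0, False

# Collect a fixed batch of trajectories with a uniform random policy.
trajectories = []
for _ in range(200):
    state, trajectory = 0, []
    for _ in range(20):
        action = int(rng.integers(n_actions))
        next_state, reward, done = step(state, action)
        trajectory.append((state, action, reward, next_state, done))
        state = next_state
        if done:
            break
    trajectories.append(trajectory)

# Offline tabular double Q-learning: no environment interaction below this line.
q_a = np.zeros((n_states, n_actions))
q_b = np.zeros((n_states, n_actions))
for _ in range(500):
    for trajectory in trajectories:
        for s, a, r, s2, done in trajectory:
            # Randomly pick one table to update; the other evaluates the target.
            q_update, q_eval = (q_a, q_b) if rng.random() < 0.5 else (q_b, q_a)
            target = r if done else r + gamma * q_eval[s2, q_update[s2].argmax()]
            q_update[s, a] += alpha * (target - q_update[s, a])

print(np.round((q_a + q_b) / 2, 3))  # Q(0,1) -> ~0.9, Q(1,1) -> ~1.0
```

The learned values match the analytic optimum (Q(1,1) = 1, Q(0,1) = γ · 1 = 0.9), even though the tables never touch the environment during training, only the stored trajectories.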

## Installation

```bash
pip install git+https://github.com/webertim/dqn_experience.git@master
```
