- DSAC Implementation using ChainerRL.
- Our Soft Actor-Critic is based on the ChainerRL implementation.
- continuous action space
- discrete action space
pip install -r requirements.txt
Training Soft Q Imitation Learning (SQIL) and DSAC
python train_sqil.py [options]
--load-demo [dirname]
: directory containing the replay buffer of demonstrations
--absorb
: train with the absorbing state wrapper
--reward_func
: use rewards generated by a reward function instead of constant rewards
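To illustrate how these options fit together, here is a minimal argparse sketch. It is not the actual code of train_sqil.py: the `build_parser` helper, the defaults, and the help strings are assumptions, and only the flag names come from this README. Because the README spells the reward option both as `--reward_func` and `--reward-func`, the sketch accepts both.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the options described in this README."""
    parser = argparse.ArgumentParser(description="SQIL / DSAC training (sketch)")
    # Environment name; the default here is an assumption.
    parser.add_argument("--env", type=str, default="AntBulletEnv-v0")
    # Directory containing the replay buffer of demonstrations.
    parser.add_argument("--load-demo", type=str, default=None)
    # Enable the absorbing state wrapper.
    parser.add_argument("--absorb", action="store_true")
    # Use generated rewards instead of constant ones (both spellings accepted).
    parser.add_argument("--reward-func", "--reward_func", action="store_true")
    # Random seed; the default here is an assumption.
    parser.add_argument("--seed", type=int, default=0)
    return parser


# Parse an explicit argument list so the sketch runs without a command line.
args = build_parser().parse_args(
    ["--load-demo", "demos/4_episode/AntBulletEnv-v0", "--absorb", "--seed", "1"]
)
print(args.load_demo, args.absorb, args.reward_func, args.seed)
```

Boolean switches use `store_true`, so they stay off unless passed explicitly.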
Example: DSAC with the absorbing state wrapper in AntBulletEnv-v0 (random seed = 1)
python train_sqil.py --env AntBulletEnv-v0 --load-demo demos/4_episode/AntBulletEnv-v0 --absorb --reward-func --seed 1
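For readers unfamiliar with the difference between constant and generated rewards, the sketch below shows one way to label transitions. In the original SQIL formulation, demonstration transitions receive a constant reward of 1 and the agent's own transitions receive 0. The `label_rewards` helper, the dict-based transition format, and the `reward_fn(state, action)` signature are hypothetical and are not taken from this repository.

```python
def label_rewards(demo_batch, agent_batch, reward_fn=None):
    """Assign rewards to demonstration and agent transitions.

    Each transition is a dict with at least 'state' and 'action' keys
    (an assumed format). With reward_fn=None, the constant SQIL rewards
    are used: 1 for demonstrations, 0 for agent experience. Otherwise the
    reward comes from the hypothetical reward_fn(state, action).
    """

    def assign(batch, constant):
        labeled = []
        for t in batch:
            if reward_fn is None:
                r = constant
            else:
                r = reward_fn(t["state"], t["action"])
            # Copy the transition so the input batch is left unchanged.
            labeled.append({**t, "reward": float(r)})
        return labeled

    return assign(demo_batch, 1.0), assign(agent_batch, 0.0)


demo = [{"state": 0.0, "action": 1.0}]
agent = [{"state": 2.0, "action": 3.0}]
demo_labeled, agent_labeled = label_rewards(demo, agent)
print(demo_labeled[0]["reward"], agent_labeled[0]["reward"])
```

How this repository actually generates rewards when the reward function option is enabled may differ from this sketch.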
Python >= 3.7 is required; please see requirements.txt for the other dependencies.
If you'd like to use a GPU, please pip install cupy-cudaOO, where OO corresponds to your CUDA version (for example, cupy-cuda101 for CUDA 10.1).
For details, please see the CuPy website.
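A quick way to check whether CuPy is visible to your Python environment is sketched below. The `cupy_available` helper is a hypothetical name; it only checks that the package can be found and does not verify that a working GPU or a matching CUDA version is present.

```python
import importlib.util


def cupy_available():
    """Return True if the cupy package can be found, without importing it."""
    return importlib.util.find_spec("cupy") is not None


print("CuPy installed:", cupy_available())
```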