
In this project, we studied how different RL algorithms explore the parameter space of the policy, in order to explain their differences in performance on the Pendulum benchmark. This is a new version of Olivier Sigaud's code https://github.com/osigaud/Basic-Policy-Gradient-Labs that includes CEM (Cross-Entropy Method), new visualisations, and a Beta policy. You can find slides of a short presentation (in French) about the kinds of experiments you can run with this code here.
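A Beta policy suits Pendulum's bounded torque because a Beta distribution has support on [0, 1], so a sample only needs an affine rescale to land in the action range. The sketch below is a minimal illustration of that rescaling, assuming NumPy and Pendulum's torque bounds of [-2, 2]. The function names are hypothetical, and this is not the repository's own wrapper.

```python
import numpy as np

# Pendulum's torque bounds (max torque of 2.0 in either direction).
ACTION_LOW, ACTION_HIGH = -2.0, 2.0


def beta_to_action(x, low=ACTION_LOW, high=ACTION_HIGH):
    """Affinely rescale a Beta sample x in [0, 1] to the interval [low, high]."""
    return low + (high - low) * x


def sample_beta_action(alpha, beta, rng):
    """Sample from Beta(alpha, beta) and map it to a bounded torque.

    In an actual Beta policy, alpha and beta (both > 0) would be produced by
    the policy network; here they are plain numbers for illustration.
    """
    x = rng.beta(alpha, beta)
    return beta_to_action(x)


rng = np.random.default_rng(0)
print(sample_beta_action(2.0, 2.0, rng))
```

Because the mapping is affine, the log-probability of an action differs from the Beta log-density only by the constant log(high - low), so it does not change the policy gradient.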

Key plots

Evolution of the episodic reward of CEM and PG on Pendulum:

Visualization of the reward landscape in parameter space around the first five policies learned by CEM and PG:

Visualization of the reward landscape in parameter space around a good policy learned by CEM:
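A reward landscape is obtained by evaluating the reward at many parameter vectors around a given policy. One common way to make this viewable is to restrict it to a 2D slice spanned by two random orthonormal directions. The sketch below assumes NumPy and a hypothetical evaluate function that returns the episodic reward for a flat parameter vector. The repository's visualisation code may use a different projection.

```python
import numpy as np


def reward_landscape(theta, evaluate, radius=1.0, resolution=21, seed=0):
    """Evaluate `evaluate` on a 2D slice of parameter space centred on theta.

    The slice is spanned by two random orthonormal directions; `evaluate`
    maps a flat parameter vector to a scalar (e.g. mean episodic reward).
    """
    rng = np.random.default_rng(seed)
    # Two random directions, orthonormalised with a (reduced) QR decomposition.
    q, _ = np.linalg.qr(rng.standard_normal((theta.size, 2)))
    d1, d2 = q[:, 0], q[:, 1]
    offsets = np.linspace(-radius, radius, resolution)
    grid = np.empty((resolution, resolution))
    for i, a in enumerate(offsets):
        for j, b in enumerate(offsets):
            grid[i, j] = evaluate(theta + a * d1 + b * d2)
    return offsets, grid


# Hypothetical stand-in for "run the policy and return its episodic reward".
def toy_reward(params):
    return -float(np.sum(params ** 2))


theta = np.zeros(10)
offsets, grid = reward_landscape(theta, toy_reward)
print(grid.shape)  # (21, 21)
```

The resulting grid can be passed to a contour or heatmap plot, with offsets on both axes.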

Code usage

Comparison between Policy Gradient and Cross Entropy Method

To launch a comparison between Policy Gradient (REINFORCE or a custom PG) and CEM, use:

python3 main.py --experiment comparison --env_name Pendulum-v0 --policy_type normal --nb_cycles 100 --nb_repet 1 --nb_eval 1 --eval_freq 20 --nb_trajs_cem 1 --reinforce True --nb_trajs_pg 20 --population 15 --lr_actor 1e-4

Plots and models are saved in the /data folder.

Study of PG

For classic REINFORCE, use --reinforce True. Otherwise, you can build your own policy gradient algorithm, for example:

python3 main.py --experiment pg --env_name Pendulum-v0 --policy_type normal --critic_update_method dataset --study_name discount --gamma 0.99 --lr_critic 1e-2 --gradients sum+baseline --critic_estim_method td --nb_trajs_pg 20
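The options above combine a discount factor (--gamma), a gradient weighting with a baseline (--gradients sum+baseline), and a TD critic estimate (--critic_estim_method td). The sketch below shows the quantities such options typically compute: discounted returns-to-go, baseline-subtracted weights for the log-probability gradient, and one-step TD errors. The mapping from flag names to these formulas is an assumption based on the names, and the function names are hypothetical.

```python
import numpy as np


def discounted_returns(rewards, gamma):
    """Return-to-go G_t = sum_k gamma^k * r_{t+k} for one episode."""
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def td_advantages(rewards, values, next_values, dones, gamma):
    """One-step TD error: r_t + gamma * V(s_{t+1}) * (1 - done_t) - V(s_t)."""
    rewards, values, next_values, dones = map(
        np.asarray, (rewards, values, next_values, dones))
    return rewards + gamma * next_values * (1.0 - dones) - values


def pg_weights(returns, baseline):
    """Weights multiplying grad log pi(a_t | s_t) in REINFORCE with a baseline."""
    return returns - baseline


G = discounted_returns([1.0, 1.0, 1.0], gamma=0.5)
print(G)  # returns-to-go: 1.75, 1.5, 1.0
print(pg_weights(G, G.mean()))
```

Subtracting a baseline leaves the expected gradient unchanged but usually reduces its variance; replacing the returns by TD errors trades some bias (from the critic) for a further reduction in variance.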

Study of CEM

To study CEM, use:

python3 main.py --experiment cem --population 20 --elites_frac 0.2 --sigma 1 --nb_trajs_cem 2
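The --population, --elites_frac and --sigma options correspond to the usual CEM loop: sample candidate parameter vectors from a Gaussian, keep the best fraction, and refit the Gaussian to them. The sketch below is a minimal version of that loop, assuming a diagonal Gaussian over the policy's flat parameter vector and a hypothetical evaluate function. In the repository, each candidate is presumably scored over --nb_trajs_cem episodes. This is not the repository's implementation.

```python
import numpy as np


def cem(evaluate, dim, population=20, elites_frac=0.2, sigma=1.0,
        n_iters=50, seed=0):
    """Cross-Entropy Method maximising `evaluate` over R^dim.

    Each iteration samples `population` candidates from a diagonal Gaussian,
    keeps the top `elites_frac` fraction, and refits mean and std to them.
    """
    rng = np.random.default_rng(seed)
    mean = np.zeros(dim)
    std = np.full(dim, sigma)
    n_elites = max(1, int(population * elites_frac))
    for _ in range(n_iters):
        candidates = mean + std * rng.standard_normal((population, dim))
        scores = np.array([evaluate(c) for c in candidates])
        # argsort is ascending, so the last n_elites indices have the highest scores.
        elites = candidates[np.argsort(scores)[-n_elites:]]
        mean = elites.mean(axis=0)
        # Small additive noise keeps the std from collapsing to zero too early.
        std = elites.std(axis=0) + 1e-2
    return mean


# Hypothetical stand-in for the episodic reward of the policy with these parameters.
target = np.array([0.5, -0.5, 0.25])
best = cem(lambda p: -float(np.sum((p - target) ** 2)), dim=3)
print(np.round(best, 2))
```

Unlike policy gradient, this loop never differentiates the reward: it only ranks candidates, which is why CEM explores the parameter space so differently.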

Simple expert policy on Pendulum-v0

python3 simple_eval_expert.py

Dependencies

pip install -r requirements.txt

TODO

Make CEM compatible with the Bernoulli policy.
Fix the plot axes and the problem with the last evaluation.
Make a "CustomNetwork" class that takes the network dimensions and the policy type as arguments.
Make simple_rendering.py compatible with all policy types.
Rebuild the comparison code so that it runs both algorithms independently and then builds the plots.
Rethink compatibility with Vignettes.
Make a better wrapper for Beta policy actions.
Make sure everything is in English.
Translate the results slides into English.


KohlerHECTOR/CrossEntropyMethod-VS-PolicyGradient