
[WIP] ReinforcementLearning.jl integration #9

Draft · wants to merge 9 commits into base: main
Conversation

@rejuvyesh (Contributor) commented Mar 9, 2022

I realized that CommonRLInterface.jl never settled on what to do with continuous action spaces, so I am integrating directly with RLBase from ReinforcementLearning.jl instead.

I will add tests and examples with PPO and DDPG.
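
To make the intent concrete, here is a minimal sketch of what implementing RLBase directly could look like, using a toy stand-in for an eventual Dojo-backed environment and the v0.10-era ReinforcementLearning.jl API. Everything here (the `ToyCartpoleEnv` type, its state layout, and the placeholder dynamics) is illustrative, not Dojo.jl's actual API; the point is that RLBase expresses continuous action spaces natively as intervals.

```julia
using ReinforcementLearning
using IntervalSets

# Toy continuous-action environment implementing RLBase directly;
# a hypothetical stand-in for a Dojo-backed environment.
mutable struct ToyCartpoleEnv <: AbstractEnv
    state::Vector{Float64}   # [x, ẋ, θ, θ̇]; layout is illustrative
    reward::Float64
    done::Bool
    t::Int
end

ToyCartpoleEnv() = ToyCartpoleEnv(zeros(4), 0.0, false, 0)

# Continuous action space as an interval; RLBase supports this natively,
# which is exactly what CommonRLInterface.jl never settled on.
RLBase.action_space(::ToyCartpoleEnv) = -1.0..1.0
RLBase.state_space(env::ToyCartpoleEnv) = Space(fill(-Inf..Inf, length(env.state)))
RLBase.state(env::ToyCartpoleEnv) = env.state
RLBase.reward(env::ToyCartpoleEnv) = env.reward
RLBase.is_terminated(env::ToyCartpoleEnv) = env.done

function RLBase.reset!(env::ToyCartpoleEnv)
    env.state .= 0.1 .* (rand(4) .- 0.5)
    env.reward = 0.0
    env.done = false
    env.t = 0
    nothing
end

# Acting on the environment steps it forward; a real integration would
# call into the Dojo simulator here instead of these placeholder dynamics.
function (env::ToyCartpoleEnv)(action::Real)
    env.t += 1
    env.state[2] += 0.02 * action        # crude velocity update
    env.state[1] += 0.02 * env.state[2]
    env.reward = 1.0 - env.state[3]^2    # upright bonus, illustrative only
    env.done = env.t >= 200
    nothing
end
```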

@codecov-commenter commented Mar 9, 2022

Codecov Report

Merging #9 (eb379f6) into main (f9b2fd1) will decrease coverage by 0.09%.
The diff coverage is n/a.


@@            Coverage Diff             @@
##             main       #9      +/-   ##
==========================================
- Coverage   92.41%   92.31%   -0.10%     
==========================================
  Files          81       81              
  Lines        3823     3761      -62     
==========================================
- Hits         3533     3472      -61     
+ Misses        290      289       -1     
| Impacted Files | Coverage Δ |
| --- | --- |
| src/Dojo.jl | 100.00% <ø> (ø) |
| src/orientation/quaternion.jl | 82.92% <0.00%> (-5.41%) ⬇️ |
| src/orientation/mapping.jl | 36.36% <0.00%> (-5.31%) ⬇️ |
| src/contacts/utilities.jl | 40.00% <0.00%> (-2.86%) ⬇️ |
| src/joints/rotational/input.jl | 42.10% <0.00%> (-1.49%) ⬇️ |
| src/joints/joint.jl | 88.88% <0.00%> (-0.51%) ⬇️ |
| src/contacts/impact.jl | 86.20% <0.00%> (-0.46%) ⬇️ |
| src/bodies/set.jl | 94.54% <0.00%> (-0.37%) ⬇️ |
| src/joints/rotational/springs.jl | 97.29% <0.00%> (-0.14%) ⬇️ |
| src/utilities/methods.jl | 96.66% <0.00%> (-0.11%) ⬇️ |
| ... and 17 more | |


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Comment on lines 31 to 34
actor = Chain(
    Dense(ns, 256, relu; init = glorot_uniform(rng)),
    Dense(256, na; init = glorot_uniform(rng)),
),


Note that you are using the discrete version of PPO here, but the cartpole env here seems to be a continuous one (the action space is [-1.0, 1.0]). So you may take reference from https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/blob/935f68b6cb378f9929a8d9914eb388e86213c86d/src/ReinforcementLearningExperiments/deps/experiments/experiments/Policy%20Gradient/JuliaRL_PPO_Pendulum.jl#L43-L50
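
For reference, the actor in the linked Pendulum experiment replaces the plain `Chain` with a `GaussianNetwork` head, roughly as below (adapted from that experiment; `ns`, `na`, and `rng` are assumed to be defined as in the surrounding script):

```julia
using Flux
using ReinforcementLearning

# Gaussian policy head for continuous actions, in place of logits over
# discrete actions; adapted from the JuliaRL_PPO_Pendulum experiment.
actor = GaussianNetwork(
    pre = Chain(
        Dense(ns, 64, relu; init = glorot_uniform(rng)),
        Dense(64, 64, relu; init = glorot_uniform(rng)),
    ),
    μ = Chain(Dense(64, na, tanh; init = glorot_uniform(rng))),
    logσ = Chain(Dense(64, na; init = glorot_uniform(rng))),
)
```

If I remember the experiment correctly, the `PPOPolicy` then also needs `dist = Normal` so that actions are sampled from the Gaussian head rather than a categorical distribution.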

@rejuvyesh (Contributor, Author)

Good point, thanks for checking in! Although currently I also still need to define the reward/cost function for cartpole on the Dojo side.
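
Purely as a placeholder for that discussion, a reward of the usual cartpole flavor might look like the sketch below; the `cartpole_reward` name and the state layout are hypothetical, not Dojo.jl's API or conventions.

```julia
# Hypothetical cartpole reward: upright bonus minus a small control penalty.
# The state layout [x, θ, ẋ, θ̇] is an assumption, not Dojo.jl's convention.
function cartpole_reward(state::AbstractVector, action::AbstractVector)
    x, θ, ẋ, θ̇ = state
    upright = cos(θ)                    # 1.0 when the pole is upright
    effort = 1e-3 * sum(abs2, action)   # small penalty on control effort
    return upright - effort
end
```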

@janbruedigam (Member)

We should probably rethink the interface to ReinforcementLearning.jl once their updates are done (JuliaReinforcementLearning/ReinforcementLearning.jl#614)
