Draft: DroQ and TD3+TQC jax implementation #272

araffin · 2022-09-16T13:16:04Z

Description

FYI: unpolished JAX implementations of TD3+DroQ and TD3+TQC.
Related to #262 #258
My plan is to try to have SAC in JAX, but JAX currently relies on TensorFlow for probability distributions :/
So I adapted TD3 instead.
I also want to make it even faster, but that would require tweaking the way the replay buffer is used.
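
For readers unfamiliar with the TQC half of the title, the core idea is to pool the quantile atoms predicted by all target critics, sort them, and drop the largest ones before building the TD target. The snippet below is a minimal sketch of that truncation step for the TD3 variant (no entropy term), not the code from this PR; the function name, argument names, and array shapes are assumptions chosen for illustration.

```python
import jax.numpy as jnp


def truncated_quantile_target(next_quantiles, rewards, dones, gamma, n_drop_per_net):
    """Sketch of a TQC-style target (hypothetical helper, not the PR's code).

    next_quantiles: (batch, n_critics, n_quantiles) atoms from the target critics.
    rewards, dones: (batch,) arrays.
    """
    batch, n_critics, n_quantiles = next_quantiles.shape
    # Pool the atoms of all critics and sort them in ascending order.
    pooled = jnp.sort(next_quantiles.reshape(batch, -1), axis=1)
    # Keep only the smallest atoms; dropping the largest ones limits overestimation.
    n_keep = n_critics * (n_quantiles - n_drop_per_net)
    kept = pooled[:, :n_keep]
    # One-step TD target applied to every kept atom.
    return rewards[:, None] + (1.0 - dones[:, None]) * gamma * kept


# Tiny usage example: 2 critics, 3 quantiles each, drop 1 atom per critic.
example_target = truncated_quantile_target(
    jnp.array([[[3.0, 1.0, 2.0], [6.0, 5.0, 4.0]]]),
    rewards=jnp.array([1.0]),
    dones=jnp.array([0.0]),
    gamma=0.5,
    n_drop_per_net=1,
)
```

The number of dropped atoms per critic is the knob that trades overestimation against underestimation, which is presumably why it is worth exposing as a parameter.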

EDIT: apparently tfd doesn't depend on tf anymore in the latest version: https://www.tensorflow.org/probability/examples/TensorFlow_Probability_on_JAX
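
To make concrete what SAC needs from a probability library, here is a hand-rolled sketch of a tanh-squashed Gaussian policy head in plain JAX: it samples an action and returns its log-probability with the change-of-variables correction. This is only an illustration of the math and deliberately avoids TensorFlow Probability and distrax; the function name and shapes are assumptions, and it is not what the PR uses.

```python
import jax
import jax.numpy as jnp


def sample_tanh_gaussian(key, mean, log_std):
    """Sample a = tanh(u), u ~ N(mean, exp(log_std)^2), and return (a, log pi(a)).

    Hypothetical helper for illustration; mean and log_std have shape (batch, action_dim).
    """
    std = jnp.exp(log_std)
    noise = jax.random.normal(key, mean.shape)
    pre_tanh = mean + std * noise
    action = jnp.tanh(pre_tanh)
    # Diagonal Gaussian log-density of pre_tanh, summed over action dimensions.
    gauss_log_prob = jnp.sum(
        -0.5 * noise**2 - log_std - 0.5 * jnp.log(2.0 * jnp.pi), axis=-1
    )
    # log|d tanh(u)/du| = log(1 - tanh(u)^2) = 2 * (log 2 - u - softplus(-2u)),
    # written this way for numerical stability when |u| is large.
    log_det = jnp.sum(
        2.0 * (jnp.log(2.0) - pre_tanh - jax.nn.softplus(-2.0 * pre_tanh)), axis=-1
    )
    return action, gauss_log_prob - log_det


key = jax.random.PRNGKey(0)
actions, log_probs = sample_tanh_gaussian(key, jnp.zeros((4, 2)), jnp.zeros((4, 2)))
```

A library distribution saves writing and testing this by hand, which is the appeal of using tfd or distrax here.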

Reference:

EDIT: SBX = SB3 + JAX: https://github.com/araffin/sbx

~~Known difference with original implementation: qf are updated at the same time as the actor instead of after each gradient step.~~

Types of changes

Bug fix
New feature
New algorithm
Documentation

Checklist:

I've read the CONTRIBUTION guide (required).
I have ensured pre-commit run --all-files passes (required).
I have updated the documentation and previewed the changes via mkdocs serve.
I have updated the tests accordingly (if applicable).

If you are adding new algorithms or your change could result in performance difference, you may need to (re-)run tracked experiments. See #137 as an example PR.

vercel · 2022-09-16T13:16:09Z

The latest updates on your projects. Learn more about Vercel for Git ↗︎

| Name | Status | Preview | Updated |
| --- | --- | --- | --- |
| cleanrl | ✅ Ready (Inspect) | Visit Preview | Sep 24, 2022 at 4:50PM (UTC) |

vwxyzjn

👀 how does Adan perform?

araffin · 2022-09-18T20:19:05Z

> 👀 how does Adan perform?

Results are very preliminary: Adan performs on par with or slightly better than Adam, but nothing significant yet.
The noticeable difference is the FPS though (Adan is slower, for instance 100 FPS vs 130 FPS).
Btw, I managed to JIT the for loop =) It goes 2x faster now, but results are different from those without JIT 👀 (not worse/better, just different)
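
As an illustration of what "JIT the for loop" can look like, the sketch below compiles several gradient steps into a single call with `jax.lax.fori_loop`, sampling minibatch indices on device inside the loop. It uses a toy linear-regression loss and plain SGD rather than the PR's actor-critic update; `loss_fn`, `train`, and the data layout are assumptions made for this example.

```python
from functools import partial

import jax
import jax.numpy as jnp


def loss_fn(params, batch_x, batch_y):
    # Toy regression loss standing in for the critic/actor losses.
    return jnp.mean((batch_x @ params - batch_y) ** 2)


@partial(jax.jit, static_argnames=["n_steps", "batch_size"])
def train(params, data_x, data_y, key, n_steps, batch_size, lr=0.05):
    """Run n_steps SGD updates inside one compiled function (illustrative only)."""

    def one_step(_, carry):
        params, key = carry
        # Split the key every iteration so each step draws a fresh minibatch.
        key, sample_key = jax.random.split(key)
        idx = jax.random.randint(sample_key, (batch_size,), 0, data_x.shape[0])
        grads = jax.grad(loss_fn)(params, data_x[idx], data_y[idx])
        return params - lr * grads, key

    return jax.lax.fori_loop(0, n_steps, one_step, (params, key))


# Noise-free synthetic data so the fitted weights can be compared to the truth.
true_w = jnp.array([1.0, -2.0, 0.5])
data_x = jax.random.normal(jax.random.PRNGKey(1), (256, 3))
data_y = data_x @ true_w
fitted, _ = train(jnp.zeros(3), data_x, data_y, jax.random.PRNGKey(0), n_steps=200, batch_size=32)
```

Keeping the data on device and sampling indices inside the compiled loop avoids returning to Python between gradient steps. It also means the random stream is consumed differently than in a Python-level loop, so runs are not expected to match step for step.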

vwxyzjn · 2022-09-19T14:50:19Z

FYI https://github.com/deepmind/distrax might be a better replacement for tensorflow probability

joaogui1 · 2022-09-19T19:56:58Z

fwiw you can also use tensorflow_probability with a jax backend and then you don't need to use tensorflow at all (in one of their tutorials they even explicitly uninstall tf)

araffin · 2022-09-23T19:50:43Z

@vwxyzjn Good news, I've got a TQC + SAC version working =) (currently doing some runs)

@joaogui1 thanks, I gave distrax a try but it was giving me weird errors, and in the end it still depends on tf proba (which doesn't require tensorflow, as I learned =)), so I switched to tf proba ;)

araffin · 2022-09-29T12:47:24Z

FYI, I converted that single file to a proof of concept of SB3 + JAX (SBX): https://github.com/araffin/sbx
The nice thing is that I'm re-using the SB3 base class, which means it has access to saving/loading/scikit interface/callbacks and soon the RL zoo =)

araffin added 12 commits September 6, 2022 23:11

Clone file

33df709

Fixes and reformating

6ed7655

Add dropout and layernorm

6a289ca

Add evaluation and tqdm progress bar

d3ef56b

Different dropout keys

9704f1d

Separate q network target update

f0cc8ff

Try to jit the for loop

23e4d3b

Add no jit train version

df61ae5

Revert "Add no jit train version"

f7b4e7c

This reverts commit df61ae5.

Revert "Try to jit the for loop"

373aabb

This reverts commit 23e4d3b.

Revert "Separate q network target update"

85fa143

This reverts commit f0cc8ff.

TQC + TD3 + DroQ first attempt

60f63e1

vercel bot deployed to Preview September 16, 2022 13:16 View deployment

araffin added 2 commits September 16, 2022 16:16

Add number of quantiles to drop as param

44f3a9b

Fixes and reformat

5156d78

vercel bot deployed to Preview September 16, 2022 14:21 View deployment

n_units as param

8aaca4f

vercel bot deployed to Preview September 16, 2022 15:17 View deployment

araffin added 2 commits September 17, 2022 11:24

Add train method

aabf789

JIT train loop

cc74d9e

vercel bot deployed to Preview September 17, 2022 10:25 View deployment

Debug jit

8058979

vercel bot deployed to Preview September 18, 2022 17:00 View deployment

Cleanup + faster eval

99686c8

vercel bot deployed to Preview September 18, 2022 17:21 View deployment

Try ADAN

d5704b3

vercel bot deployed to Preview September 18, 2022 17:53 View deployment

Revert "Try ADAN"

047c314

This reverts commit d5704b3.

vercel bot deployed to Preview September 18, 2022 19:12 View deployment

vwxyzjn reviewed Sep 18, 2022

araffin added 2 commits September 18, 2022 21:37

Revert "Revert "Try ADAN""

443dc71

This reverts commit 047c314.

Sort important and Try ADAN again

8f3beec

This reverts commit 047c314.

vercel bot deployed to Preview September 18, 2022 19:38 View deployment

Back to ADAM

940a4b6

vercel bot deployed to Preview September 18, 2022 20:29 View deployment

araffin added 2 commits September 19, 2022 19:29

Rename file

bcfee18

Add fast eval for TD3 + DroQ

d68b262

vercel bot deployed to Preview September 19, 2022 17:35 View deployment

araffin added 4 commits September 23, 2022 19:16

Add buggy sac implementation

70aa57d

Bug fixes and faster sampling (still not working)

21361c3

Bug fixes, SAC now working

c883386

Cleanup

f455b4e

vercel bot deployed to Preview September 23, 2022 19:40 View deployment

araffin mentioned this pull request Sep 23, 2022

Question about the paper/implementation ikostrikov/walk_in_the_park#3

Open

Match DroQ implementation

7eb2c4f

vercel bot deployed to Preview September 24, 2022 16:50 View deployment
