
feature(xrk): add q-transformer #783

Open
wants to merge 17 commits into base: main

Conversation

rongkunxue (Contributor)

Description

Related Issue

TODO

Check List

  • merge the latest version of the source branch/repo and resolve all conflicts
  • pass style check
  • pass all the tests

@PaParaZz1 added the algo (Add new algorithm or improve old one) label on Mar 22, 2024
@@ -19,6 +19,7 @@
from .ppo import PPOPolicy, PPOPGPolicy, PPOOffPolicy
from .sac import SACPolicy, DiscreteSACPolicy, SQILSACPolicy
from .cql import CQLPolicy, DiscreteCQLPolicy
from .qtransformer import QtransformerPolicy
Member:
Rename QtransformerPolicy to QTransformerPolicy (capital T) for consistent class naming.

from ding.entry import serial_pipeline_offline
from ding.config import read_config
from pathlib import Path
from ding.model.template.qtransformer import QTransformer
Member:

Import from the secondary directory instead, such as:

from ding.model import QTransformer
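
One benefit of importing at the package level is that call sites stay stable even if the module layout under ding/model later changes.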

alpha=0.2,
discount_factor_gamma=0.9,
min_reward = 0.1,
auto_alpha=False,
Member:

remove unused fields like this

update_type='momentum',
update_kwargs={'theta': self._cfg.learn.target_theta}
)
self._low = np.array(self._cfg.other["low"])
Member:

We don't need low and high here; we always assume the action value range inside the policy is [-1, 1].
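
If an environment uses a different range, the rescaling can live outside the policy. A minimal sketch (the helper and its signature are illustrative, not part of this PR):

import numpy as np

def rescale_action(action, low, high):
    # map a policy action in [-1, 1] to the environment's [low, high] range
    return low + (action + 1.0) * 0.5 * (high - low)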

cuda=True,
model=dict(
num_actions = 3,
action_bins = 256,
Member:

This action_bins field is not used in the policy.

selected = t.gather(-1, indices)
return rearrange(selected, '... 1 -> ...')

def _discretize_action(self, actions):
Member:

We can optimize this for loop by vectorizing it:

# 8 evenly spaced bin centers in [-1, 1] for each of the 4 action dims
# (8 and 4 stand in for the configured action_bins and num_actions)
action_values = np.linspace(-1, 1, 8)[np.newaxis, ...].repeat(4, 0)
action_values = torch.as_tensor(action_values).to(self._device)
# squared distance from each continuous action to every bin center
diff = (actions.unsqueeze(-1) - action_values.unsqueeze(0)) ** 2
# nearest bin index per action dimension
indices = diff.argmin(-1)
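
The discretized indices can be mapped back to continuous bin centers with the same gather pattern used elsewhere in the diff (a sketch; shapes follow the snippet above):

bin_values = action_values.expand(indices.shape[0], -1, -1)  # (batch, 4, 8)
bin_values = bin_values.gather(-1, indices.unsqueeze(-1)).squeeze(-1)  # (batch, 4)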

actions = data['action']

#get q
num_timesteps, device = states.shape[1], states.device
Member:

Use self._device, which is a default member variable of Policy.
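
For example, a sketch of the suggested change to the line above:

num_timesteps = states.shape[1]
device = self._device  # set by the Policy base class, instead of states.device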

import torch
import torch.nn.functional as F
from torch.distributions import Normal, Independent
from ema_pytorch import EMA
Member:

Remove unused third-party libraries.


from pathlib import Path
from functools import partial
from contextlib import nullcontext
Member:

polish imports


from torchtyping import TensorType

from einops import rearrange, repeat, pack, unpack
Member:

Add einops to setup.py.
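
A minimal sketch of the corresponding setup.py entry (everything apart from the einops line is a placeholder for the existing metadata):

from setuptools import setup, find_packages

setup(
    name='DI-engine',  # placeholder
    packages=find_packages(),
    install_requires=[
        'einops',  # needed by the Q-Transformer model code
    ],
)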

from einops import rearrange, repeat, pack, unpack
from einops.layers.torch import Rearrange

from beartype import beartype
Member:

We will not use beartype for runtime type validation in the current version, so remove it in this PR.

@@ -0,0 +1,753 @@
from random import random
from functools import partial, cache
Member:

functools.cache is a new feature in Python 3.9; for compatibility, you should implement it as follows:

try:
    from functools import cache  # only in Python >= 3.9
except ImportError:
    from functools import lru_cache
    cache = lru_cache(maxsize=None)
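
Note that functools.cache is defined as lru_cache(maxsize=None), so the fallback behaves identically on older Python versions.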
