Releases: takuseno/d3rlpy

Release v2.5.0

11 May 09:38

New Algorithm

Cal-QL has been added to d3rlpy in v2.5.0! Please check the reproduction script here. To support faithful reproduction, SparseRewardTransitionPicker has also been added and is used in the reproduction script.
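
Below is a minimal sketch of configuring and training the new algorithm. The dataset and hyperparameters are placeholders for illustration, not the settings used in the reproduction script.

import d3rlpy

# placeholder dataset for illustration; the reproduction script uses a
# sparse-reward dataset together with SparseRewardTransitionPicker
dataset, env = d3rlpy.datasets.get_pendulum()

# Cal-QL is built on top of CQL, so its config exposes similar options
calql = d3rlpy.algos.CalQLConfig(
    actor_learning_rate=1e-4,
    critic_learning_rate=3e-4,
).create(device="cpu:0")

calql.fit(
    dataset,
    n_steps=10000,
    n_steps_per_epoch=1000,
    evaluators={"environment": d3rlpy.metrics.EnvironmentEvaluator(env)},
)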

Custom Algorithm Example

One of the frequently asked questions is "How can I implement a custom algorithm on top of d3rlpy?". A new example script has been added to answer this question. Based on this example, you can build your own algorithm while still utilizing the whole training pipeline provided by d3rlpy. Please check the script here.

Enhancement

  • Exporting Decision Transformer models as TorchScript and ONNX has been implemented. You can use this feature via the save_policy method in the same way as with Q-learning algorithms (see the sketch after this list).
  • Tuple observation support has been added to PyTorch/ONNX export.
  • The return-to-go calculation for Q-learning algorithms has been modified and is now skipped when return-to-go is not needed.
  • The n_updates option has been added to the fit_online method to control the update-to-data (UTD) ratio.
  • write_at_termination option has been added to ReplayBuffer.
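
As a sketch of the Decision Transformer export mentioned above, assuming dt is a trained DecisionTransformer algorithm object, the file names are placeholders, and the export format follows the file extension as with Q-learning algorithms:

# export the policy of a trained Decision Transformer
dt.save_policy("dt_policy.pt")    # TorchScript
dt.save_policy("dt_policy.onnx")  # ONNX, selected by the .onnx extension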

Bugfix

  • Action scaling has been fixed for D4RL datasets.
  • Default replay buffer creation in the fit_online method has been fixed.

Release v2.4.0

18 Feb 03:57

Tuple observations

In v2.4.0, d3rlpy supports tuple observations.

import numpy as np
import d3rlpy

# tuple observations are given as a list of arrays, one per observation component
observations = [np.random.random((1000, 100)), np.random.random((1000, 32))]
actions = np.random.random((1000, 4))
rewards = np.random.random((1000, 1))
terminals = np.random.randint(2, size=(1000, 1))
dataset = d3rlpy.dataset.MDPDataset(
    observations=observations,
    actions=actions,
    rewards=rewards,
    terminals=terminals,
)

You can find an example script here.

Enhancements

  • logging_steps and logging_strategy options have been added to the fit and fit_online methods (thanks, @claudius-kienle).
  • Logging with WanDB is now supported (thanks, @claudius-kienle). See the sketch after this list.
  • Goal-conditioned envs in Minari are now supported.
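
A minimal sketch of the new logging options is below. The exact names WanDBAdapterFactory and LoggingStrategy are assumptions based on this release; please check the logging documentation for the precise API.

import d3rlpy

dataset, env = d3rlpy.datasets.get_cartpole()

dqn = d3rlpy.algos.DQNConfig().create(device="cpu:0")

dqn.fit(
    dataset,
    n_steps=10000,
    n_steps_per_epoch=1000,
    # assumed names: log to WanDB and flush metrics every 100 steps
    logger_adapter=d3rlpy.logging.WanDBAdapterFactory(),
    logging_strategy=d3rlpy.logging.LoggingStrategy.STEPS,
    logging_steps=100,
)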

Bugfix

  • Errors in distributed training have been fixed.
  • OPE documentation has been fixed.

Release v2.3.0

02 Dec 08:30

Distributed data parallel training

Distributed data parallel training with multiple nodes and GPUs has been one of the most requested features. Now, it's finally available, and it's extremely easy to use.

Example:

# train.py
from typing import Dict

import d3rlpy


def main() -> None:
    # GPU version:
    # rank = d3rlpy.distributed.init_process_group("nccl")
    rank = d3rlpy.distributed.init_process_group("gloo")
    print(f"Start running on rank={rank}.")

    # GPU version:
    # device = f"cuda:{rank}"
    device = "cpu:0"

    # setup algorithm
    cql = d3rlpy.algos.CQLConfig(
        actor_learning_rate=1e-3,
        critic_learning_rate=1e-3,
        alpha_learning_rate=1e-3,
    ).create(device=device)

    # prepare dataset
    dataset, env = d3rlpy.datasets.get_pendulum()

    # disable logging on rank != 0 workers
    logger_adapter: d3rlpy.logging.LoggerAdapterFactory
    evaluators: Dict[str, d3rlpy.metrics.EvaluatorProtocol]
    if rank == 0:
        evaluators = {"environment": d3rlpy.metrics.EnvironmentEvaluator(env)}
        logger_adapter = d3rlpy.logging.FileAdapterFactory()
    else:
        evaluators = {}
        logger_adapter = d3rlpy.logging.NoopAdapterFactory()

    # start training
    cql.fit(
        dataset,
        n_steps=10000,
        n_steps_per_epoch=1000,
        evaluators=evaluators,
        logger_adapter=logger_adapter,
        show_progress=rank == 0,
        enable_ddp=True,
    )

    d3rlpy.distributed.destroy_process_group()


if __name__ == "__main__":
    main()

You need to use the torchrun command to start training, which is installed along with PyTorch.

$ torchrun \
   --nnodes=1 \
   --nproc_per_node=3 \
   --rdzv_id=100 \
   --rdzv_backend=c10d \
   --rdzv_endpoint=localhost:29400 \
   train.py

In this case, 3 processes will be launched and start the training loop. DecisionTransformer-based algorithms also support this distributed training feature.

The example is also available here.

Minari support (thanks, @grahamannett !)

Minari is an OSS library that provides a standard format for offline reinforcement learning datasets. d3rlpy now provides easy access to this library.

You can install Minari via the d3rlpy CLI.

$ d3rlpy install minari

Example:

import d3rlpy

dataset, env = d3rlpy.datasets.get_minari("antmaze-umaze-v0")

iql = d3rlpy.algos.IQLConfig(
    actor_learning_rate=3e-4,
    critic_learning_rate=3e-4,
    batch_size=256,
    weight_temp=10.0,
    max_weight=100.0,
    expectile=0.9,
    reward_scaler=d3rlpy.preprocessing.ConstantShiftRewardScaler(shift=-1),
).create(device="cpu:0")

iql.fit(
    dataset,
    n_steps=1000000,
    n_steps_per_epoch=100000,
    evaluators={"environment": d3rlpy.metrics.EnvironmentEvaluator(env)},
)

Minimize redundant computes

From this version, the computation of some algorithms has been optimized to remove redundant inference. As a result, algorithms with dual optimization, such as SAC and CQL, are significantly faster than in the previous version.

Enhancements

  • GoalConcatWrapper has been added to support goal-conditioned environments.
  • return_to_go has been added to Transition and TransitionMiniBatch.
  • MixedReplayBuffer has been added to sample experiences from multiple buffers with an arbitrary ratio.
  • initial_temperature now supports 0 in DiscreteSAC.

Bugfix

  • Getting started page has been fixed.

Release v2.2.0

24 Oct 11:30

Algorithm

DiscreteDecisionTransformer, a Decision Transformer implementation for discrete action-spaces, has finally been implemented in v2.2.0! The reproduction results with Atari 2600 are available here.

import d3rlpy

dataset, env = d3rlpy.datasets.get_cartpole()

dt = d3rlpy.algos.DiscreteDecisionTransformerConfig(
    batch_size=64,
    num_heads=1,
    learning_rate=1e-4,
    max_timestep=1000,
    num_layers=3,
    position_encoding_type=d3rlpy.PositionEncodingType.SIMPLE,
    encoder_factory=d3rlpy.models.VectorEncoderFactory([128], exclude_last_activation=True),
    observation_scaler=d3rlpy.preprocessing.StandardObservationScaler(),
    context_size=20,
    warmup_tokens=100000,
).create()

dt.fit(
    dataset,
    n_steps=100000,
    n_steps_per_epoch=1000,
    eval_env=env,
    eval_target_return=500,
)

Enhancement

  • action_size and action_space options have been exposed for manual dataset creation #338.
  • FrameStackTrajectorySlicer has been added.

Refactoring

  • numpy type checking has been enabled. Some parts of the code differentiate numpy array data types, which is now checked by mypy.

Bugfix

  • Device error in AWAC #341
  • Invalid batch.intervals #346
    • ⚠️ This fix is important for retaining the performance of Q-learning algorithms; the issue had been present since v1.1.1.

Release v2.1.0

02 Sep 11:29

Upgrade PyTorch to v2

From this version, d3rlpy requires PyTorch v2 (v1 may still partially work). To support this, the minimum Python version has been bumped to 3.8. This change allows d3rlpy to utilize more advanced features such as torch.compile in upcoming releases.

Healthcheck

From this version, d3rlpy automatically diagnoses dependency health. In this version, the Gym version is checked to make sure you have the correct version installed.

Gymnasium support

d3rlpy now supports Gymnasium as well as Gym. You can use it just the same as Gym. Please check the example for further details.
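
Below is a minimal online-training sketch with a Gymnasium environment. The environment, buffer size, and algorithm here are placeholders; see the linked example for the full version.

import gymnasium
import d3rlpy

# Gymnasium environments can be passed in the same way as Gym environments
env = gymnasium.make("CartPole-v1")
eval_env = gymnasium.make("CartPole-v1")

dqn = d3rlpy.algos.DQNConfig().create(device="cpu:0")

# FIFO replay buffer for online training
buffer = d3rlpy.dataset.create_fifo_replay_buffer(limit=100000, env=env)

dqn.fit_online(env, buffer, n_steps=100000, eval_env=eval_env)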

d3rlpy install command

To make your life easier, d3rlpy provides the d3rlpy install command to install additional dependencies. This is part of the d3rlpy CLI. Please check the docs for further details.

$ d3rlpy install atari  # Atari 2600 dependencies
$ d3rlpy install d4rl_atari  # Atari 2600 + d4rl-atari dependencies
$ d3rlpy install d4rl  # D4RL dependencies

Refactoring

In this version, the internal design has been refactored, mainly the algorithm implementations and the way models are assigned. ⚠️ Because of this change, models saved with previous versions might not be loadable in this version.

Enhancement

  • Added Jupyter Notebook for TPU on Google Colaboratory.
  • Added d3rlpy.notebook_utils to provide utilities for Jupyter Notebook.
  • Updated notebook link #313 (thanks @asmith26 !)

Bugfix

Release v2.0.4

21 Jul 08:49

Bugfix

  • Fix DiscreteCQL loss metrics #298
  • Fix dump ReplayBuffer #299
  • Fix InitialStateValueEstimationEvaluator #301
  • Fix rendering interface to match the latest Gym version #302

Due to the rendering fix, I recommend reinstalling d4rl-atari if you use it.

$ pip install -U git+https://github.com/takuseno/d4rl-atari

Release v2.0.3

18 Jul 13:54

An emergency patch to fix a bug in the predict_value method #297.

Release v2.0.2

18 Jul 07:22

The major update has finally been released! Since the start of the project, it has earned almost 1K GitHub stars ⭐, which is a great milestone for d3rlpy. This update includes many major changes.

Upgrade Gym version

From this version, d3rlpy only supports the latest Gym version, 0.26.0. This change allows us to support Gymnasium in a future update.

Algorithm

Clear separation between configuration and algorithm

From this version, each algorithm (e.g. DQN) has a config class (e.g. DQNConfig). This allows us to serialize and deserialize algorithms, as described later.

dqn = d3rlpy.algos.DQNConfig(learning_rate=3e-4).create(device="cuda:0")

Decision Transformer

Decision Transformer is finally available! You can check the reproduction code to see how to use it.

import d3rlpy

dataset, env = d3rlpy.datasets.get_pendulum()

dt = d3rlpy.algos.DecisionTransformerConfig(
    batch_size=64,
    learning_rate=1e-4,
    optim_factory=d3rlpy.models.AdamWFactory(weight_decay=1e-4),
    encoder_factory=d3rlpy.models.VectorEncoderFactory(
        [128],
        exclude_last_activation=True,
    ),
    observation_scaler=d3rlpy.preprocessing.StandardObservationScaler(),
    reward_scaler=d3rlpy.preprocessing.MultiplyRewardScaler(0.001),
    context_size=20,
    num_heads=1,
    num_layers=3,
    warmup_steps=10000,
    max_timestep=1000,
).create(device="cuda:0")

dt.fit(
    dataset,
    n_steps=100000,
    n_steps_per_epoch=1000,
    save_interval=10,
    eval_env=env,
    eval_target_return=0.0,
)

Serialization

In this version, d3rlpy introduces a compact serialization, d3 format, that includes both hyperparameters and model parameters in a single file. This makes it possible for you to easily save checkpoints and reconstruct algorithms for evaluation and deployment.

import d3rlpy

dataset, env = d3rlpy.datasets.get_cartpole()

dqn = d3rlpy.algos.DQNConfig().create()

dqn.fit(dataset, n_steps=10000)

# save as d3 file
dqn.save("model.d3")

# reconstruct the exactly same DQN
new_dqn = d3rlpy.load_learnable("model.d3")

ReplayBuffer

From this version, there is no longer a clear separation between ReplayBuffer and MDPDataset. Instead, ReplayBuffer has the flexibility to support any kind of algorithm and experiment. Please check the documentation for details.
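
As an illustration of this flexibility, here is a sketch that builds buffers through the dataset factory helpers. The helper names create_infinite_replay_buffer and create_fifo_replay_buffer are assumptions based on the v2 dataset module; please check the documentation for the exact API.

import d3rlpy

# an existing dataset can be used directly as a source of episodes
dataset, env = d3rlpy.datasets.get_cartpole()

# an unlimited buffer initialized with the dataset episodes (offline training)
offline_buffer = d3rlpy.dataset.create_infinite_replay_buffer(
    episodes=dataset.episodes,
)

# a size-limited FIFO buffer with the same interface (online training)
online_buffer = d3rlpy.dataset.create_fifo_replay_buffer(limit=100000, env=env)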

Release v1.1.1

24 Jun 11:12

Benchmark

The benchmark results of IQL and NFQ have been added to d3rlpy-benchmarks. Additionally, results with more random seeds (up to 10) have been added for all algorithms, making the benchmark results more reliable.

Documentation

  • More descriptions have been added to the Finetuning tutorial page.
  • An Offline Policy Selection tutorial page has been added.

Enhancements

  • cloudpickle and GPUUtil dependencies have been removed.
  • The Gaussian likelihood computation for MOPO is now more mathematically correct (thanks, @tominku).

Release v1.1.0

27 Apr 15:37

MDPDataset

The timestep alignment is now exactly the same as D4RL:

import numpy as np

# observations = [o_1, o_2, ..., o_n]
observations = np.random.random((1000, 10))

# actions = [a_1, a_2, ..., a_n]
actions = np.random.random((1000, 10))

# rewards = [r(o_1, a_1), r(o_2, a_2), ...]
rewards = np.random.random(1000)

# terminals = [t(o_1, a_1), t(o_2, a_2), ...]
terminals = ...

where r(o, a) is the reward function and t(o, a) is the terminal function.

The reason for this change is that many users were confused by the difference between d3rlpy and D4RL. Now, the alignment matches D4RL. Note that this change might break your existing datasets.

Algorithms

Enhancements

  • AWAC, CRR and IQL now use a non-squashed Gaussian policy function.
  • More tutorial pages have been added to the documentation.
  • A software design page has been added to the documentation.
  • A reproduction script for IQL has been added.
  • The progress bar in online training is visually improved in Jupyter Notebook #161 (thanks, @aiueola).
  • NaN checks have been added to MDPDataset.
  • The target_reduction_type and bootstrap options have been removed.

Bugfix

  • Unnecessary test conditions have been removed.
  • A typo in dataset.pyx has been fixed #167 (thanks, @zbzhu99).
  • Details of the IQL implementation have been fixed.