RL Toolkit

Papers

Installation with PyPI

On PC AMD64 with Ubuntu/Debian

Install dependencies
```
apt update -y
apt install swig -y
```
Install RL-Toolkit
```
pip3 install rl-toolkit[all]
```

Run (for Server)
```
python3 -m rl_toolkit -c ./rl_toolkit/config.yaml -e MinitaurBulletEnv-v0 server
```

Run (for Agent)
```
python3 -m rl_toolkit -c ./rl_toolkit/config.yaml -e MinitaurBulletEnv-v0 agent --db_server localhost
```

Run (for Learner)
```
python3 -m rl_toolkit -c ./rl_toolkit/config.yaml -e MinitaurBulletEnv-v0 learner --db_server 192.168.1.2
```

Run (for Tester)
```
python3 -m rl_toolkit -c ./rl_toolkit/config.yaml -e MinitaurBulletEnv-v0 tester -f save/model/actor.h5
```
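The four commands above differ only in the role and its role-specific flag, so they can be generated from one helper. The following is a minimal sketch, not part of RL-Toolkit: the function name `rl_cmd` and the variables `CONFIG`, `ENV_NAME`, `DB_SERVER` and `MODEL` are hypothetical, while the flags (`-c`, `-e`, `--db_server`, `-f`) are taken from the commands above. It only prints the commands (a dry run) so they can be checked before being executed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: builds the RL-Toolkit command line for one role.
# The defaults mirror the example commands in this README.
CONFIG="${CONFIG:-./rl_toolkit/config.yaml}"
ENV_NAME="${ENV_NAME:-MinitaurBulletEnv-v0}"
DB_SERVER="${DB_SERVER:-localhost}"
MODEL="${MODEL:-save/model/actor.h5}"

rl_cmd() {
    local role="$1"
    local base="python3 -m rl_toolkit -c ${CONFIG} -e ${ENV_NAME}"
    case "$role" in
        server)        echo "${base} server" ;;
        agent|learner) echo "${base} ${role} --db_server ${DB_SERVER}" ;;
        tester)        echo "${base} tester -f ${MODEL}" ;;
        *)             echo "unknown role: ${role}" >&2; return 1 ;;
    esac
}

# Dry run: print the commands for the three training roles.
for role in server agent learner; do
    rl_cmd "$role"
done
```

To point the learner at a remote server, set the variable for that call only, for example `DB_SERVER=192.168.1.2 rl_cmd learner`.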

On NVIDIA Jetson

Install dependencies
TensorFlow for JetPack: follow NVIDIA's official installation instructions.
```
sudo apt install swig -y
```

Install Reverb
Download the Bazel 3.7.2 binary for arm64 (bazel-3.7.2-linux-arm64) from the Bazel releases page, then put it on your PATH:
```
mkdir -p ~/bin
mv ~/Downloads/bazel-3.7.2-linux-arm64 ~/bin/bazel
chmod +x ~/bin/bazel
export PATH=$PATH:~/bin
```
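Note that `export PATH` only affects the current shell session. Before starting the build it can help to confirm that the binary is actually resolvable. The sketch below uses a hypothetical helper named `require_tool`, built on the standard `command -v` shell builtin:

```shell
#!/usr/bin/env bash
# Hypothetical helper: reports whether a tool can be found on PATH.
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1 -> $(command -v "$1")"
    else
        echo "missing: $1" >&2
        return 1
    fi
}

# If bazel is not found, re-run the export above in this shell,
# or add the same line to ~/.bashrc to make it permanent.
require_tool bazel || echo "bazel is not on PATH yet"
```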

Clone Reverb and check out the release that corresponds to the TensorFlow version installed on the NVIDIA Jetson:
```
git clone https://github.com/deepmind/reverb
cd reverb/
git checkout r0.9.0
```
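Since the Reverb release has to match the installed TensorFlow, it is worth printing the TensorFlow version before picking a branch. This is a small sketch with a hypothetical helper name (`tf_version`). It does not choose the branch for you: the version-to-release mapping is not given in this README and should be taken from the Reverb release notes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the installed TensorFlow version,
# or "not installed" if the import fails.
tf_version() {
    python3 -c 'import tensorflow as tf; print(tf.__version__)' 2>/dev/null \
        || echo "not installed"
}

echo "TensorFlow: $(tf_version)"

# Inside the Reverb clone, `git branch -r` lists the available release
# branches to compare against the version printed above.
```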

Make the following changes in Reverb before building.
In .bazelrc
```
- build:manylinux2010 --crosstool_top=//third_party/toolchains/preconfig/ubuntu16.04/gcc7_manylinux2010:toolchain
+ # build:manylinux2010 --crosstool_top=//third_party/toolchains/preconfig/ubuntu16.04/gcc7_manylinux2010:toolchain

- build --copt=-mavx --copt=-DEIGEN_MAX_ALIGN_BYTES=64
+ build --copt=-DEIGEN_MAX_ALIGN_BYTES=64
```
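The two .bazelrc changes above can also be applied non-interactively. The following is a sketch, assuming GNU `sed` (as shipped with Ubuntu) and that the lines appear in .bazelrc exactly as shown in the diff; the function name `patch_bazelrc` is hypothetical. Review the result with `git diff` before building.

```shell
#!/usr/bin/env bash
# Hypothetical helper: applies the two .bazelrc edits shown above in place.
patch_bazelrc() {
    local file="$1"
    # 1) Comment out the manylinux2010 crosstool line.
    # 2) Drop the x86-only -mavx flag, keeping the Eigen alignment flag.
    sed -i \
        -e 's|^build:manylinux2010 --crosstool_top=|# &|' \
        -e 's|^build --copt=-mavx --copt=-DEIGEN_MAX_ALIGN_BYTES=64$|build --copt=-DEIGEN_MAX_ALIGN_BYTES=64|' \
        "$file"
}

# Only patch when run from the root of the Reverb clone.
if [ -f .bazelrc ]; then
    patch_bazelrc .bazelrc
fi
```

The first pattern is anchored at the start of the line, so a line that is already commented out is not touched again if the script is run twice.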

In WORKSPACE
```
- PROTOC_SHA256 = "15e395b648a1a6dda8fd66868824a396e9d3e89bc2c8648e3b9ab9801bea5d55"
+ # PROTOC_SHA256 = "15e395b648a1a6dda8fd66868824a396e9d3e89bc2c8648e3b9ab9801bea5d55"
+ PROTOC_SHA256 = "7877fee5793c3aafd704e290230de9348d24e8612036f1d784c8863bc790082e"
```

In oss_build.sh
```
-  bazel test -c opt --copt=-mavx --config=manylinux2010 --test_output=errors //reverb/cc/...
+  bazel test -c opt --copt="-march=armv8-a+crypto" --test_output=errors //reverb/cc/...

   # Builds Reverb and creates the wheel package.
-  bazel build -c opt --copt=-mavx $EXTRA_OPT --config=manylinux2010 reverb/pip_package:build_pip_package
+  bazel build -c opt --copt="-march=armv8-a+crypto" $EXTRA_OPT reverb/pip_package:build_pip_package
```

In reverb/cc/platform/default/repo.bzl
```
  urls = [
-     "https://github.com/protocolbuffers/protobuf/releases/download/v%s/protoc-%s-linux-x86_64.zip" % (version, version),
+     "https://github.com/protocolbuffers/protobuf/releases/download/v%s/protoc-%s-linux-aarch_64.zip" % (version, version),
  ]
```

In reverb/pip_package/build_pip_package.sh
```
-  "${PYTHON_BIN_PATH}" setup.py bdist_wheel ${PKG_NAME_FLAG} ${RELEASE_FLAG} ${TF_VERSION_FLAG} --plat manylinux2010_x86_64 > /dev/null
+  "${PYTHON_BIN_PATH}" setup.py bdist_wheel ${PKG_NAME_FLAG} ${RELEASE_FLAG} ${TF_VERSION_FLAG}  > /dev/null
```

Build and install

```
bash oss_build.sh --clean true --tf_dep_override "tensorflow~=2.9.1" --release --python "3.8"
bash ./bazel-bin/reverb/pip_package/build_pip_package --dst /tmp/reverb/dist/ --release
pip3 install /tmp/reverb/dist/dm_reverb-*
```

Cleaning

```
cd ../
rm -R reverb/
```

Install RL-Toolkit
```
pip3 install rl-toolkit
```

Run (for Server)
```
python3 -m rl_toolkit -c ./rl_toolkit/config.yaml -e MinitaurBulletEnv-v0 server
```

Run (for Agent)
```
python3 -m rl_toolkit -c ./rl_toolkit/config.yaml -e MinitaurBulletEnv-v0 agent --db_server localhost
```

Run (for Learner)
```
python3 -m rl_toolkit -c ./rl_toolkit/config.yaml -e MinitaurBulletEnv-v0 learner --db_server 192.168.1.2
```

Run (for Tester)
```
python3 -m rl_toolkit -c ./rl_toolkit/config.yaml -e MinitaurBulletEnv-v0 tester -f save/model/actor.h5
```

Environments

| Environment | Observation space | Observation bounds | Action space | Action bounds | Reward bounds |
| --- | --- | --- | --- | --- | --- |
| BipedalWalkerHardcore-v3 | (24, ) | [-inf, inf] | (4, ) | [-1.0, 1.0] | [-1.0, 1.0] |
| Walker2DBulletEnv-v0 | (22, ) | [-inf, inf] | (6, ) | [-1.0, 1.0] | [-1.0, 1.0] |
| AntBulletEnv-v0 | (28, ) | [-inf, inf] | (8, ) | [-1.0, 1.0] | [-1.0, 1.0] |
| HalfCheetahBulletEnv-v0 | (26, ) | [-inf, inf] | (6, ) | [-1.0, 1.0] | [-1.0, 1.0] |
| HopperBulletEnv-v0 | (15, ) | [-inf, inf] | (3, ) | [-1.0, 1.0] | [-1.0, 1.0] |
| HumanoidBulletEnv-v0 | (44, ) | [-inf, inf] | (17, ) | [-1.0, 1.0] | [-1.0, 1.0] |
| MinitaurBulletEnv-v0 | (28, ) | [-167.72488, 167.72488] | (8, ) | [-1.0, 1.0] | [-1.0, 1.0] |

Results

| Environment | SAC + gSDE | SAC + gSDE + Huber loss | SAC + TQC + gSDE | RL-Toolkit |
| --- | --- | --- | --- | --- |
| BipedalWalkerHardcore-v3 | 13 ± 18⁽²⁾ | 239 ± 118 | 228 ± 18⁽²⁾ | 205 ± 134 |
| Walker2DBulletEnv-v0 | 2270 ± 28⁽¹⁾ | 2732 ± 96 | 2535 ± 94⁽²⁾ | 3123 ± 594 |
| AntBulletEnv-v0 | 3106 ± 61⁽¹⁾ | 3460 ± 119 | 3700 ± 37⁽²⁾ | 3993 ± 214 |
| HalfCheetahBulletEnv-v0 | 2945 ± 95⁽¹⁾ | 3003 ± 226 | 3041 ± 157⁽²⁾ | 2762 ± 153 |
| HopperBulletEnv-v0 | 2515 ± 50⁽¹⁾ | 2555 ± 405 | 2401 ± 62⁽²⁾ | 2151 ± 664 |

Releases

SAC + gSDE + Huber loss is stored in branch r2.0.
SAC + TQC + gSDE + LogCosh + Reverb is stored in branch r4.0.

Frameworks: TensorFlow, Reverb, OpenAI Gym, PyBullet, WandB, OpenCV

License
