Tools for training and testing agents with different RL algorithms in many game environments.
apt update -y
apt install swig -y
pip3 install rl-toolkit[all]
Run (for Server)
rl_toolkit -c ./config/sac.yaml -a sac -e BipedalWalkerHardcore-v3 server
Run (for Agent)
rl_toolkit -c ./config/sac.yaml -a sac -e BipedalWalkerHardcore-v3 agent
Run (for Learner)
rl_toolkit -c ./config/sac.yaml -a sac -e BipedalWalkerHardcore-v3 learner --db_server 192.168.1.2
Run (for Tester)
rl_toolkit -c ./config/sac.yaml -a sac -e BipedalWalkerHardcore-v3 tester -f save/model/actor.h5
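All four roles take the same options before the role name (`-c` config file, `-a` algorithm, `-e` environment), while role-specific options such as `--db_server` or `-f` follow it. As a sketch of that structure, here is a hypothetical Python helper (not part of RL-Toolkit) that assembles these command lines, for example to start several roles from one script. It only knows the flags shown above and assumes the `rl_toolkit` executable is on `PATH`.

```python
import subprocess
from typing import List, Optional

# Roles used in the commands above.
ROLES = ("server", "agent", "learner", "tester")


def build_command(
    role: str,
    config: str = "./config/sac.yaml",
    algorithm: str = "sac",
    env: str = "BipedalWalkerHardcore-v3",
    db_server: Optional[str] = None,
    model_path: Optional[str] = None,
) -> List[str]:
    """Return the argv list for one rl_toolkit role (hypothetical helper)."""
    if role not in ROLES:
        raise ValueError(f"unknown role: {role!r}")
    # Shared options come before the role subcommand.
    cmd = ["rl_toolkit", "-c", config, "-a", algorithm, "-e", env, role]
    # Role-specific options come after the subcommand.
    if db_server is not None:
        cmd += ["--db_server", db_server]
    if model_path is not None:
        cmd += ["-f", model_path]
    return cmd


def launch(role: str, **kwargs) -> subprocess.Popen:
    """Start one role as a background process (needs rl_toolkit on PATH)."""
    return subprocess.Popen(build_command(role, **kwargs))
```

For example, `" ".join(build_command("learner", db_server="192.168.1.2"))` reproduces the learner command above, and `launch("server")` starts a server in the background.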
NVIDIA Jetson
Install dependencies
TensorFlow for JetPack: follow the instructions here for installation.
sudo apt install swig -y
Install Reverb
Download Bazel 3.7.2 for arm64 here.
mkdir -p ~/bin
mv ~/Downloads/bazel-3.7.2-linux-arm64 ~/bin/bazel
chmod +x ~/bin/bazel
export PATH=$PATH:~/bin
Clone the Reverb version that corresponds to the TensorFlow version installed on the NVIDIA Jetson!
git clone https://github.com/deepmind/reverb
cd reverb/
git checkout r0.9.0
Make the following changes in Reverb before building!
In .bazelrc
- build:manylinux2010 --crosstool_top=//third_party/toolchains/preconfig/ubuntu16.04/gcc7_manylinux2010:toolchain
+ # build:manylinux2010 --crosstool_top=//third_party/toolchains/preconfig/ubuntu16.04/gcc7_manylinux2010:toolchain
- build --copt=-mavx --copt=-DEIGEN_MAX_ALIGN_BYTES=64
+ build --copt=-DEIGEN_MAX_ALIGN_BYTES=64
In WORKSPACE
- PROTOC_SHA256 = "15e395b648a1a6dda8fd66868824a396e9d3e89bc2c8648e3b9ab9801bea5d55"
+ # PROTOC_SHA256 = "15e395b648a1a6dda8fd66868824a396e9d3e89bc2c8648e3b9ab9801bea5d55"
+ PROTOC_SHA256 = "7877fee5793c3aafd704e290230de9348d24e8612036f1d784c8863bc790082e"
In oss_build.sh
- bazel test -c opt --copt=-mavx --config=manylinux2010 --test_output=errors //reverb/cc/...
+ bazel test -c opt --copt="-march=armv8-a+crypto" --test_output=errors //reverb/cc/...
# Builds Reverb and creates the wheel package.
- bazel build -c opt --copt=-mavx $EXTRA_OPT --config=manylinux2010 reverb/pip_package:build_pip_package
+ bazel build -c opt --copt="-march=armv8-a+crypto" $EXTRA_OPT reverb/pip_package:build_pip_package
In reverb/cc/platform/default/repo.bzl
urls = [
- "https://github.com/protocolbuffers/protobuf/releases/download/v%s/protoc-%s-linux-x86_64.zip" % (version, version),
+ "https://github.com/protocolbuffers/protobuf/releases/download/v%s/protoc-%s-linux-aarch_64.zip" % (version, version),
]
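The `PROTOC_SHA256` value in `WORKSPACE` is expected to be the SHA-256 of the protoc archive fetched from the aarch_64 URL above, so the two edits have to agree. A minimal sketch for computing that hash yourself with Python's standard library; the URL template is copied from the modified `repo.bzl`, and the protoc version is a parameter you have to take from your Reverb checkout (it is not given here).

```python
import hashlib
import urllib.request

# URL template copied from the modified reverb/cc/platform/default/repo.bzl.
PROTOC_URL = (
    "https://github.com/protocolbuffers/protobuf/releases/download/"
    "v%s/protoc-%s-linux-aarch_64.zip"
)


def protoc_url(version: str) -> str:
    """Build the aarch_64 protoc download URL for a protobuf version."""
    return PROTOC_URL % (version, version)


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def download_and_hash(version: str, dest: str) -> str:
    """Download the protoc archive for `version` to `dest`, return its SHA-256."""
    urllib.request.urlretrieve(protoc_url(version), dest)
    return sha256_of_file(dest)
```

Compare the result of `download_and_hash(version, "/tmp/protoc.zip")` with the new `PROTOC_SHA256`; if the two differ, the version or the URL is probably not the one Reverb downloads.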
In reverb/pip_package/build_pip_package.sh
- "${PYTHON_BIN_PATH}" setup.py bdist_wheel ${PKG_NAME_FLAG} ${RELEASE_FLAG} ${TF_VERSION_FLAG} --plat manylinux2010_x86_64 > /dev/null
+ "${PYTHON_BIN_PATH}" setup.py bdist_wheel ${PKG_NAME_FLAG} ${RELEASE_FLAG} ${TF_VERSION_FLAG} > /dev/null
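The edits above are plain text substitutions, so they can also be applied with a small script. The sketch below is a hypothetical helper, not part of Reverb or RL-Toolkit. It assumes a fresh checkout of the branch used above and that each original line appears exactly as shown; if an expected line is missing (for example on a different Reverb branch), it raises an error instead of silently skipping the change.

```python
from pathlib import Path
from typing import Dict, List, Tuple

OLD_SHA = "15e395b648a1a6dda8fd66868824a396e9d3e89bc2c8648e3b9ab9801bea5d55"
NEW_SHA = "7877fee5793c3aafd704e290230de9348d24e8612036f1d784c8863bc790082e"
MANYLINUX_TOOLCHAIN = (
    "build:manylinux2010 --crosstool_top="
    "//third_party/toolchains/preconfig/ubuntu16.04/gcc7_manylinux2010:toolchain"
)
ARM_COPT = '--copt="-march=armv8-a+crypto"'

# (old, new) substitutions per file, relative to the Reverb repository root.
PATCHES: Dict[str, List[Tuple[str, str]]] = {
    ".bazelrc": [
        (MANYLINUX_TOOLCHAIN, "# " + MANYLINUX_TOOLCHAIN),
        ("build --copt=-mavx --copt=-DEIGEN_MAX_ALIGN_BYTES=64",
         "build --copt=-DEIGEN_MAX_ALIGN_BYTES=64"),
    ],
    "WORKSPACE": [
        (f'PROTOC_SHA256 = "{OLD_SHA}"',
         f'# PROTOC_SHA256 = "{OLD_SHA}"\nPROTOC_SHA256 = "{NEW_SHA}"'),
    ],
    "oss_build.sh": [
        ("--copt=-mavx --config=manylinux2010 --test_output=errors",
         f"{ARM_COPT} --test_output=errors"),
        ("--copt=-mavx $EXTRA_OPT --config=manylinux2010",
         f"{ARM_COPT} $EXTRA_OPT"),
    ],
    "reverb/cc/platform/default/repo.bzl": [
        ("protoc-%s-linux-x86_64.zip", "protoc-%s-linux-aarch_64.zip"),
    ],
    "reverb/pip_package/build_pip_package.sh": [
        (" --plat manylinux2010_x86_64", ""),
    ],
}


def patch_reverb(root: str = ".") -> None:
    """Apply the arm64 (Jetson) edits to a fresh Reverb checkout."""
    for rel_path, substitutions in PATCHES.items():
        path = Path(root) / rel_path
        text = path.read_text()
        for old, new in substitutions:
            if old not in text:
                raise RuntimeError(f"{rel_path}: expected text not found: {old!r}")
            text = text.replace(old, new)
        # Each file is written only after all of its substitutions succeeded.
        path.write_text(text)
```

Call `patch_reverb()` from the Reverb repository root right after `git checkout`. On an already patched tree it stops with a `RuntimeError` at `.bazelrc` before writing anything, so restore a clean checkout first if you need to re-apply it.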
Build and install
bash oss_build.sh --clean true --tf_dep_override "tensorflow~=2.9.1" --release --python "3.8"
bash ./bazel-bin/reverb/pip_package/build_pip_package --dst /tmp/reverb/dist/ --release
pip3 install /tmp/reverb/dist/dm_reverb-*
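Because the Reverb wheel is built against a specific TensorFlow release, it can be worth checking which versions actually ended up installed. A minimal sketch using only the standard library (`importlib.metadata`, Python 3.8+); the distribution names `tensorflow` and `dm_reverb` are assumptions, and NVIDIA's TensorFlow wheel for JetPack might be registered under a different name.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(
    names: Iterable[str] = ("tensorflow", "dm_reverb"),
) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Compare the reported TensorFlow version with the one passed to `--tf_dep_override` above.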
Cleaning
cd ../
rm -R reverb/
Install RL-Toolkit
pip3 install rl-toolkit
Environment | Observation space | Observation bounds | Action space | Action bounds | Reward bounds |
---|---|---|---|---|---|
BipedalWalkerHardcore-v3 | (24, ) | [-inf, inf] | (4, ) | [-1.0, 1.0] | [-1.0, 1.0] |
FlappyBird-v0 | (16, 180) | [0, dmax] | (2, ) | {DO NOTHING, FLAP} | [-1.0, 1.0] |
Environment | SAC + gSDE | SAC + gSDE + Huber loss | SAC + TQC + gSDE | Q-Learning | RL-Toolkit |
---|---|---|---|---|---|
BipedalWalkerHardcore-v3 | 13 ± 18(1) | 239 ± 118 | 228 ± 18(1) | - | 205 ± 134 |
FlappyBird-v0 | - | - | - | 209.298(2) | 13 156 |
Frameworks: TensorFlow, DeepMind Reverb, Gymnasium, DeepMind Control Suite, Weights & Biases (wandb), OpenCV