Project author: markub3327

Project description:
Tools for training and testing different RL algorithms in many game environments.
Language: Python
Repository: git://github.com/markub3327/rl-toolkit.git
Created: 2020-11-23T13:18:40Z
Project community: https://github.com/markub3327/rl-toolkit

License: MIT License


RL Toolkit

Papers

Installation with PyPI

On PC AMD64 with Ubuntu/Debian

  1. Install dependencies
    1. apt update -y
    2. apt install swig -y
  2. Install RL-Toolkit
    1. pip3 install rl-toolkit[all]
  3. Run (for Server)
    1. rl_toolkit -c ./config/sac.yaml -a sac -e BipedalWalkerHardcore-v3 server
    Run (for Agent)
    1. rl_toolkit -c ./config/sac.yaml -a sac -e BipedalWalkerHardcore-v3 agent
    Run (for Learner)
    1. rl_toolkit -c ./config/sac.yaml -a sac -e BipedalWalkerHardcore-v3 learner --db_server 192.168.1.2
    Run (for Tester)
    1. rl_toolkit -c ./config/sac.yaml -a sac -e BipedalWalkerHardcore-v3 tester -f save/model/actor.h5
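
The four run commands above share the same config, algorithm, and environment arguments and differ only in the role and its extra options. A minimal Python sketch that assembles them, assuming the `rl_toolkit` entry point is on your PATH; `build_command` and `ROLE_ARGS` are hypothetical helper names for illustration, not part of RL-Toolkit, and the only flags used are the ones shown in the commands above.

```python
import shlex

# Values copied from the commands above; adjust them to your setup.
CONFIG = "./config/sac.yaml"
ALGORITHM = "sac"
ENVIRONMENT = "BipedalWalkerHardcore-v3"

# Role-specific arguments, exactly as they appear in the commands above.
ROLE_ARGS = {
    "server": [],
    "agent": [],
    "learner": ["--db_server", "192.168.1.2"],
    "tester": ["-f", "save/model/actor.h5"],
}


def build_command(role):
    """Return the rl_toolkit argument list for one role (hypothetical helper)."""
    if role not in ROLE_ARGS:
        raise ValueError(f"unknown role: {role!r}")
    return [
        "rl_toolkit",
        "-c", CONFIG,
        "-a", ALGORITHM,
        "-e", ENVIRONMENT,
        role,
        *ROLE_ARGS[role],
    ]


if __name__ == "__main__":
    for role in ROLE_ARGS:
        # shlex.join quotes arguments so each line can be pasted into a shell.
        print(shlex.join(build_command(role)))
```

To launch a role from Python instead of printing it, the list can be passed to `subprocess.run(build_command("agent"), check=True)`.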

On NVIDIA Jetson

  1. Install dependencies

    Install TensorFlow for JetPack by following NVIDIA's installation instructions.

    1. sudo apt install swig -y
  2. Install Reverb

    Download the Bazel 3.7.2 binary for arm64 (bazel-3.7.2-linux-arm64), then install it:

    1. mkdir ~/bin
    2. mv ~/Downloads/bazel-3.7.2-linux-arm64 ~/bin/bazel
    3. chmod +x ~/bin/bazel
    4. export PATH=$PATH:~/bin

    Clone the Reverb version that corresponds to the TensorFlow version installed on the NVIDIA Jetson!

    1. git clone https://github.com/deepmind/reverb
    2. cd reverb/
    3. git checkout r0.9.0

    Make the following changes in Reverb before building!

    In .bazelrc

    1. - build:manylinux2010 --crosstool_top=//third_party/toolchains/preconfig/ubuntu16.04/gcc7_manylinux2010:toolchain
    2. + # build:manylinux2010 --crosstool_top=//third_party/toolchains/preconfig/ubuntu16.04/gcc7_manylinux2010:toolchain
    3. - build --copt=-mavx --copt=-DEIGEN_MAX_ALIGN_BYTES=64
    4. + build --copt=-DEIGEN_MAX_ALIGN_BYTES=64

    In WORKSPACE

    1. - PROTOC_SHA256 = "15e395b648a1a6dda8fd66868824a396e9d3e89bc2c8648e3b9ab9801bea5d55"
    2. + # PROTOC_SHA256 = "15e395b648a1a6dda8fd66868824a396e9d3e89bc2c8648e3b9ab9801bea5d55"
    3. + PROTOC_SHA256 = "7877fee5793c3aafd704e290230de9348d24e8612036f1d784c8863bc790082e"

    In oss_build.sh

    1. - bazel test -c opt --copt=-mavx --config=manylinux2010 --test_output=errors //reverb/cc/...
    2. + bazel test -c opt --copt="-march=armv8-a+crypto" --test_output=errors //reverb/cc/...
    3. # Builds Reverb and creates the wheel package.
    4. - bazel build -c opt --copt=-mavx $EXTRA_OPT --config=manylinux2010 reverb/pip_package:build_pip_package
    5. + bazel build -c opt --copt="-march=armv8-a+crypto" $EXTRA_OPT reverb/pip_package:build_pip_package

    In reverb/cc/platform/default/repo.bzl

    1. urls = [
    2. - "https://github.com/protocolbuffers/protobuf/releases/download/v%s/protoc-%s-linux-x86_64.zip" % (version, version),
    3. + "https://github.com/protocolbuffers/protobuf/releases/download/v%s/protoc-%s-linux-aarch_64.zip" % (version, version),
    4. ]

    In reverb/pip_package/build_pip_package.sh

    1. - "${PYTHON_BIN_PATH}" setup.py bdist_wheel ${PKG_NAME_FLAG} ${RELEASE_FLAG} ${TF_VERSION_FLAG} --plat manylinux2010_x86_64 > /dev/null
    2. + "${PYTHON_BIN_PATH}" setup.py bdist_wheel ${PKG_NAME_FLAG} ${RELEASE_FLAG} ${TF_VERSION_FLAG} > /dev/null

    Build and install

    1. bash oss_build.sh --clean true --tf_dep_override "tensorflow~=2.9.1" --release --python "3.8"
    2. bash ./bazel-bin/reverb/pip_package/build_pip_package --dst /tmp/reverb/dist/ --release
    3. pip3 install /tmp/reverb/dist/dm_reverb-*

    Cleaning

    1. cd ../
    2. rm -R reverb/
  3. Install RL-Toolkit
    1. pip3 install rl-toolkit
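
The .bazelrc and reverb/cc/platform/default/repo.bzl edits from step 2 can also be scripted rather than made by hand. A minimal sketch, assuming the files contain the lines exactly as shown in the diffs above (the actual contents of a given Reverb checkout may differ); `patch_bazelrc` and `patch_protoc_url` are hypothetical helper names, not part of Reverb or RL-Toolkit. Both functions operate on text, so reading and writing the files is left to the caller.

```python
def patch_bazelrc(text):
    """Apply the .bazelrc edits listed above to the file's text."""
    patched = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("build:manylinux2010 --crosstool_top="):
            # Comment out the manylinux2010 toolchain line, as in the diff.
            line = "# " + stripped
        elif stripped.startswith("build ") and "--copt=-mavx" in stripped.split():
            # Drop the AVX flag, which is an x86 instruction-set option.
            line = " ".join(tok for tok in stripped.split() if tok != "--copt=-mavx")
        patched.append(line)
    return "\n".join(patched) + "\n"


def patch_protoc_url(text):
    """Point the protoc download at the aarch_64 archive (repo.bzl edit)."""
    return text.replace(
        "protoc-%s-linux-x86_64.zip",
        "protoc-%s-linux-aarch_64.zip",
    )
```

For example, with `pathlib.Path`: `p = Path(".bazelrc"); p.write_text(patch_bazelrc(p.read_text()))`. Running `patch_bazelrc` twice leaves an already patched file unchanged. The WORKSPACE, oss_build.sh, and build_pip_package.sh edits are not covered by this sketch.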

Environments

| Environment | Observation space | Observation bounds | Action space | Action bounds | Reward bounds |
| --- | --- | --- | --- | --- | --- |
| BipedalWalkerHardcore-v3 | (24, ) | [-inf, inf] | (4, ) | [-1.0, 1.0] | [-1.0, 1.0] |
| FlappyBird-v0 | (16, 180) | [0, dmax] | (2, ) | {DO NOTHING, FLAP} | [-1.0, 1.0] |
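
The action bounds listed for BipedalWalkerHardcore-v3 can be enforced before an action is passed to the environment. A minimal sketch in plain Python, assuming a continuous action given as a list of floats; `ACTION_SPECS` and `clip_action` are hypothetical names for illustration, not RL-Toolkit API, and only the continuous-action environment from the table is covered.

```python
# Action spec transcribed from the Environments table (continuous actions only).
ACTION_SPECS = {
    "BipedalWalkerHardcore-v3": {"shape": (4,), "low": -1.0, "high": 1.0},
}


def clip_action(env_name, action):
    """Clamp each action component to the bounds listed for env_name."""
    spec = ACTION_SPECS[env_name]
    expected = spec["shape"][0]
    if len(action) != expected:
        raise ValueError(f"expected {expected} action values, got {len(action)}")
    return [min(max(a, spec["low"]), spec["high"]) for a in action]
```

Out-of-range components are clamped to the nearest bound, and a vector of the wrong length raises `ValueError` instead of being silently truncated.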

Results

| Environment | SAC + gSDE | SAC + gSDE + Huber loss | SAC + TQC + gSDE | Q-Learning | RL-Toolkit |
| --- | --- | --- | --- | --- | --- |
| BipedalWalkerHardcore-v3 | 13 ± 18 (1) | 239 ± 118 | 228 ± 18 (1) | - | 205 ± 134 |
| FlappyBird-v0 | - | - | - | 209.298 (2) | 13 156 |

Figure: dm_ant_ball_sac


Frameworks: TensorFlow, DeepMind Reverb, Gymnasium, DeepMind Control Suite, WandB, OpenCV