Fast Evolution Strategy for Walking Marvin

This is a design doc for the implementation that I’ve come up with.

Install Guide

Create a virtual environment: python3 -m venv marvin_env
Activate it: source marvin_env/bin/activate
Install the SWIG library: brew install swig
Install the Python dependencies: pip install numpy==1.17.2 gym==0.14.0 Box2D==2.3.2 box2d-py==2.3.8
Copy the gym directory provided in this repo to marvin_env/lib/python3.7/site-packages, replacing the existing one, e.g. cp -r gym marvin_env/lib/python3.7/site-packages
Import the library: import gym
Create an environment: env = gym.make("Marvin-v0")
Other environments should work fine too: env = gym.make("BipedalWalker-v2")

To run the distributed version you also need Ray: pip install ray psutil

If you encounter an error, contact me. It’s likely that this will break in the future due to dependencies.

Server

The purpose of Server is to synchronize progress across multiple Clients and to distribute work to each Client. It does so by creating a list of Client actors and initializing each of them with the model architecture, the random seed used for model initialization, a seed for perturbation generation, and an environment identifier.
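The initialization described above could look roughly like the following. This is a minimal sketch, not the repo's actual code: it uses plain Python objects instead of Ray actors, and the names `layer_sizes`, `init_seed`, `noise_seed`, and `master_seed`, as well as the layer sizes themselves, are assumptions made for illustration.

```python
import numpy as np


class Client:
    """Hypothetical stand-in for a remote worker (a Ray actor in the distributed version)."""

    def __init__(self, layer_sizes, init_seed, noise_seed, env_id):
        self.env_id = env_id
        # Every client gets the same init_seed, so all start from identical weights.
        init_rng = np.random.default_rng(init_seed)
        n_params = sum(i * o for i, o in zip(layer_sizes[:-1], layer_sizes[1:]))
        self.weights = init_rng.standard_normal(n_params)
        # Private random stream, used only to sample perturbations.
        self.noise_rng = np.random.default_rng(noise_seed)


class Server:
    def __init__(self, n_clients, layer_sizes, init_seed, env_id, master_seed=0):
        # One distinct perturbation seed per client; the server remembers all of them.
        self.noise_seeds = [master_seed + i for i in range(n_clients)]
        self.clients = [
            Client(layer_sizes, init_seed, seed, env_id) for seed in self.noise_seeds
        ]


# Illustrative sizes only: a bias-free 24 -> 16 -> 4 network.
server = Server(n_clients=4, layer_sizes=[24, 16, 4], init_seed=42, env_id="Marvin-v0")
```

Sharing the initialization seed means the weights never have to be shipped at startup, while the per-client perturbation seeds keep the clients exploring different directions.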

Client

Client is initialized with its own random seed, which is known to Server. When the evaluate method
is called, it samples a weight perturbation according to its seed and evaluates the model with it, sending
only the reward back to Server.

Client can run evaluate multiple times, each time adding a fresh perturbation to the same set of weights.
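As an illustration of the Client behaviour described above, here is a small sketch under stated assumptions: `reward_fn` is a hypothetical stand-in for a full environment rollout, `sigma` is an assumed noise scale, and the perturbation is drawn from a standard normal distribution, which the text does not specify.

```python
import numpy as np


class Client:
    def __init__(self, weights, seed, reward_fn, sigma=0.1):
        self.weights = np.asarray(weights, dtype=float)
        self.rng = np.random.default_rng(seed)  # seed is also known to the server
        self.reward_fn = reward_fn              # hypothetical stand-in for an episode rollout
        self.sigma = sigma                      # assumed perturbation scale

    def evaluate(self):
        # Sample a perturbation from the private stream; the base weights stay untouched.
        eps = self.rng.standard_normal(self.weights.shape)
        # Only the scalar reward leaves the client.
        return float(self.reward_fn(self.weights + self.sigma * eps))

    def update(self, new_weights):
        self.weights = np.asarray(new_weights, dtype=float)


def toy_reward(w):
    # Toy objective: higher is better, maximum at w = 0.
    return -np.sum(w ** 2)


client = Client(np.zeros(3), seed=7, reward_fn=toy_reward)
rewards = [client.evaluate() for _ in range(2)]

# Anyone who knows the seed can replay the exact same perturbations.
mirror = np.random.default_rng(7)
replayed = [mirror.standard_normal(3) for _ in range(2)]
```

Because each call advances the client's random stream, repeated evaluate calls explore different perturbations of the same weights, and the replayed list matches them one for one.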

Once Server is done distributing evaluations across Clients, it collects the rewards and, using the seeds it knows,
reproduces the perturbations that were sampled on the client nodes. It then performs the weight update according to the
Evolution Strategy and broadcasts the new weights to all Clients by calling their update method.
