Who needs RL anyway? ES to the rescue! 😱
This is a design doc for the Evolution Strategies (ES) implementation I’ve come up with.
python3 -m venv marvin_env
source marvin_env/bin/activate
brew install swig
pip install numpy==1.17.2 gym==0.14.0 Box2D==2.3.2 box2d-py==2.3.8
Copy the gym directory provided in this repo to marvin_env/lib/python3.7/site-packages (with replacement), e.g.:
cp -r gym marvin_env/lib/python3.7/site-packages
Then create an environment with
import gym
env = gym.make("Marvin-v0")
or, for the standard walker, env = gym.make("BipedalWalker-v2")
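Once an environment exists, evaluating a set of weights comes down to running one episode and summing its rewards. The sketch below assumes the gym 0.14 API (`reset()` returns an observation, `step()` returns `(obs, reward, done, info)`) and a hypothetical linear policy with a tanh squashing; the repo's actual model architecture may differ.

```python
import numpy as np


def rollout(env, policy, max_steps=1600):
    """Run one episode and return the total (undiscounted) reward.

    Assumes the gym 0.14 API: reset() returns an observation and
    step(action) returns (observation, reward, done, info).
    """
    obs = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):  # safety cap; gym.make also applies its own time limit
        obs, reward, done, _info = env.step(policy(obs))
        total_reward += reward
        if done:
            break
    return total_reward


def linear_policy(weights):
    """Hypothetical linear policy; tanh keeps each action component in [-1, 1]."""
    weights = np.asarray(weights, dtype=np.float64)

    def act(obs):
        return np.tanh(weights @ np.asarray(obs, dtype=np.float64))

    return act
```

For BipedalWalker-v2 (24-dimensional observations, 4-dimensional actions), `rollout(env, linear_policy(np.zeros((4, 24))))` would run one episode; for Marvin-v0 the sizes can be read from `env.observation_space.shape` and `env.action_space.shape`.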
To run the distributed version you need Ray:
pip install ray psutil
If you encounter an error, contact me. This setup is likely to break in the future as its dependencies change.
The purpose of the Server is to synchronize progress across multiple Clients and to distribute work to each Client. It does so by creating a list of Client actors and initializing them with the model architecture, the random seed used for model initialization, the seed for perturbation generation, and the environment identifier.
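The start-up data each Client receives can be sketched as a small config record. In the distributed version each Client would presumably be a Ray actor (a class decorated with `@ray.remote` and created with `.remote(...)`); Ray is left out here. The names `ClientConfig`, `make_client_configs` and `init_weights`, the seed offsets, and the assumption that the initialization seed is shared by all Clients are illustrative, not taken from the repo.

```python
import numpy as np
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ClientConfig:
    layer_sizes: Tuple[int, ...]  # model architecture, e.g. (24, 64, 4)
    init_seed: int                # seed for model weight initialization (assumed shared)
    noise_seed: int               # this Client's personal perturbation seed
    env_id: str                   # gym environment identifier, e.g. "Marvin-v0"


def make_client_configs(n_clients, layer_sizes, env_id, init_seed=0, base_noise_seed=1000):
    """Server side: one config per Client. The Server keeps this list,
    so it knows every Client's noise seed."""
    return [
        ClientConfig(tuple(layer_sizes), init_seed, base_noise_seed + i, env_id)
        for i in range(n_clients)
    ]


def init_weights(config):
    """Client side: a shared init_seed gives every Client identical starting weights."""
    rng = np.random.RandomState(config.init_seed)
    sizes = config.layer_sizes
    # One (n_out, n_in) matrix per layer; the 0.1 scale is arbitrary.
    return [rng.randn(n_out, n_in) * 0.1 for n_in, n_out in zip(sizes[:-1], sizes[1:])]
```

Sharing the initialization seed means no initial weights need to be shipped, while distinct noise seeds keep the Clients' perturbations independent.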
A Client is initialized with its personal random seed, which is known to the Server. When the evaluate method is called, the Client samples a weight perturbation from its seed, evaluates the model with the perturbed weights, and sends only the reward back to the Server.
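The Client's evaluate step can be sketched as below, assuming a NumPy weight array, a perturbation scale `sigma`, and a `fitness_fn` that stands in for running an episode in the environment. All three names are hypothetical; the real Client would roll out its model instead of calling `fitness_fn`.

```python
import numpy as np


class Client:
    """Local sketch of a Client; in the Ray version this would presumably be a remote actor."""

    def __init__(self, noise_seed, weights, sigma, fitness_fn):
        self.rng = np.random.RandomState(noise_seed)  # personal seed, known to the Server
        self.weights = np.asarray(weights, dtype=np.float64)
        self.sigma = sigma            # perturbation scale (hypothetical parameter)
        self.fitness_fn = fitness_fn  # stands in for "run an episode, return total reward"

    def evaluate(self):
        # Sample a perturbation from this Client's own RNG stream...
        eps = self.rng.randn(*self.weights.shape)
        # ...evaluate the perturbed weights and return only the scalar reward.
        return float(self.fitness_fn(self.weights + self.sigma * eps))

    def update(self, new_weights):
        # Called by the Server to broadcast the new base weights.
        self.weights = np.asarray(new_weights, dtype=np.float64)
```

Returning a single float keeps network traffic per evaluation tiny, regardless of how large the model is.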
A Client can run evaluate multiple times, each time adding a perturbation to the same set of weights.
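Repeated evaluate calls stay reproducible as long as each perturbation comes from one seeded stream. This sketch assumes the Client's RNG is not reset between calls (so each call tries a new perturbation) and that the Server counts how many evaluate calls it issued to each Client; `draw_perturbations` is a hypothetical helper.

```python
import numpy as np


def draw_perturbations(noise_seed, shape, n_draws):
    """n_draws consecutive draws from one seeded stream, as a Client would
    produce over n_draws evaluate() calls on the same base weights."""
    rng = np.random.RandomState(noise_seed)
    return [rng.randn(*shape) for _ in range(n_draws)]
```

Calling it on the Client and on the Server with the same seed, shape and count yields identical arrays, so the Server can recover every perturbation without any weight-sized array crossing the network.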
Once the Server is done distributing evaluations across Clients, it collects the rewards and, using the Clients' known seeds, reproduces the perturbations that were applied on the client nodes. It then updates the weights according to the Evolution Strategy and broadcasts the new weights to all Clients by calling their update method.
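One Server round can be sketched with the basic ES gradient estimate g = (1/(nσ)) Σ_i A_i ε_i followed by the ascent step w ← w + lr · g, where ε_i are the replayed perturbations and A_i the rewards. Standardizing the rewards into A_i, and the names `es_step`, `sigma` and `lr`, are assumptions; the repo may use a different reward shaping or update rule.

```python
import numpy as np


def es_step(weights, rewards_per_client, noise_seeds, sigma, lr):
    """One ES update on the Server (a sketch, not necessarily the repo's exact rule).

    rewards_per_client[i] holds the rewards Client i returned, in the order
    of its evaluate() calls; noise_seeds[i] is that Client's personal seed.
    """
    eps_list, rewards = [], []
    for seed, client_rewards in zip(noise_seeds, rewards_per_client):
        rng = np.random.RandomState(seed)  # replay this Client's perturbation stream
        for r in client_rewards:
            eps_list.append(rng.randn(*weights.shape))
            rewards.append(r)
    eps = np.stack(eps_list)  # shape (n, *weights.shape)
    rewards = np.asarray(rewards, dtype=np.float64)
    # Standardize rewards (assumed shaping; a common variance-reduction choice).
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    # Gradient estimate (1 / (n * sigma)) * sum_i adv_i * eps_i
    grad = np.tensordot(adv, eps, axes=1) / (len(adv) * sigma)
    return weights + lr * grad  # ascent: we maximize reward
```

After computing the new weights, the Server would call update(new_weights) on every Client (with Ray actors, `client.update.remote(new_weights)`). Each Client needs fresh seeds or an advancing stream in the next round, otherwise it would replay the same perturbations.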