Author: blavad

Description: Multi-agent reinforcement learning framework
Language: Python
Clone URL: git://github.com/blavad/marl.git
Created: 2019-11-24T09:55:30Z
Project home: https://github.com/blavad/marl


MARL

MARL is a high-level multi-agent reinforcement learning library, written in Python.

Project doc: [DOC]

Installation

```bash
git clone https://github.com/blavad/marl.git
cd marl
pip install -e .
```

Implemented algorithms

Single-agent algorithms

| Q-learning | DQN | Actor-Critic | DDPG | TD3 |
| :---: | :---: | :---: | :---: | :---: |
| :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :x: |

Multi-agent algorithms

| minimaxQ | PHC | JAL | MAAC | MADDPG |
| :---: | :---: | :---: | :---: | :---: |
| :heavy_check_mark: | :heavy_check_mark: | :x: | :heavy_check_mark: | :heavy_check_mark: |

Examples

Check existing methods

```python
import marl

# Check available agents
print("\n| Agents\t\t", list(marl.agent.available()))
# Check available policies
print("\n| Policies\t\t", list(marl.policy.available()))
# Check available models
print("\n| Models\t\t", list(marl.model.available()))
# Check available exploration processes
print("\n| Expl. Processes\t", list(marl.exploration.available()))
# Check available experience memories
print("\n| Experience Memory\t", list(marl.experience.available()))
```

Train a single agent with DQN algorithm

```python
import marl
from marl.agent import DQNAgent
from marl.model.nn import MlpNet

import gym

env = gym.make("LunarLander-v2")

obs_s = env.observation_space
act_s = env.action_space

mlp_model = MlpNet(8, 4, hidden_size=[64, 32])

dqn_agent = DQNAgent(mlp_model, obs_s, act_s, experience="ReplayMemory-5000", exploration="EpsGreedy", lr=0.001, name="DQN-LunarLander")

# Train the agent for 100 000 timesteps
dqn_agent.learn(env, nb_timesteps=100000)

# Test the agent for 10 episodes
dqn_agent.test(env, nb_episodes=10)
```

Train two agents with Minimax-Q algorithm

```python
import marl
from marl import MARL
from marl.agent import MinimaxQAgent
from marl.exploration import EpsGreedy

# Environment available here: https://github.com/blavad/soccer
from soccer import DiscreteSoccerEnv

env = DiscreteSoccerEnv(nb_pl_team1=1, nb_pl_team2=1)

obs_s = env.observation_space
act_s = env.action_space

# Custom exploration process
expl1 = EpsGreedy(eps_deb=1., eps_fin=.3)
expl2 = EpsGreedy(eps_deb=1., eps_fin=.3)

# Create two minimax-Q agents
q_agent1 = MinimaxQAgent(obs_s, act_s, act_s, exploration=expl1, gamma=0.9, lr=0.001, name="SoccerJ1")
q_agent2 = MinimaxQAgent(obs_s, act_s, act_s, exploration=expl2, gamma=0.9, lr=0.001, name="SoccerJ2")

# Create the trainable multi-agent system
mas = MARL(agents_list=[q_agent1, q_agent2])

# Assign the MAS to each agent
q_agent1.set_mas(mas)
q_agent2.set_mas(mas)

# Train the agents for 100 000 timesteps
mas.learn(env, nb_timesteps=100000)

# Test the agents for 10 episodes
mas.test(env, nb_episodes=10, time_laps=0.5)
```
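The `EpsGreedy` processes above anneal the exploration rate from `eps_deb` (initial epsilon) to `eps_fin` (final epsilon). As a rough standalone illustration of what such a schedule does (a minimal sketch in plain Python, not the marl implementation; the function names here are hypothetical), a linear decay combined with epsilon-greedy action selection can be written as:

```python
import random

def linear_eps(step, nb_steps, eps_deb=1.0, eps_fin=0.3):
    """Linearly anneal epsilon from eps_deb down to eps_fin over nb_steps."""
    frac = min(step / nb_steps, 1.0)
    return eps_deb + frac * (eps_fin - eps_deb)

def eps_greedy_action(q_values, eps):
    """With probability eps pick a random action, otherwise the greedy one."""
    if random.random() < eps:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

# Early in training exploration dominates; later the agent mostly exploits.
print(linear_eps(0, 100))      # starts at eps_deb
print(linear_eps(100, 100))    # ends near eps_fin
print(eps_greedy_action([0.1, 0.9, 0.2], eps=0.0))  # greedy action
```

With `eps_deb=1.` and `eps_fin=.3` as in the soccer example, both agents start fully random and still explore 30% of the time at the end of the schedule.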