Stanford - CS234 Reinforcement Learning - Assignment 2
In Pong, a player scores when the ball gets past the other player. An episode ends when one of the players
reaches 21 points, so the total return of an episode lies between -21 (lost every point) and +21 (won every
point). Our agent plays against a decent hard-coded AI player.
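The bounds on the return can be checked with a small sketch. It assumes a reward of +1 for each point the agent wins and -1 for each point it loses, with no discounting. The names simulate_episode, episode_return and win_prob are hypothetical and not part of the assignment code; the simulation replaces the real game with a coin flip per point.

```python
import random

WIN_SCORE = 21  # an episode ends when either player reaches this score


def simulate_episode(win_prob, rng):
    """Return the per-point rewards of one simulated episode.

    Assumption: +1 when the agent wins a point, -1 when the opponent does.
    win_prob is a hypothetical fixed chance that the agent wins any point.
    """
    agent, opponent, rewards = 0, 0, []
    while agent < WIN_SCORE and opponent < WIN_SCORE:
        if rng.random() < win_prob:
            agent += 1
            rewards.append(1)
        else:
            opponent += 1
            rewards.append(-1)
    return rewards


def episode_return(rewards):
    """Undiscounted return: the sum of all rewards in the episode."""
    return sum(rewards)


if __name__ == "__main__":
    rng = random.Random(0)
    for p in (0.0, 0.5, 1.0):
        print(p, episode_return(simulate_episode(p, rng)))
```

Under these assumptions the return equals the agent's score minus the opponent's score, so it is never 0: the winner always has 21 points and the loser at most 20.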
python q2_linear.py
python q3_nature.py
python q4_train_atari_linear.py
python q5_train_atari_nature.py
NumPy, TensorFlow, Gym, collections, pyglet, random, time, sys, logging, matplotlib, os
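Before running the scripts, it may help to check that the listed modules can be found. The following is a minimal sketch, not part of the assignment; the helper name missing_modules is hypothetical, and the import names (numpy, tensorflow, gym, and so on) are assumed to be the usual ones for these packages.

```python
import importlib.util

# Import names assumed for the dependencies listed above.
REQUIRED = [
    "numpy", "tensorflow", "gym", "collections", "pyglet", "random",
    "time", "sys", "logging", "matplotlib", "os",
]


def missing_modules(names):
    """Return the names that cannot be located, without importing them."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules were found.")
```

This only checks that each module is installed; it does not verify versions.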