This project compares the sample efficiency of reward search and reward shaping, that is, how many environment interactions each approach needs to learn an optimal policy.