Reinforcement Learning

A deep Q-network (DQN) is trained on the FlappyBird game for 100,000 episodes using a Bayesian CNN, and adversarial examples are then crafted to fool the trained policy.
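To make the idea of fooling a trained policy concrete, here is a minimal sketch of one common attack, the fast gradient sign method (FGSM). The source does not say which attack is used, so FGSM is an assumption. The Bayesian CNN is replaced by a tiny linear Q-function so the gradient can be written by hand, and the weights, the state vector, and all function names are hypothetical.

```python
import math

def q_values(weights, bias, state):
    """Linear stand-in for the Q-network: one Q-value per action."""
    return [sum(w * s for w, s in zip(row, state)) + b
            for row, b in zip(weights, bias)]

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_action(weights, bias, state):
    """Action the policy picks: the index of the largest Q-value."""
    q = q_values(weights, bias, state)
    return max(range(len(q)), key=q.__getitem__)

def sign(x):
    """Return -1, 0, or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)

def fgsm_perturb(weights, bias, state, epsilon):
    """One FGSM step that raises the cross-entropy loss of the greedy action."""
    q = q_values(weights, bias, state)
    target = max(range(len(q)), key=q.__getitem__)
    probs = softmax(q)
    # d(loss)/d(q_a) = p_a - 1[a == target] for softmax cross-entropy.
    dq = [p - (1.0 if a == target else 0.0) for a, p in enumerate(probs)]
    # Chain rule through the linear layer gives the gradient w.r.t. the state.
    grad = [sum(dq[a] * weights[a][i] for a in range(len(q)))
            for i in range(len(state))]
    # Move every input feature by epsilon in the loss-increasing direction.
    return [s + epsilon * sign(g) for s, g in zip(state, grad)]

# Hypothetical 2-action policy (0 = do nothing, 1 = flap) over 3 state features.
WEIGHTS = [[1.0, -0.5, 0.2], [-0.8, 0.6, 0.1]]
BIAS = [0.0, 0.0]

state = [0.3, 0.2, 0.5]
adv_state = fgsm_perturb(WEIGHTS, BIAS, state, epsilon=0.2)

print("clean action:", greedy_action(WEIGHTS, BIAS, state))
print("adversarial action:", greedy_action(WEIGHTS, BIAS, adv_state))
```

With these toy numbers, a perturbation of at most 0.2 per feature is enough to change the greedy action. For a real CNN policy the hand-written gradient would be replaced by automatic differentiation with respect to the input frame, and the perturbed frame would normally be clipped back to the valid pixel range.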