Super-Mario-Bros Reinforcement Learning: QL vs Sarsa

This project concerns the development of an intelligent agent for Super Mario Bros, the famous game produced by Nintendo. More specifically, the goal was to design, implement, and train an agent with the Q-learning (QL) reinforcement learning algorithm, and then compare its learning results with those of the SARSA algorithm. Our case study also covers other learning approaches: Double Q-learning, Deep Q-Network (DQN), and Double Deep Q-Network (DDQN). These variants were included in order to compare performance. For more information, read our report.

The parameters and plots of the relevant QL models are located under ./code/Reinforcement_Learning/models, while those of the SARSA models are located under ./code/Reinforcement_Learning/sarsa/models.

Demo: world-1-1, n_stack=4

Requirements (tested)

| Module | Version |
| --- | --- |
| gym | 0.25.2 |
| gym-super-mario-bros | 7.4.0 |
| nes-py | 8.2.1 |
| pyglet | 1.5.21 |
| torch | 2.1.1 |
| pygame | 2.5.2 |
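Assuming pip, one way to install the tested versions is:

```sh
pip install gym==0.25.2 gym-super-mario-bros==7.4.0 nes-py==8.2.1 pyglet==1.5.21 torch==2.1.1 pygame==2.5.2
```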

Gym Environment

We used the gym-super-mario-bros environment. The setup code can be found in ./code/Reinforcement_Learning/utils/enviroment.py. The QL agents' logic can be found in ./code/Reinforcement_Learning/utils/agents, while the SARSA agents and their models can be found in ./code/Reinforcement_Learning/sarsa.
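For reference, here is a minimal, self-contained setup of the same environment with the versions listed above (a sketch driven by a random policy, independent of the repo's enviroment.py):

```python
import gym_super_mario_bros
from nes_py.wrappers import JoypadSpace
from gym_super_mario_bros.actions import SIMPLE_MOVEMENT

# Create the World 1-1 environment and restrict the action space
env = gym_super_mario_bros.make("SuperMarioBros-1-1-v0")
env = JoypadSpace(env, SIMPLE_MOVEMENT)

# With gym 0.25.2 / nes-py 8.2.1 the environment uses the classic
# 4-tuple step API: (state, reward, done, info)
state = env.reset()
done = False
while not done:
    state, reward, done, info = env.step(env.action_space.sample())
    env.render()
env.close()
```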

In ./code/Reinforcement_Learning/utils/setup_env.py we assign custom values to the rewards, so as to encourage the agent to collect as many power-ups as possible (a sketch of one possible implementation follows the list). The custom rewards are:

  • time: -0.1, per in-game second that passes
  • death: -100, Mario dies
  • extra_life: 100, Mario gets an extra life
  • mushroom: 20, Mario eats a mushroom and becomes big
  • flower: 25, Mario eats a flower
  • mushroom_hit: -10, Mario gets hit while big
  • flower_hit: -15, Mario gets hit while Fire Mario
  • coin: 15, Mario collects a coin
  • score: 15, Mario hits an enemy
  • victory: 1000, Mario wins the level
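The repo's actual shaping lives in setup_env.py; purely as an illustration, here is one way such values could be applied with a gym.Wrapper, using the info dict that gym-super-mario-bros exposes (keys such as time, coins, score, life, status, and flag_get). The CustomRewardWrapper class and its delta logic are a hypothetical sketch, not the repo's code:

```python
import gym

# The reward values from the list above; the real ones live in setup_env.py
REWARDS = {"time": -0.1, "death": -100.0, "extra_life": 100.0,
           "mushroom": 20.0, "flower": 25.0, "mushroom_hit": -10.0,
           "flower_hit": -15.0, "coin": 15.0, "score": 15.0, "victory": 1000.0}

STATUS_RANK = {"small": 0, "tall": 1, "fireball": 2}  # Mario's power-up state

class CustomRewardWrapper(gym.Wrapper):
    """Hypothetical reward shaping based on gym-super-mario-bros's info dict."""

    def reset(self, **kwargs):
        self.prev = None
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        if self.prev is not None:
            p = self.prev
            reward += REWARDS["time"] * max(0, p["time"] - info["time"])    # clock ticking
            reward += REWARDS["coin"] * max(0, info["coins"] - p["coins"])  # coins collected
            if info["score"] > p["score"]:
                reward += REWARDS["score"]                                  # e.g. enemy hit
            if info["life"] > p["life"]:
                reward += REWARDS["extra_life"]
            elif info["life"] < p["life"]:
                reward += REWARDS["death"]
            cur, old = STATUS_RANK[info["status"]], STATUS_RANK[p["status"]]
            if cur > old:   # powered up: small -> tall (mushroom) or tall -> fire (flower)
                reward += REWARDS["mushroom"] if cur == 1 else REWARDS["flower"]
            elif cur < old: # got hit while big or while Fire Mario
                reward += REWARDS["mushroom_hit"] if old == 1 else REWARDS["flower_hit"]
        if info.get("flag_get"):
            reward += REWARDS["victory"]
        self.prev = info
        return obs, reward, done, info
```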

Training & Results

We used the QL, Double QL, DQN, and DDQN agents together with their respective SARSA counterparts, all with an epsilon-greedy policy. Each model was trained for 1,000 steps and took about 3.5 hours to finish, except for DDQN and DDN Sarsa, which were trained for 10,000 steps and took about 13.4 hours.
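To make the comparison concrete: in tabular form, the two update rules differ only in the bootstrap target. The following minimal sketch (the hyperparameters are placeholders, not the repo's settings) shows Q-learning's off-policy max versus SARSA's on-policy next action, plus the epsilon-greedy policy both share:

```python
import numpy as np

def epsilon_greedy(Q, s, n_actions, eps=0.1):
    # With probability eps explore; otherwise exploit the current estimates
    if np.random.rand() < eps:
        return np.random.randint(n_actions)
    return int(np.argmax(Q[s]))

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    # Off-policy: bootstrap from the greedy (max) action in the next state
    Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    # On-policy: bootstrap from the action the policy actually takes next
    Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] - Q[s, a])
```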

Below are the results of the best models, comparing the QL and SARSA families directly.

| | world-1-1-n_stack=1 | world-1-1-n_stack=2 | world-1-1-n_stack=4 |
| --- | --- | --- | --- |
| Training steps | 10K | 10K | 10K |
| Episode score | 1723 | 4100 | 4320 |
| Agent | DDN Sarsa | DDN Sarsa | DDQN |
| Completed level? | False | True | True |

As future work, we could implement the PPO algorithm and compare it against both the QL and SARSA agents, in order to figure out which algorithm works best for Super Mario Bros.

Author & Contacts

Alberto Montefusco

  • Developer: Alberto-00
  • Email: a.montefusco28@studenti.unisa.it
  • LinkedIn: Alberto Montefusco
  • Website: alberto-00.github.io

Alessandro Aquino

  • Developer: AlessandroUnisa
  • Email: a.aquino33@studenti.unisa.it
  • LinkedIn: Alessandro Aquino

Mattia d'Argenio

  • Developer: mattiadarg
  • Email: m.dargenio5@studenti.unisa.it
  • LinkedIn: Mattia d'Argenio

