
OpenAI Taxi-v2 game using Q-learning

Q-learning is an off-policy reinforcement learning algorithm that seeks to find the best action to take given the current state. In this project, Q-learning is applied to the OpenAI Gym Taxi-v2 environment.

In order to update the Q-values, we use the following action-value update rule (the core of Q-learning):

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$$

where s is the current state, a the chosen action, r the reward received, s' the next state, α the learning rate, and γ the discount factor.
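As a minimal sketch, the same update can be written in NumPy (the function name and signature here are illustrative, not taken from the repository):

```python
import numpy as np

# One Q-learning update; q_table has shape (n_states, n_actions).
def update_q(q_table, state, action, reward, next_state, alpha, gamma):
    best_next = np.max(q_table[next_state])  # max_a' Q(s', a')
    td_target = reward + gamma * best_next   # r + gamma * max_a' Q(s', a')
    q_table[state, action] += alpha * (td_target - q_table[state, action])
```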

In this game we have 6 actions (their integer encoding in Gym is shown in the sketch after the list):

  • down (south)
  • up (north)
  • right (east)
  • left (west)
  • pick up the passenger
  • drop off the passenger
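
You can verify this directly in Gym (assuming the `gym` package is installed; newer releases renamed the environment to Taxi-v3):

```python
import gym

env = gym.make("Taxi-v2")  # use "Taxi-v3" on recent Gym versions
print(env.action_space)    # Discrete(6)
# Gym encodes the actions as integers:
# 0 = south, 1 = north, 2 = east, 3 = west, 4 = pickup, 5 = dropoff
```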

Secondly, we should define our states. The grid has 5x5 = 25 cells, so there are 25 possible taxi positions. The passenger can be at any of the 4 marked locations or inside the taxi, giving 5 passenger states, and the destination can be any of the 4 marked locations. In total, we have 25 x 5 x 4 = 500 states.
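The environment exposes this encoding through its `encode` method, which maps (taxi row, taxi column, passenger location index, destination index) to a single integer; the argument values below are arbitrary examples:

```python
# Map (taxi_row, taxi_col, passenger_location, destination) to one integer.
state = env.unwrapped.encode(3, 1, 2, 0)
print(state)                  # an integer in [0, 500)
print(env.observation_space)  # Discrete(500)
```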

Each state of our agent is therefore encoded as a single integer between 0 and 499. We should also take rewards into account. The agent receives a different reward depending on the action it takes:

  • if the taxi successfully drops off the passenger at the destination, it is rewarded with +20 points

  • if the taxi attempts an illegal pick-up or drop-off, it is punished with -10 points

  • for every other step, it loses 1 point, which encourages the agent to find short routes
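
For example, attempting a pick-up in a cell where the passenger is not waiting triggers the -10 penalty (a sketch; older Gym versions return the 4-tuple shown below, newer ones return 5 values):

```python
state = env.reset()
next_state, reward, done, info = env.step(4)  # action 4 = pickup
print(reward)  # -10 if the passenger is not at the taxi's position
```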

An agent interacts with the environment in one of two ways. The first is to use the Q-table as a reference: look up all possible actions for the current state and select the action with the highest Q-value. This is known as exploiting, since the agent uses the information it already has to make a decision.

The second way is to act randomly. This is called exploring. Instead of selecting the action with the maximum expected future reward, we pick an action at random. Acting randomly is important because it lets the agent discover states that the exploitation process alone might never reach. You can balance exploration and exploitation with a parameter epsilon (ε) that sets how often you explore versus exploit. Here is some rough code; the details depend on how the state and action spaces are set up.
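A minimal sketch of ε-greedy action selection, assuming the `q_table` and `env` defined earlier:

```python
import numpy as np

def choose_action(q_table, state, epsilon, env):
    # Explore with probability epsilon, otherwise exploit the Q-table.
    if np.random.uniform(0, 1) < epsilon:
        return env.action_space.sample()   # explore: random action
    return int(np.argmax(q_table[state]))  # exploit: best known action
```

A common refinement is to start with a high ε and decay it over training, so the agent explores early and exploits its learned Q-values later.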

You can launch the game and watch the converged Q-table drive the agent, along with the steps taken to reach the goal:

$ python3 game.py