SARSA: Windy Gridworld

In this repository I implement on-policy Sarsa to solve Windy Gridworld, a task described on page 130 of Reinforcement Learning: An Introduction by Sutton & Barto. The Sarsa algorithm itself is also presented on that page.
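For reference, here is a minimal sketch of the tabular Sarsa backup as presented in the book. The table layout, the names, and the hyperparameter values (alpha, discount) are illustrative assumptions for this sketch, not taken from the repository's source.

```cpp
// Minimal sketch of the tabular Sarsa backup from Sutton & Barto (p. 130).
// All names and constants here are illustrative, not this repo's own.
#include <vector>

constexpr int numStates   = 10 * 7;  // Windy Gridworld is a 10x7 grid
constexpr int numActions  = 4;       // up, down, left, right
constexpr double alpha    = 0.5;     // step size (illustrative value)
constexpr double discount = 1.0;     // undiscounted episodic task

// Q(s, a): a simple table of action-value estimates, initialized to 0
std::vector<std::vector<double>> Q(numStates, std::vector<double>(numActions, 0.0));

// One Sarsa backup: Q(S,A) += alpha * [R + discount * Q(S',A') - Q(S,A)]
void sarsaUpdate(int state, int action, double reward, int nextState, int nextAction) {
    double target = reward + discount * Q[nextState][nextAction];
    Q[state][action] += alpha * (target - Q[state][action]);
}
```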

Both the behavior policy and the target policy (which are the same policy in on-policy Sarsa) are stochastic, but over time they become nearly deterministic as the action-value estimates improve. To ensure exploration, both are ε-greedy policies, which are described on page 100.
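As a rough illustration, ε-greedy action selection over one row of the Q table sketched above could look like the following; the RNG choices are assumptions for the sketch, not taken from this repository.

```cpp
// Sketch of ε-greedy action selection: with probability epsilon pick a
// uniformly random action, otherwise pick the currently greedy one.
#include <algorithm>
#include <random>
#include <vector>

int epsilonGreedy(const std::vector<double>& qRow, double epsilon, std::mt19937& rng) {
    std::uniform_real_distribution<double> coin(0.0, 1.0);
    if (coin(rng) < epsilon) {
        // Explore: uniformly random action
        std::uniform_int_distribution<int> pick(0, static_cast<int>(qRow.size()) - 1);
        return pick(rng);
    }
    // Exploit: action with the highest current value estimate
    return static_cast<int>(std::max_element(qRow.begin(), qRow.end()) - qRow.begin());
}
```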

To train the agent, type make AI in a Linux terminal. The program then shows the Episodes, Steps, Last total steps, and the total timesteps. Pay close attention to the "Last total steps", as that tells you how many timesteps the agent (a.k.a. the computer) took to reach the goal state in the previous episode. If that number is around 15-20 after roughly 8000 total timesteps, the agent has been successfully trained on the environment. Simply hit b to stop the program.
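For orientation, the printed counters could be tracked by a loop along these lines. This is only a sketch that reuses the Q table, sarsaUpdate(), epsilonGreedy(), and step() helpers sketched elsewhere in this README; the actual loop built by make AI may differ.

```cpp
// Sketch of a training loop behind the printed counters: "Steps" counts
// actions in the current episode, "Last total steps" is the previous
// episode's length, and totalTimesteps accumulates across episodes.
// Assumes Q, sarsaUpdate(), epsilonGreedy(), and step() from the other sketches.
#include <random>

void train(int maxTimesteps, double epsilon, std::mt19937& rng) {
    int totalTimesteps = 0;
    int lastTotalSteps = 0;
    for (int episode = 0; totalTimesteps < maxTimesteps; ++episode) {
        int x = 0, y = 3;                                // start state (column 0, row 3)
        bool done = false;
        int state  = y * 10 + x;
        int action = epsilonGreedy(Q[state], epsilon, rng);
        int steps  = 0;
        while (!done && totalTimesteps < maxTimesteps) {
            int reward     = step(x, y, action, done);   // always -1 in this task
            int nextState  = y * 10 + x;
            int nextAction = epsilonGreedy(Q[nextState], epsilon, rng);
            sarsaUpdate(state, action, reward, nextState, nextAction);
            state  = nextState;
            action = nextAction;
            ++steps;
            ++totalTimesteps;
        }
        lastTotalSteps = steps;  // should drop to roughly 15-20 once trained
        // the real program would print Episodes, Steps, Last total steps, totalTimesteps here
    }
}
```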

You can also run a simulation to watch how the computer plays. Type make AISim in your Linux terminal. This shows the Last total reward, the total timesteps, the episode, the steps, the explore toggle, and the game itself. If the simulation is too fast or too slow, simply change the delay variable (in milliseconds) in AISim.cpp. You can hit b to end the program (you may need to press it multiple times). The last total reward is the same as the last total timesteps, but negated, since each action results in -1 reward.
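The -1-per-action reward comes from the Windy Gridworld dynamics. Below is a rough sketch of one environment step, using the standard wind strengths from the book; the coordinate conventions and names are assumptions and may not match AISim.cpp.

```cpp
// Rough sketch of one Windy Gridworld step. The wind strengths per column are
// the standard ones from Sutton & Barto; y is assumed to increase upward.
#include <algorithm>

constexpr int width = 10, height = 7;
constexpr int wind[width] = {0, 0, 0, 1, 1, 1, 2, 2, 1, 0};  // upward push per column
constexpr int goalX = 7, goalY = 3;

// Moves (x, y) according to the action and the wind in the current column,
// sets done when the goal is reached, and returns the -1 reward per action.
// action: 0 = up, 1 = down, 2 = left, 3 = right
int step(int& x, int& y, int action, bool& done) {
    int dx = (action == 3) - (action == 2);
    int dy = (action == 0) - (action == 1);
    y += dy + wind[x];                     // wind is based on the column moved from
    x += dx;
    x = std::clamp(x, 0, width - 1);       // the agent cannot leave the grid
    y = std::clamp(y, 0, height - 1);
    done = (x == goalX && y == goalY);
    return -1;                             // every action gives -1 reward
}
```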

The explore toggle decides whether the agent should try out new actions. In the simulation it is shown as 0 = off, 1 = on, and it is usually switched off after the agent has been trained for a decent number of timesteps. If the agent does not explore at the beginning of training, it will perform fairly poorly in the environment. The agent should be trained after roughly 8000 total timesteps.
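In terms of the sketches above, switching exploration off amounts to setting ε to 0, i.e. always taking the greedy action; the 0.1 used below is an illustrative value, not necessarily what this repository uses.

```cpp
// Sketch of the explore toggle in terms of the epsilonGreedy() helper above:
// explore on -> ε-greedy action selection, explore off -> purely greedy (epsilon = 0).
int chooseAction(const std::vector<double>& qRow, bool explore, std::mt19937& rng) {
    double epsilon = explore ? 0.1 : 0.0;  // 0.1 is an illustrative choice
    return epsilonGreedy(qRow, epsilon, rng);
}
```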

If you have any questions, you are welcome to email me at eshaanbarkataki@gmail.com.
