Pacman_RL_with_feedback

Reinforcement learning (Q-learning with function approximation) with human feedback, implemented in the Pacman environment.

About

This work is built on UC Berkeley's Pacman environment (http://ai.berkeley.edu/home.html). To teach Pacman to play the game, we design features (described below) and use function approximation for the Q-learning algorithm. Human feedback can be provided in terms of these features to guide the agent's learning: the agent can correct underperforming behavior and learn about unseen situations from good feedback. In 6 trial runs, totalling 18 episodes with and without feedback, we recorded an average reward improvement of about 20 percent when the feedback mechanism was used. An introduction to the concepts involved, from Markov Decision Processes (MDPs) to Q-learning with function approximation (with emphasis on the Pacman agent), can be found in Theory.pdf.
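The core idea is that Q(s, a) is a weighted sum of hand-designed features, and the weights are nudged toward the temporal-difference target after every transition. Below is a minimal sketch of that update; the feature dictionary, learning rate, and discount factor are illustrative assumptions, not values taken from this repository.

```python
# Illustrative sketch of Q-learning with linear function approximation.
# ALPHA and GAMMA values are assumptions, not the repository's settings.

ALPHA = 0.2   # learning rate (assumed)
GAMMA = 0.8   # discount factor (assumed)

def q_value(weights, features):
    """Q(s, a) = sum over features of weight_i * f_i(s, a)."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

def update(weights, features, reward, max_next_q):
    """Nudge every weight toward the temporal-difference target."""
    difference = (reward + GAMMA * max_next_q) - q_value(weights, features)
    for name, value in features.items():
        weights[name] = weights.get(name, 0.0) + ALPHA * difference * value
    return weights
```

Human feedback then amounts to adjusting these same feature weights directly, as described in the "Feedback mechanism" section below.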

Demo after learning with Feedback

(demo animation)

Setup

Requires Python 2.

Usage instructions

  • Move into the src directory using "cd src"
  • A normal Pacman game can be run using "python2 pacman.py"
  • To let Pacman train and then show what it has learnt, use "bash run_q_learning.sh" or "python2 pacman.py -p ApproximateQAgent -a extractor=SimpleExtractorPellet -x 10 -n 11 -l mediumClassic"
  • To run the program with feedback mode enabled, use "bash run_feedback_mode.sh" or "python2 pacman.py -p ApproximateQAgentFeedback -a extractor=SimpleExtractorPellet -x 0 -n 4 -l mediumClassic"

Experiments with feedback

  • Option "Read from file": load feature weights from a previous run
  • Option "Write to file": store the weights after the runs so they can be read next time
  • Option "Explore": choose a random action with some probability (required for learning)
  • Option "Use queries": give feedback for the next 3 games
  1. Use f, t, t, f to run with exploration, without feedback, and write the weights to file. (We used this for 3 episodes to let the agent learn some basic rules.)
  2. Use t, f, f, t to load the weights and use feedback without exploration. (We used this for 4 episodes to test the feedback mechanism.)
  3. Use t, f, f, f to load the same weights and run without feedback or exploration. (We ran this 4 times as well.)

Comparing the performance in 2 and 3 shows the effect of using feedback.

The results of our experiments are documented in results.txt
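The "Read from file" and "Write to file" options persist the learned feature weights between runs. Below is a minimal sketch of how such persistence could be done; the file name and helper names are assumptions for illustration, not necessarily what the repository uses.

```python
# Hypothetical helpers for saving/loading feature weights between runs.
# The file name "weights.pkl" is an assumption, not the repository's.
import pickle

def save_weights(weights, path="weights.pkl"):
    with open(path, "wb") as f:
        pickle.dump(weights, f)

def load_weights(path="weights.pkl"):
    with open(path, "rb") as f:
        return pickle.load(f)
```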

Feedback mechanism

In this program we have added extra features that incentivize Pacman to hunt and eat scared ghosts. Feedback can be provided by increasing or decreasing the importance of any of the following features:

  1. bias
  2. run into scared ghosts 1 step away
  3. run into non-scared ghosts 1 step away
  4. eat food
  5. distance to closest food
  6. hunt scared ghost

To increase the importance of a feature, use +; to decrease it, use - (see the sketch after the example below).

For example:

  • 6 + (increase the importance of hunting scared ghosts)
  • 3 - (decrease the weight on running into non-scared ghosts, i.e. make Pacman avoid normal ghosts more strongly)
  • -1 -1 (end of feedback)
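A minimal sketch of how this feedback loop could be wired up; the feature ordering, step size, and prompt format here are illustrative assumptions, and the repository's actual agent may differ.

```python
# Hypothetical feedback loop: read "index sign" pairs until "-1 -1" and
# nudge the corresponding feature weight up or down by a fixed step.
FEATURES = ["bias",
            "run-into-scared-ghost-1-step-away",
            "run-into-non-scared-ghost-1-step-away",
            "eats-food",
            "closest-food",
            "hunt-scared-ghost"]   # order assumed to match the menu above
STEP = 0.5                         # adjustment size is an assumption

def apply_feedback(weights):
    while True:
        index, sign = raw_input("feature sign> ").split()  # raw_input: Python 2
        if index == "-1":
            break
        name = FEATURES[int(index) - 1]                 # the menu is 1-indexed
        weights[name] = weights.get(name, 0.0) + (STEP if sign == "+" else -STEP)
    return weights
```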

Developed by

Abhiramon R. and Suraj Singh
