Reinforcement Learning in the NetHack Environment

Methods we used:

Advantage Actor Critic (A2C)
A2C + LSTM
Model Based Search (MBS)
REINFORCE

Useful NLE links

NetHack Wiki: (Useful info.)
https://nethackwiki.com/

NLE API:
https://github.com/facebookresearch/nle

NLE Paper:
https://arxiv.org/pdf/2006.13760.pdf

NetHack example agent:
https://github.com/facebookresearch/nle/blob/master/nle/agent/agent.py

NLE default options:

Human male neutral monk
NETHACKOPTIONS = [ "color", "showexp", "autopickup", "pickup_types:$?!/", "pickup_burden:unencumbered", "nobones", "nolegacy", "nocmdassist", "disclose:+i +a +v +g +c +o", "runmode:teleport", "mention_walls", "nosparkle", "showexp", "showscore", ]
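Because autopickup is enabled, the agent does not need an explicit pickup action for the item classes listed in pickup_types ($?!/ = gold, scrolls, potions and wands).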
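As a rough illustration of how the options above determine what is picked up automatically, the sketch below parses the list and decodes the pickup_types symbols. The helper name `auto_picked_up` and the `OBJECT_CLASSES` table are hypothetical and not part of NLE; the symbols are the standard NetHack object-class characters.

```python
# Options copied from the list above.
NETHACK_OPTIONS = [
    "color", "showexp", "autopickup", "pickup_types:$?!/",
    "pickup_burden:unencumbered", "nobones", "nolegacy", "nocmdassist",
    "disclose:+i +a +v +g +c +o", "runmode:teleport", "mention_walls",
    "nosparkle", "showexp", "showscore",
]

# Hypothetical lookup table: NetHack object-class symbols used in pickup_types.
OBJECT_CLASSES = {"$": "gold", "?": "scrolls", "!": "potions", "/": "wands"}


def auto_picked_up(options):
    """Return the item classes picked up automatically ([] if autopickup is off)."""
    if "autopickup" not in options:
        return []
    for option in options:
        if option.startswith("pickup_types:"):
            symbols = option.split(":", 1)[1]
            # Unknown symbols are passed through unchanged.
            return [OBJECT_CLASSES.get(symbol, symbol) for symbol in symbols]
    # NetHack picks up every object class when pickup_types is not set.
    return ["all"]


print(auto_picked_up(NETHACK_OPTIONS))
```

Items outside these classes (food, weapons, armor and so on) would still need an explicit pickup action, which is not in the default action set listed in this README.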

The action space:

The action space can be extended to include all 90-something NetHack actions. By default, it contains the following 23 actions, which can be passed to step() in the Gym environment:

[0] : More (Not doing anything)
[1] : North 1 step
[2] : East 1 step
[3] : South 1 step
[4] : West 1 step
[5] : North-East 1 step
[6] : South-East 1 step
[7] : South-West 1 step
[8] : North-West 1 step
[9] : North max
[10] : East max
[11] : South max
[12] : West max
[13] : North-East max
[14] : South-East max
[15] : South-West max
[16] : North-West max
[17] : Go up a staircase
[18] : Go down a staircase
[19] : Wait / Do nothing
[20] : Kick
[21] : Eat
[22] : Search

To reduce the action space, I'm removing More (0) and the "max" auto-move actions (9-16), which leaves 14 actions.
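One way to apply this reduction is to let the policy output an index into a smaller list and translate it back to the environment's index before stepping. This is a minimal sketch using only the indices from the table above; `REDUCED_TO_ENV` and `to_env_action` are hypothetical names, not taken from the repository code.

```python
# Indices refer to the default 23-action table above.
MORE = 0
LONG_MOVES = range(9, 17)  # 9..16: "max" movement in the eight directions

# Keep every default action except More and the "max" moves.
REDUCED_TO_ENV = [a for a in range(23) if a != MORE and a not in LONG_MOVES]


def to_env_action(reduced_index):
    """Translate a policy output in [0, 14) to the index the environment expects."""
    return REDUCED_TO_ENV[reduced_index]


print(len(REDUCED_TO_ENV))   # size of the reduced action space
print(to_env_action(8))      # first non-movement action (go up a staircase)
```

With this mapping the policy head needs 14 outputs instead of 23, and the value returned by `to_env_action` is what would be handed to the environment's step call.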

The observation space (the blstats entries expanded here still need to be verified):

If we work with the same setup as the example agent, we have the following:
observation_space['glyphs'] = Box(0, 5976, (21, 79), int16): a grid of shape (height=21, width=79) in which each cell holds the integer id, between 0 and 5976, of the symbol shown at that map position.
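A common way to shrink this 21 x 79 input is to crop a small window of glyphs around the agent. The sketch below does this with plain nested lists; it assumes the agent's column and row are available as x and y (this README lists them as blstats[0] and blstats[1]). The function name, the `radius` parameter and the `PADDING` value are hypothetical choices for illustration, not the example agent's implementation.

```python
# Hypothetical fill value for cells outside the map. Note that 0 is also a
# valid glyph id, so a real agent may prefer a different sentinel.
PADDING = 0


def crop_glyphs(glyphs, x, y, radius=4):
    """Return a (2*radius+1) square of glyph ids centred on column x, row y."""
    height, width = len(glyphs), len(glyphs[0])
    window = []
    for row in range(y - radius, y + radius + 1):
        line = []
        for col in range(x - radius, x + radius + 1):
            inside = 0 <= row < height and 0 <= col < width
            line.append(glyphs[row][col] if inside else PADDING)
        window.append(line)
    return window


# Dummy 21 x 79 grid where each cell stores row * 79 + col.
grid = [[r * 79 + c for c in range(79)] for r in range(21)]
window = crop_glyphs(grid, x=40, y=10, radius=2)
print(len(window), len(window[0]))
```

The window always has the same shape, even at the map border, which keeps the network input size fixed.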

observation_space['blstats'] = Box(-something, +something, (25,), int16): an array of 25 bottom-line stats:

[0] : X_Coordinate
[1] : Y_Coordinate
[2] : Strength Percentage
[3] : Strength (Strength corresponds to the ability to have more weight in your inventory.)
[4] : Dexterity (has a multitude of effects, of which the most significant is probably that it affects your chance of hitting monsters, whether in melee combat or with a missile or spell)
[5] : Constitution (Having a high constitution increases your healing rate and the number of HP you gain when levelling up and allows you to carry more weight in your inventory.)
[6] : Intelligence (Affects your chance of successfully casting a spell, unless you are a Healer, Knight, Monk, Priest or Valkyrie, in which case it is wisdom that matters.)
[7] : Wisdom (A Healer, Knight, Monk, Priest or Valkyrie requires wisdom to cast spells)
[8] : Charisma (Charisma is mostly useful for obtaining better prices at shops.)
[9] : Score
[10] : Current Health Points
[11] : Maximum Health Points
[12] : Dungeon depth
[13] : Available gold
[14] : Current energy
[15] : Max energy
[16] : Armor class
[17] : Monster level
[18] : Experience level
[19] : Experience points
[20] : Time
[21] : Hunger level (Too little and you starve; too much and you choke.)
[22] : Carrying capacity
[23] : NLE stat
[24] : NLE stat
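To avoid scattering raw indices through agent code, the table above can be turned into a name lookup. This is a minimal sketch that follows the index order listed in this README, which is itself flagged as unverified; the field names and the `name_blstats` helper are hypothetical, and entries 23 and 24 keep placeholder names because their meaning is not given here.

```python
# Names follow the index order of the table above (unverified, see the note
# on the observation space). Entries 23 and 24 are placeholders.
BLSTAT_NAMES = [
    "x", "y", "strength_percentage", "strength", "dexterity", "constitution",
    "intelligence", "wisdom", "charisma", "score", "hp", "max_hp", "depth",
    "gold", "energy", "max_energy", "armor_class", "monster_level",
    "experience_level", "experience_points", "time", "hunger",
    "carrying_capacity", "nle_stat_23", "nle_stat_24",
]


def name_blstats(blstats):
    """Map a 25-element blstats vector to a dict keyed by stat name."""
    if len(blstats) != len(BLSTAT_NAMES):
        raise ValueError(f"expected {len(BLSTAT_NAMES)} stats, got {len(blstats)}")
    return dict(zip(BLSTAT_NAMES, blstats))


# Dummy vector where each entry equals its own index.
stats = name_blstats(list(range(25)))
print(stats["hp"], stats["max_hp"], stats["depth"])
```

The length check is deliberate: if a different NLE version returns a blstats vector of another size, the mapping fails loudly instead of silently shifting names.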


Pieter-Cawood/Reinforcement-Learning
