Pieter-Cawood/Reinforcement-Learning

Reinforcement Learning in the NetHack Environment

Taxonomy

Methods we used:

  • Advantage Actor Critic (A2C)
  • A2C + LSTM
  • Model Based Search (MBS)
  • REINFORCE

Useful NLE links

NetHack Wiki: (Useful info.)
https://nethackwiki.com/

NLE API:
https://github.com/facebookresearch/nle

NLE Paper:
https://arxiv.org/pdf/2006.13760.pdf

NetHack example agent:
https://github.com/facebookresearch/nle/blob/master/nle/agent/agent.py

NLE default options:

  • Human male neutral monk

  • NETHACKOPTIONS = [ "color", "showexp", "autopickup", "pickup_types:$?!/", "pickup_burden:unencumbered", "nobones", "nolegacy", "nocmdassist", "disclose:+i +a +v +g +c +o", "runmode:teleport", "mention_walls", "nosparkle", "showexp", "showscore", ]
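    With autopickup on, items of the listed pickup_types ($ coins, ? scrolls, ! potions, / wands) are picked up automatically, so we don't need an explicit pickup action.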
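To make the autopickup note concrete, here is a minimal sketch that reads the option list above and reports which object classes are picked up automatically. The helper name `auto_picked_classes` and the `OBJECT_CLASSES` table are illustrative and not part of NLE; the symbols follow NetHack's standard object-class characters.

```python
# Options copied from the list above (the duplicate "showexp" is kept as-is).
NETHACKOPTIONS = [
    "color", "showexp", "autopickup", "pickup_types:$?!/",
    "pickup_burden:unencumbered", "nobones", "nolegacy", "nocmdassist",
    "disclose:+i +a +v +g +c +o", "runmode:teleport", "mention_walls",
    "nosparkle", "showexp", "showscore",
]

# NetHack object-class symbols that appear in pickup_types here.
OBJECT_CLASSES = {"$": "coins", "?": "scrolls", "!": "potions", "/": "wands"}


def auto_picked_classes(options):
    """Return the object classes named in pickup_types, if autopickup is on.

    Hypothetical helper: returns an empty list when autopickup is off or
    when no pickup_types entry is present in the given options.
    """
    if "autopickup" not in options:
        return []
    for opt in options:
        if opt.startswith("pickup_types:"):
            symbols = opt.split(":", 1)[1]
            # Unknown symbols are passed through unchanged.
            return [OBJECT_CLASSES.get(s, s) for s in symbols]
    return []


if __name__ == "__main__":
    print(auto_picked_classes(NETHACKOPTIONS))
```

Anything outside these classes (food, weapons, armor and so on) would still need an explicit pickup action, which the default action list does not include.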

The action space:

The action space can be extended to include all 90-something NLE actions. By default, it contains the following actions, which can be passed to env.step() in gym:

  • [0] : More (Not doing anything)
  • [1] : North 1 step
  • [2] : East 1 step
  • [3] : South 1 step
  • [4] : West 1 step
  • [5] : North-East 1 step
  • [6] : South-East 1 step
  • [7] : South-West 1 step
  • [8] : North-West 1 step
  • [9] : North max
  • [10] : East max
  • [11] : South max
  • [12] : West max
  • [13] : North-East max
  • [14] : South-East max
  • [15] : South-West max
  • [16] : North-West max
  • [17] : Go up a staircase
  • [18] : Go down a staircase
  • [19] : Wait / Do nothing
  • [20] : Kick
  • [21] : Eat
  • [22] : Search

To reduce the action space, I'm removing More (0) and the actions that auto-move (9 - 16), which leaves 14 actions.
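The reduction can be sketched as an index mapping from the smaller action set back to the default indices listed above. The names `KEPT_ACTIONS` and `to_env_action` are illustrative, not taken from the repository; the sketch only assumes the default ordering shown in the list.

```python
# Default action names, in the index order listed above.
ACTION_NAMES = [
    "More",
    "North", "East", "South", "West",
    "North-East", "South-East", "South-West", "North-West",
    "North max", "East max", "South max", "West max",
    "North-East max", "South-East max", "South-West max", "North-West max",
    "Up staircase", "Down staircase", "Wait",
    "Kick", "Eat", "Search",
]

# Actions dropped from the agent's action space: More (0) and auto-move (9-16).
REMOVED = {0} | set(range(9, 17))

# Default indices that remain, in their original order.
KEPT_ACTIONS = [i for i in range(len(ACTION_NAMES)) if i not in REMOVED]


def to_env_action(reduced_index):
    """Map an index in the reduced action space to the default action index."""
    return KEPT_ACTIONS[reduced_index]


if __name__ == "__main__":
    for reduced, original in enumerate(KEPT_ACTIONS):
        print(reduced, "->", original, ACTION_NAMES[original])
```

A policy would then output an index in range(len(KEPT_ACTIONS)), and the value returned by `to_env_action` is what gets stepped in the environment.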

The observation space (the blstats listed here still need to be verified):

If we work with the same setup as the example agent we have the following:
observation_space['glyphs'] = Box(0, 5976, (21, 79), int16): a grid of shape (height=21, width=79) in which each cell holds an integer glyph ID between 0 and 5976.

observation_space['blstats'] = Box(-something, +something, (25,), int16): an array of 25 stats:

  • [0] : X_Coordinate
  • [1] : Y_Coordinate
  • [2] : Strength Percentage
  • [3] : Strength (Strength corresponds to the ability to have more weight in your inventory.)
  • [4] : Dexterity (has a multitude of effects, of which the most significant is probably that it affects your chance of hitting monsters, whether in melee combat or with a missile or spell)
  • [5] : Constitution (Having a high constitution increases your healing rate and the number of HP you gain when levelling up and allows you to carry more weight in your inventory.)
  • [6] : Intelligence (Affects your chances of successfully casting a spell, unless you are a Healer, Knight, Monk, Priest or Valkyrie, in which case it is wisdom that matters.)
  • [7] : Wisdom (A Healer, Knight, Monk, Priest or Valkyrie requires wisdom to cast spells)
  • [8] : Charisma (Charisma is mostly useful for obtaining better prices at shops.)
  • [9] : Score
  • [10] : Current Health Points
  • [11] : Maximum Health Points
  • [12] : Dungeon depth
  • [13] : Available gold
  • [14] : Current energy
  • [15] : Max energy
  • [16] : Armor class
  • [17] : Monster level
  • [18] : Experience level
  • [19] : Experience points
  • [20] : Time
  • [21] : Hunger level (Too little and you starve; too much and you choke.)
  • [22] : Carrying capacity
  • [23] : NLE stat
  • [24] : NLE stat
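To avoid magic indices when reading blstats, the table above can be turned into a named lookup. This is a sketch that assumes the index layout listed here is correct (it still needs to be verified); the field names are illustrative, and `nle_stat_23` / `nle_stat_24` are placeholders for the two entries that are not identified above.

```python
# Field names in the index order listed above (names are illustrative).
BLSTATS_FIELDS = [
    "x", "y",
    "strength_percentage", "strength", "dexterity", "constitution",
    "intelligence", "wisdom", "charisma",
    "score", "hitpoints", "max_hitpoints", "depth", "gold",
    "energy", "max_energy", "armor_class", "monster_level",
    "experience_level", "experience_points", "time", "hunger",
    "carrying_capacity",
    "nle_stat_23", "nle_stat_24",  # placeholders: not identified above
]


def describe_blstats(blstats):
    """Return a dict mapping field names to values for one blstats vector."""
    values = list(blstats)
    if len(values) != len(BLSTATS_FIELDS):
        raise ValueError(
            f"expected {len(BLSTATS_FIELDS)} stats, got {len(values)}"
        )
    return dict(zip(BLSTATS_FIELDS, values))


if __name__ == "__main__":
    # Dummy vector in which each value equals its own index.
    stats = describe_blstats(range(25))
    print(stats["hitpoints"], stats["max_hitpoints"], stats["depth"])
```

With an observation from the environment, this would be called as `describe_blstats(obs['blstats'])`; the length check fails loudly if the installed NLE version returns a different number of stats.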
