
MohammadAsadolahi/Reinforcement-Learning-solving-a-simple-4-4-Gridworld-using-TD0-evaluation-method-in-python


Reinforcement-Learning-solving-a-simple-4*4-Gridworld-using-TD0-evaluation-method

Solving a simple 4*4 Gridworld, similar to OpenAI Gym's FrozenLake, using temporal-difference (TD) reinforcement learning.

Note: the algorithm contains some bugs!


Algorithm Flow


First, we initialize a random policy that indicates the preferred move in every cell:

| D |  | L |  | R |  | D | 
----------------------------
| U |  | U |  | R |  | D | 
----------------------------
| D |  | R |  | R |  | U | 
----------------------------
| U |  | L |  | R | 
----------------------------

U = going up
D = going down
L = going left
R = going right
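
As a rough illustration of this step, the sketch below draws one random move per cell. It is not the repository's code: the names `valid_actions` and `random_policy` are hypothetical, and it assumes that the bottom-right cell `(3, 3)` is terminal (it has no move in the grid above) and that only moves staying inside the grid are allowed.

```python
import random

GRID_SIZE = 4
TERMINAL = (3, 3)  # assumption: the bottom-right cell is terminal and gets no move


def valid_actions(row, col, size=GRID_SIZE):
    """Return the moves that keep the agent inside the grid, in U, L, D, R order."""
    actions = []
    if row > 0:
        actions.append('U')
    if col > 0:
        actions.append('L')
    if row < size - 1:
        actions.append('D')
    if col < size - 1:
        actions.append('R')
    return actions


def random_policy(size=GRID_SIZE, terminal=TERMINAL, seed=None):
    """Pick one random in-grid move for every non-terminal cell."""
    rng = random.Random(seed)
    return {
        (row, col): rng.choice(valid_actions(row, col, size))
        for row in range(size)
        for col in range(size)
        if (row, col) != terminal
    }


policy = random_policy(seed=0)
for row in range(GRID_SIZE):
    # Print one grid row per line; the terminal cell is simply left out.
    print(' '.join(f'| {policy[(row, col)]} |'
                   for col in range(GRID_SIZE) if (row, col) in policy))
```

Passing a `seed` makes the drawn policy reproducible, which can help when comparing runs.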

Then we initialize the Q-table, with a zero entry for every in-grid move in each cell:

{(0, 0): {'D': 0, 'R': 0},
(0, 1): {'L': 0, 'D': 0, 'R': 0},
(0, 2): {'L': 0, 'D': 0, 'R': 0},
(0, 3): {'L': 0, 'D': 0},
(1, 0): {'U': 0, 'D': 0, 'R': 0},
(1, 1): {'U': 0, 'L': 0, 'D': 0, 'R': 0},
(1, 2): {'U': 0, 'L': 0, 'D': 0, 'R': 0},
(1, 3): {'U': 0, 'L': 0, 'D': 0},
(2, 0): {'U': 0, 'D': 0, 'R': 0},
(2, 1): {'U': 0, 'L': 0, 'D': 0, 'R': 0},
(2, 2): {'U': 0, 'L': 0, 'D': 0, 'R': 0},
(2, 3): {'U': 0, 'L': 0, 'D': 0},
(3, 0): {'U': 0, 'R': 0},
(3, 1): {'U': 0, 'L': 0, 'R': 0},
(3, 2): {'U': 0, 'L': 0, 'R': 0}}
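
A table with this shape could be built programmatically rather than written out by hand. The sketch below is one way to do it, not the repository's code: `init_q_table` is a hypothetical name, and it assumes the cell `(3, 3)` is terminal and therefore omitted, as in the listing above.

```python
def init_q_table(size=4, terminal=(3, 3)):
    """Build {cell: {action: 0}} for every non-terminal cell of a size x size grid."""
    q_table = {}
    for row in range(size):
        for col in range(size):
            if (row, col) == terminal:
                continue  # assumption: the terminal cell has no actions
            actions = {}
            # Only moves that stay inside the grid get an entry.
            if row > 0:
                actions['U'] = 0
            if col > 0:
                actions['L'] = 0
            if row < size - 1:
                actions['D'] = 0
            if col < size - 1:
                actions['R'] = 0
            q_table[(row, col)] = actions
    return q_table


q_table = init_q_table()
print(q_table[(0, 0)])
print(q_table[(1, 1)])
```

Corner cells end up with two actions, edge cells with three, and interior cells with four, which matches the listing above.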


 | R |  | R |  | D |  | L | 
----------------------------
 | U |  | R |  | D |  | D | 
----------------------------
 | D |  | R |  | R |  | D | 
----------------------------
 | R |  | R |  | R | 
----------------------------
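
The log that follows prints the policy and one value per cell every 100 steps. For readers unfamiliar with the TD(0) evaluation rule named in the title, here is a minimal sketch of a single update, V(s) += alpha * (r + gamma * V(s') - V(s)). The function name `td0_update`, the step size `alpha`, the discount `gamma`, and the reward used in the usage lines are all assumptions for illustration; they are not taken from the repository.

```python
def td0_update(values, state, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one TD(0) update: V(s) += alpha * (r + gamma * V(s') - V(s))."""
    td_target = reward + gamma * values[next_state]  # bootstrapped one-step target
    td_error = td_target - values[state]
    values[state] += alpha * td_error
    return values[state]


# Hypothetical usage: every cell starts at 0.0, then one transition is observed.
values = {(row, col): 0.0 for row in range(4) for col in range(4)}
new_value = td0_update(values, state=(2, 3), reward=1.0, next_state=(3, 3))
print(new_value)
```

Only the value of the visited state changes in each update; the next state's value is read but never written, so a terminal cell that is never a starting state stays at 0.0.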



 step:0
 | R |  | R |  | D |  | L | 
----------------------------
 | U |  | R |  | D |  | D | 
----------------------------
 | D |  | R |  | R |  | D | 
----------------------------
 | R |  | R |  | R | 
----------------------------
 | 0.0 |  | 0.0 |  | 0.0 |  | 0.0 | 
--------------------------------
 | 0.0 |  | 0.0 |  | 0.0 |  | 0.0 | 
--------------------------------
 | 0.0 |  | 0.0 |  | 0.0 |  | 0.1 | 
--------------------------------
 | 0.0 |  | 0.0 |  | 0.0 |  | 0.0 | 
--------------------------------

----------------------------





 step:100
 | R |  | R |  | D |  | L | 
----------------------------
 | U |  | R |  | D |  | D | 
----------------------------
 | D |  | R |  | R |  | D | 
----------------------------
 | R |  | R |  | R | 
----------------------------
 | 0.026012275504845712 |  | 0.08885289070054438 |  | 0.31226530150752957 |  | -0.024786977375880693 | 
--------------------------------
 | 0.0008908501874592104 |  | 0.06291967792871309 |  | 0.84349452162495 |  | 0.43539640063225415 | 
--------------------------------
 | 0.0 |  | 0.15654607548449048 |  | 2.0020346733409053 |  | 3.7926076351359854 | 
--------------------------------
 | 0.0 |  | -0.00072 |  | 0.7121543661878007 |  | 0.0 | 
--------------------------------

----------------------------





 step:200
 | R |  | R |  | D |  | L | 
----------------------------
 | U |  | R |  | D |  | D | 
----------------------------
 | D |  | R |  | R |  | D | 
----------------------------
 | R |  | R |  | R | 
----------------------------
 | 0.3795875441432838 |  | 0.6991492120115776 |  | 1.2645298228093873 |  | 0.10815998241652194 | 
--------------------------------
 | 0.022578486496616812 |  | 0.39647178233187175 |  | 2.05338168464567 |  | 1.197459594418069 | 
--------------------------------
 | 0.0 |  | 0.3716742452514141 |  | 2.843783846486921 |  | 4.200809984231413 | 
--------------------------------
 | -0.0400127008 |  | 0.05500977619404736 |  | 1.3262428996537865 |  | 0.0 | 
--------------------------------

----------------------------





 step:300
 | R |  | R |  | D |  | L | 
----------------------------
 | U |  | R |  | D |  | D | 
----------------------------
 | D |  | R |  | R |  | D | 
----------------------------
 | R |  | R |  | R | 
----------------------------
 | 0.9374486990718105 |  | 1.3220965611506466 |  | 1.8754176755005907 |  | 0.2814431181067252 | 
--------------------------------
 | 0.10941495378856969 |  | 0.6860396552916664 |  | 2.518020456989545 |  | 2.005789473630823 | 
--------------------------------
 | 0.0 |  | 0.5189134522485985 |  | 3.471504256528754 |  | 4.458899607515437 | 
--------------------------------
 | -0.0400127008 |  | 0.05500977619404736 |  | 1.5007091502148615 |  | 0.0 | 
--------------------------------

----------------------------





 step:400
 | R |  | R |  | D |  | L | 
----------------------------
 | U |  | R |  | D |  | D | 
----------------------------
 | D |  | R |  | R |  | D | 
----------------------------
 | R |  | R |  | R | 
----------------------------
 | 1.2285511426270412 |  | 1.7227671178191266 |  | 2.254719527673698 |  | 0.3948583519796051 | 
--------------------------------
 | 0.3564473378858508 |  | 1.0289010250551867 |  | 2.763540750720418 |  | 2.4088887070217604 | 
--------------------------------
 | -0.0007202286144 |  | 0.9821329377002909 |  | 3.42199080160048 |  | 4.549964622130637 | 
--------------------------------
 | -0.07822227081250716 |  | 0.08748939346140078 |  | 1.9896695669278603 |  | 0.0 | 
--------------------------------

----------------------------





 step:500
 | R |  | R |  | D |  | L | 
----------------------------
 | U |  | R |  | D |  | D | 
----------------------------
 | D |  | R |  | R |  | D | 
----------------------------
 | R |  | R |  | R | 
----------------------------
 | 1.5519814303543642 |  | 1.9862855550823793 |  | 2.3402863782530905 |  | 0.7817621219154731 | 
--------------------------------
 | 0.5852352423942883 |  | 1.3547635265199507 |  | 2.942040736685975 |  | 2.7139767903146947 | 
--------------------------------
 | -0.0007202286144 |  | 1.0704184198044615 |  | 3.8106954270548172 |  | 4.504399564467179 | 
--------------------------------
 | -0.07822227081250716 |  | 0.12426814704916482 |  | 2.3086374994994037 |  | 0.0 | 
--------------------------------

----------------------------





 step:600
 | R |  | R |  | D |  | L | 
----------------------------
 | U |  | R |  | D |  | D | 
----------------------------
 | D |  | R |  | R |  | D | 
----------------------------
 | R |  | R |  | R | 
----------------------------
 | 1.7111412672749062 |  | 2.108246382511863 |  | 2.50441743611798 |  | 1.0704883543402977 | 
--------------------------------
 | 0.8082776811200897 |  | 1.7169289054086683 |  | 3.0254630900635333 |  | 2.835242785304778 | 
--------------------------------
 | -0.0021138249167371287 |  | 1.28808953184633 |  | 3.815176405608271 |  | 4.690302092261093 | 
--------------------------------
 | -0.11442099874937205 |  | 0.1633382590991708 |  | 2.584930731801839 |  | 0.0 | 
--------------------------------

----------------------------





 step:700
 | R |  | R |  | D |  | L | 
----------------------------
 | U |  | R |  | D |  | D | 
----------------------------
 | D |  | R |  | R |  | D | 
----------------------------
 | R |  | R |  | R | 
----------------------------
 | 1.8491110390516932 |  | 2.1788215164015226 |  | 2.6695172861536434 |  | 1.1794277580691084 | 
--------------------------------
 | 0.9227234656959072 |  | 1.904589319760103 |  | 3.1655537611471476 |  | 3.0547097436644814 | 
--------------------------------
 | -0.006719657719325485 |  | 1.3073546295681406 |  | 4.024804496989696 |  | 4.8263643369719516 | 
--------------------------------
 | -0.1808998691424352 |  | 0.29813845474559714 |  | 2.947193114105866 |  | 0.0 | 
--------------------------------

----------------------------





 step:800
 | R |  | R |  | D |  | L | 
----------------------------
 | U |  | R |  | D |  | D | 
----------------------------
 | D |  | R |  | R |  | D | 
----------------------------
 | R |  | R |  | R | 
----------------------------
 | 1.9804856581980501 |  | 2.400805717178168 |  | 2.861589774511544 |  | 1.3158879308117706 | 
--------------------------------
 | 1.077611446891992 |  | 2.081877616550027 |  | 3.3997378205858237 |  | 3.1120569862943466 | 
--------------------------------
 | -0.006719657719325485 |  | 1.5148788818331789 |  | 3.9784468760115574 |  | 4.626867775518119 | 
--------------------------------
 | -0.1808998691424352 |  | 0.29813845474559714 |  | 3.106556289822498 |  | 0.0 | 
--------------------------------

----------------------------





 step:900
 | R |  | R |  | D |  | L | 
----------------------------
 | U |  | R |  | D |  | D | 
----------------------------
 | D |  | R |  | R |  | D | 
----------------------------
 | R |  | R |  | R | 
----------------------------
 | 2.0453746594988753 |  | 2.430780944743091 |  | 2.78274059657772 |  | 1.4868065870986555 | 
--------------------------------
 | 1.2080441155369541 |  | 2.162055255124448 |  | 3.2423854575956144 |  | 3.352774204750204 | 
--------------------------------
 | -0.028657871951945052 |  | 1.6708825094968385 |  | 3.8733477231404176 |  | 4.76662622993533 | 
--------------------------------
 | -0.2350957389641074 |  | 0.3411578452036233 |  | 3.389128106040918 |  | 0.0 | 
--------------------------------

----------------------------





 step:1000
 | R |  | R |  | D |  | L | 
----------------------------
 | U |  | R |  | D |  | D | 
----------------------------
 | D |  | R |  | R |  | D | 
----------------------------
 | R |  | R |  | R | 
----------------------------
 | 2.0638851584424547 |  | 2.3964463302595265 |  | 2.7057034946449727 |  | 1.6306508417394179 | 
--------------------------------
 | 1.386618576879537 |  | 2.1687303113243543 |  | 3.1497777726126714 |  | 3.420543087293492 | 
--------------------------------
 | -0.032316437814260086 |  | 1.8511870719320984 |  | 3.860986953686102 |  | 4.755910674064825 | 
--------------------------------
 | -0.2645648742366737 |  | 0.37893816141214826 |  | 3.654799751025675 |  | 0.0 | 
--------------------------------

----------------------------





 step:1100
 | R |  | R |  | D |  | L | 
----------------------------
 | U |  | R |  | D |  | D | 
----------------------------
 | D |  | R |  | R |  | D | 
----------------------------
 | R |  | R |  | R | 
----------------------------
 | 2.0537272161424065 |  | 2.4304125501965146 |  | 2.8089011454363786 |  | 1.6396214831405924 | 
--------------------------------
 | 1.4695375932124428 |  | 2.2262137253172414 |  | 3.2348607052248646 |  | 3.5265587029703935 | 
--------------------------------
 | -0.03643227679423501 |  | 2.1323882225450967 |  | 3.8433939610720738 |  | 4.722141708286746 | 
--------------------------------
 | -0.29245268984652156 |  | 0.43947221752755355 |  | 3.8083646087097387 |  | 0.0 | 
--------------------------------

----------------------------





 step:1200
 | R |  | R |  | D |  | L | 
----------------------------
 | U |  | R |  | D |  | D | 
----------------------------
 | D |  | R |  | R |  | D | 
----------------------------
 | R |  | R |  | R | 
----------------------------
 | 2.06810836981977 |  | 2.39425516977435 |  | 2.741912133902752 |  | 1.8184966455606666 | 
--------------------------------
 | 1.5483817956852761 |  | 2.3194857109503175 |  | 3.3406461504867457 |  | 3.585752165997799 | 
--------------------------------
 | -0.03643227679423501 |  | 2.24182717675796 |  | 3.994553369587223 |  | 4.7764219556576 | 
--------------------------------
 | -0.29245268984652156 |  | 0.43947221752755355 |  | 3.9992994559301422 |  | 0.0 | 
--------------------------------

----------------------------





 step:1300
 | R |  | R |  | D |  | L | 
----------------------------
 | U |  | R |  | D |  | D | 
----------------------------
 | D |  | R |  | R |  | D | 
----------------------------
 | R |  | R |  | R | 
----------------------------
 | 2.0884582209404567 |  | 2.4456684850488677 |  | 2.8857066238507425 |  | 1.8998728775020695 | 
--------------------------------
 | 1.5717162058354117 |  | 2.4129217454420515 |  | 3.3062537794355777 |  | 3.729864631193589 | 
--------------------------------
 | -0.04629125447790337 |  | 2.196680491535763 |  | 3.857646273455679 |  | 4.597105082241745 | 
--------------------------------
 | -0.36232116235491885 |  | 0.7323668637559092 |  | 4.052403465614548 |  | 0.0 | 
--------------------------------

----------------------------





 step:1400
 | R |  | R |  | D |  | L | 
----------------------------
 | U |  | R |  | D |  | D | 
----------------------------
 | D |  | R |  | R |  | D | 
----------------------------
 | R |  | R |  | R | 
----------------------------
 | 2.1714050042280975 |  | 2.5095138704613653 |  | 2.836802212759311 |  | 1.9463466297048304 | 
--------------------------------
 | 1.5722714021541129 |  | 2.570966801723427 |  | 3.3672278995755978 |  | 3.8312216049147656 | 
--------------------------------
 | -0.06332433374694559 |  | 2.2505199532157123 |  | 4.01447656742647 |  | 4.525641366621889 | 
--------------------------------
 | -0.3935493793399403 |  | 0.8494512534487643 |  | 4.143447785593383 |  | 0.0 | 
--------------------------------

----------------------------





 step:1500
 | R |  | R |  | D |  | L | 
----------------------------
 | U |  | R |  | D |  | D | 
----------------------------
 | D |  | R |  | R |  | D | 
----------------------------
 | R |  | R |  | R | 
----------------------------
 | 2.1893012566849683 |  | 2.5333804894165293 |  | 2.9114467780455215 |  | 1.9869791676873707 | 
--------------------------------
 | 1.6676367035195736 |  | 2.5923329720210586 |  | 3.348648439606155 |  | 3.884628459596083 | 
--------------------------------
 | -0.06332433374694559 |  | 2.3031729629861464 |  | 4.018165699492697 |  | 4.722157744367752 | 
--------------------------------
 | -0.3935493793399403 |  | 0.8494512534487643 |  | 4.182881523892219 |  | 0.0 | 
--------------------------------

----------------------------





 step:1600
 | R |  | R |  | D |  | L | 
----------------------------
 | U |  | R |  | D |  | D | 
----------------------------
 | D |  | R |  | R |  | D | 
----------------------------
 | R |  | R |  | R | 
----------------------------
 | 2.2281000308397245 |  | 2.559874308687316 |  | 2.8697291540329792 |  | 2.0778170715265727 | 
--------------------------------
 | 1.7189684614202212 |  | 2.6408066631671567 |  | 3.28135710194414 |  | 3.8967903186895607 | 
--------------------------------
 | -0.06332433374694559 |  | 2.3690158657267557 |  | 3.925502302250625 |  | 4.630057701071106 | 
--------------------------------
 | -0.3935493793399403 |  | 0.964366514015704 |  | 4.154979794955975 |  | 0.0 | 
--------------------------------

----------------------------





 step:1700
 | R |  | R |  | D |  | L | 
----------------------------
 | U |  | R |  | D |  | D | 
----------------------------
 | D |  | R |  | R |  | D | 
----------------------------
 | R |  | R |  | R | 
----------------------------
 | 2.222121942564929 |  | 2.525713185983246 |  | 2.8463065699530237 |  | 2.104538730465739 | 
--------------------------------
 | 1.785707649881144 |  | 2.6566295955612875 |  | 3.26785266282995 |  | 3.9379540854517248 | 
--------------------------------
 | -0.06332433374694559 |  | 2.536628730050737 |  | 3.834709706368889 |  | 4.66253905748329 | 
--------------------------------
 | -0.3935493793399403 |  | 0.964366514015704 |  | 4.253629417288097 |  | 0.0 | 
--------------------------------

----------------------------





 step:1800
 | R |  | R |  | D |  | L | 
----------------------------
 | U |  | R |  | D |  | D | 
----------------------------
 | D |  | R |  | R |  | D | 
----------------------------
 | R |  | R |  | R | 
----------------------------
 | 2.1974448561187514 |  | 2.5152046143336344 |  | 2.8333600238738432 |  | 2.1418432598366426 | 
--------------------------------
 | 1.8330577631345617 |  | 2.644293024547355 |  | 3.2031608326776144 |  | 3.918019472996897 | 
--------------------------------
 | -0.08007857500608405 |  | 2.576651931490971 |  | 3.9683021165819277 |  | 4.482879086199157 | 
--------------------------------
 | -0.3955603199940009 |  | 0.964366514015704 |  | 4.3253401081669764 |  | 0.0 | 
--------------------------------

----------------------------





 step:1900
 | R |  | R |  | D |  | L | 
----------------------------
 | U |  | R |  | D |  | D | 
----------------------------
 | D |  | R |  | R |  | D | 
----------------------------
 | R |  | R |  | R | 
----------------------------
 | 2.209144285007248 |  | 2.5163104954031303 |  | 2.85633182085655 |  | 2.211758234340188 | 
--------------------------------
 | 1.88080729165131 |  | 2.6576371224248123 |  | 3.3824366927625937 |  | 3.8525557456924906 | 
--------------------------------
 | -0.08007857500608405 |  | 2.692124961438562 |  | 3.866695853160625 |  | 4.7256888659593805 | 
--------------------------------
 | -0.3955603199940009 |  | 1.0213994707535872 |  | 4.255215626546297 |  | 0.0 | 
--------------------------------

----------------------------





 step:2000
 | R |  | R |  | D |  | L | 
----------------------------
 | U |  | R |  | D |  | D | 
----------------------------
 | D |  | R |  | R |  | D | 
----------------------------
 | R |  | R |  | R | 
----------------------------
 | 2.2416286008721964 |  | 2.5571186916204742 |  | 2.960921063063906 |  | 2.2464261858307277 | 
--------------------------------
 | 1.9117083949713671 |  | 2.71824674821634 |  | 3.401485079857313 |  | 3.9030405657409544 | 
--------------------------------
 | -0.08007857500608405 |  | 2.6505117424618314 |  | 4.015752814053215 |  | 4.529291294701698 | 
--------------------------------
 | -0.3955603199940009 |  | 1.0786075457805084 |  | 4.3535090424604554 |  | 0.0 | 
--------------------------------

----------------------------





 step:2100
 | R |  | R |  | D |  | L | 
----------------------------
 | U |  | R |  | D |  | D | 
----------------------------
 | D |  | R |  | R |  | D | 
----------------------------
 | R |  | R |  | R | 
----------------------------
 | 2.25551663468766 |  | 2.588970056477259 |  | 2.862910538317157 |  | 2.3190400827368 | 
--------------------------------
 | 1.8701049561303107 |  | 2.7692101995575342 |  | 3.3796671373301406 |  | 3.957734389077514 | 
--------------------------------
 | -0.09123336268039858 |  | 2.7354782316281194 |  | 3.989252171299937 |  | 4.6827356347423015 | 
--------------------------------
 | -0.42089293292656094 |  | 1.1237488220833436 |  | 4.4387646927468 |  | 0.0 | 
--------------------------------

----------------------------





 step:2200
 | R |  | R |  | D |  | L | 
----------------------------
 | U |  | R |  | D |  | D | 
----------------------------
 | D |  | R |  | R |  | D | 
----------------------------
 | R |  | R |  | R | 
----------------------------
 | 2.2959893086286725 |  | 2.6153422702424534 |  | 2.981211416064012 |  | 2.3814609361219268 | 
--------------------------------
 | 1.897903743801355 |  | 2.828618178935658 |  | 3.4614502929068403 |  | 3.9450338711031203 | 
--------------------------------
 | -0.0969847682194687 |  | 2.729316553011482 |  | 3.9188592188317863 |  | 4.470457733288336 | 
--------------------------------
 | -0.43224759547052954 |  | 1.181850402819875 |  | 4.497206972996758 |  | 0.0 | 
--------------------------------

----------------------------





 step:2300
 | R |  | R |  | D |  | L | 
----------------------------
 | U |  | R |  | D |  | D | 
----------------------------
 | D |  | R |  | R |  | D | 
----------------------------
 | R |  | R |  | R | 
----------------------------
 | 2.330668848641675 |  | 2.644940614430643 |  | 2.9798876813413373 |  | 2.3826790696447073 | 
--------------------------------
 | 1.8862250583148212 |  | 2.7947426271828673 |  | 3.371199817259146 |  | 3.8934746291465117 | 
--------------------------------
 | -0.10282552957354886 |  | 2.713689654929019 |  | 4.0078321953425515 |  | 4.810823340567074 | 
--------------------------------
 | -0.4423293363103612 |  | 1.2251250018669932 |  | 4.589181490142975 |  | 0.0 | 
--------------------------------

----------------------------





 step:2400
 | R |  | R |  | D |  | L | 
----------------------------
 | U |  | R |  | D |  | D | 
----------------------------
 | D |  | R |  | R |  | D | 
----------------------------
 | R |  | R |  | R | 
----------------------------
 | 2.3028237819707535 |  | 2.6554888945534203 |  | 3.0289491913478317 |  | 2.392312706971362 | 
--------------------------------
 | 1.8555146474799293 |  | 2.814087966623228 |  | 3.4198047853717872 |  | 3.9519646885399196 | 
--------------------------------
 | -0.11485319315910333 |  | 2.7984238194726783 |  | 4.145743026167115 |  | 4.831847508611819 | 
--------------------------------
 | -0.47788647213383173 |  | 1.3321489040529804 |  | 4.523692532873715 |  | 0.0 | 
--------------------------------

----------------------------





 step:2500
 | R |  | R |  | D |  | L | 
----------------------------
 | U |  | R |  | D |  | D | 
----------------------------
 | D |  | R |  | R |  | D | 
----------------------------
 | R |  | R |  | R | 
----------------------------
 | 2.3356651087560754 |  | 2.683349855333285 |  | 3.0265980211825094 |  | 2.407878197644426 | 
--------------------------------
 | 1.7860776680357946 |  | 2.8862732343535327 |  | 3.408799759853414 |  | 3.9518895109277254 | 
--------------------------------
 | -0.13987653207822034 |  | 2.9021639580834186 |  | 3.9112283830933348 |  | 4.681274717252181 | 
--------------------------------
 | -0.4946435827907281 |  | 1.4617419369226707 |  | 4.594775219589367 |  | 0.0 | 
--------------------------------

----------------------------





 step:2600
 | R |  | R |  | D |  | L | 
----------------------------
 | U |  | R |  | D |  | D | 
----------------------------
 | D |  | R |  | R |  | D | 
----------------------------
 | R |  | R |  | R | 
----------------------------
 | 2.2847096661845554 |  | 2.638676475858352 |  | 2.994758739365008 |  | 2.4546466101804727 | 
--------------------------------
 | 1.8213668113838914 |  | 2.920594981124154 |  | 3.41739663619113 |  | 3.9354021854616086 | 
--------------------------------
 | -0.15203484262121675 |  | 2.9519764378849915 |  | 3.9881361953991323 |  | 4.700546494431717 | 
--------------------------------
 | -0.5011967342064365 |  | 1.5681809952624017 |  | 4.633708893608115 |  | 0.0 | 
--------------------------------

----------------------------





 step:2700
 | R |  | R |  | D |  | L | 
----------------------------
 | U |  | R |  | D |  | D | 
----------------------------
 | D |  | R |  | R |  | D | 
----------------------------
 | R |  | R |  | R | 
----------------------------
 | 2.3148357688871615 |  | 2.6120107264599324 |  | 2.8581718046082565 |  | 2.461183980350833 | 
--------------------------------
 | 1.8806174678641439 |  | 2.9126412684874086 |  | 3.3125467709186105 |  | 3.92710892987151 | 
--------------------------------
 | -0.15801568698450827 |  | 2.9371578324615437 |  | 3.9724752031490427 |  | 4.663236229556999 | 
--------------------------------
 | -0.5029455416075845 |  | 1.6202241354420999 |  | 4.700713497562438 |  | 0.0 | 
--------------------------------

----------------------------





 step:2800
 | R |  | R |  | D |  | L | 
----------------------------
 | U |  | R |  | D |  | D | 
----------------------------
 | D |  | R |  | R |  | D | 
----------------------------
 | R |  | R |  | R | 
----------------------------
 | 2.250779450936691 |  | 2.597859594429423 |  | 2.9455931266829394 |  | 2.460059494016751 | 
--------------------------------
 | 1.891173754372728 |  | 2.9263313241713993 |  | 3.415217470937708 |  | 3.9943688613684554 | 
--------------------------------
 | -0.16390839299375462 |  | 2.9100248982949446 |  | 4.033970566718138 |  | 4.684413045103019 | 
--------------------------------
 | -0.502782845853024 |  | 1.72317280222733 |  | 4.708182049200756 |  | 0.0 | 
--------------------------------

----------------------------





 step:2900
 | R |  | R |  | D |  | L | 
----------------------------
 | U |  | R |  | D |  | D | 
----------------------------
 | D |  | R |  | R |  | D | 
----------------------------
 | R |  | R |  | R | 
----------------------------
 | 2.283423666442579 |  | 2.58564316536397 |  | 2.9199013700688536 |  | 2.4369047448340977 | 
--------------------------------
 | 1.8552835544550195 |  | 2.8758637925569923 |  | 3.3271026950027234 |  | 4.005666691401718 | 
--------------------------------
 | -0.09786675338561203 |  | 2.8576829017518253 |  | 3.9908424823788415 |  | 4.715802967823407 | 
--------------------------------
 | -0.502782845853024 |  | 1.7735616775306848 |  | 4.741495291685145 |  | 0.0 | 
--------------------------------

----------------------------





 step:3000
 | R |  | R |  | D |  | L | 
----------------------------
 | U |  | R |  | D |  | D | 
----------------------------
 | D |  | R |  | R |  | D | 
----------------------------
 | R |  | R |  | R | 
----------------------------
 | 2.2818253663210477 |  | 2.6185191076212107 |  | 2.9671819372751127 |  | 2.4641978515271155 | 
--------------------------------
 | 1.8893890052643765 |  | 2.6997780422572557 |  | 3.406895953883866 |  | 4.050225031334686 | 
--------------------------------
 | -0.09786675338561203 |  | 2.964233477212128 |  | 4.100622785640691 |  | 4.667669062316396 | 
--------------------------------
 | -0.502782845853024 |  | 1.7735616775306848 |  | 4.771005573721815 |  | 0.0 | 
--------------------------------

----------------------------





 step:3100
 | R |  | R |  | D |  | L | 
----------------------------
 | U |  | R |  | D |  | D | 
----------------------------
 | D |  | R |  | R |  | D | 
----------------------------
 | R |  | R |  | R | 
----------------------------
 | 2.2409867731430384 |  | 2.5879048008865566 |  | 2.9684652650562247 |  | 2.4892521009290594 | 
--------------------------------
 | 1.9093308354469036 |  | 2.7345865789031256 |  | 3.3905518178248295 |  | 4.047082249237818 | 
--------------------------------
 | -0.11187477476971654 |  | 2.9702146251128965 |  | 4.028856419308793 |  | 4.670852485266842 | 
--------------------------------
 | -0.49273190692600055 |  | 1.913987251961752 |  | 4.592487625947743 |  | 0.0 | 
--------------------------------

----------------------------





 step:3200
 | R |  | R |  | D |  | L | 
----------------------------
 | U |  | R |  | D |  | D | 
----------------------------
 | D |  | R |  | R |  | D | 
----------------------------
 | R |  | R |  | R | 
----------------------------
 | 2.3168885215446946 |  | 2.6338400561031907 |  | 2.944426663869434 |  | 2.506451885846339 | 
--------------------------------
 | 1.8921672481356149 |  | 2.783137216470791 |  | 3.379214119082044 |  | 4.037221094111107 | 
--------------------------------
 | -0.10168653307713064 |  | 2.9886937989461373 |  | 3.9701514453941114 |  | 4.6973962838967 | 
--------------------------------
 | -0.4876619842437583 |  | 1.9983297036225975 |  | 4.512891037381999 |  | 0.0 | 
--------------------------------

----------------------------





 step:3300
 | R |  | R |  | D |  | L | 
----------------------------
 | U |  | R |  | D |  | D | 
----------------------------
 | D |  | R |  | R |  | D | 
----------------------------
 | R |  | R |  | R | 
----------------------------
 | 2.2979750563019206 |  | 2.5856333280821193 |  | 2.952141618749421 |  | 2.5092901668978635 | 
--------------------------------
 | 1.9220260278573722 |  | 2.8001204998367513 |  | 3.301085378014462 |  | 3.941614704857609 | 
--------------------------------
 | -0.10843071813197568 |  | 2.9720036142877713 |  | 4.081631123689444 |  | 4.65277624389716 | 
--------------------------------
 | -0.4819388098936764 |  | 2.0395951482230217 |  | 4.5410738224349725 |  | 0.0 | 
--------------------------------

----------------------------





 step:3400
 | R |  | R |  | D |  | L | 
----------------------------
 | U |  | R |  | D |  | D | 
----------------------------
 | D |  | R |  | R |  | D | 
----------------------------
 | R |  | R |  | R | 
----------------------------
 | 2.2943445653176053 |  | 2.608676974954453 |  | 2.9307182967473384 |  | 2.5111417717737226 | 
--------------------------------
 | 1.9334894855000913 |  | 2.834175301938307 |  | 3.350988921819594 |  | 4.0040811483440395 | 
--------------------------------
 | -0.11493700234742234 |  | 3.0247640785241536 |  | 3.7872881780874694 |  | 4.545986628026715 | 
--------------------------------
 | -0.47558732102778845 |  | 2.081485592618611 |  | 4.609562578032872 |  | 0.0 | 
--------------------------------

----------------------------





 step:3500
 | R |  | R |  | D |  | L | 
----------------------------
 | U |  | R |  | D |  | D | 
----------------------------
 | D |  | R |  | R |  | D | 
----------------------------
 | R |  | R |  | R | 
----------------------------
 | 2.234631867766403 |  | 2.5699626208588846 |  | 2.853277604533315 |  | 2.4787177880488587 | 
--------------------------------
 | 1.8276629780806632 |  | 2.8476905497243763 |  | 3.154927361127724 |  | 3.9223025214618548 | 
--------------------------------
 | -0.1384563273906389 |  | 3.0548918001366174 |  | 3.8511424341833904 |  | 4.554038772734177 | 
--------------------------------
 | -0.44413814690075387 |  | 2.2435508309947494 |  | 4.6676806279554395 |  | 0.0 | 
--------------------------------

----------------------------





 step:3600
 | R |  | R |  | D |  | L | 
----------------------------
 | U |  | R |  | D |  | D | 
----------------------------
 | D |  | R |  | R |  | D | 
----------------------------
 | R |  | R |  | R | 
----------------------------
 | 2.2279400918451597 |  | 2.55056698732247 |  | 2.8915115896497787 |  | 2.495619697287173 | 
--------------------------------
 | 1.8642739652643958 |  | 2.8089372994854402 |  | 3.349669392787899 |  | 3.882735128847659 | 
--------------------------------
 | -0.1384563273906389 |  | 3.0338238069721295 |  | 3.8720139450463953 |  | 4.525287210781496 | 
--------------------------------
 | -0.44413814690075387 |  | 2.319217873567548 |  | 4.622547178671772 |  | 0.0 | 
--------------------------------

----------------------------





 step:3700
 | R |  | R |  | D |  | L | 
----------------------------
 | U |  | R |  | D |  | D | 
----------------------------
 | D |  | R |  | R |  | D | 
----------------------------
 | R |  | R |  | R | 
----------------------------
 | 2.2518173535763832 |  | 2.5787594586272617 |  | 2.936004305940645 |  | 2.5138151396213977 | 
--------------------------------
 | 1.8607027260595352 |  | 2.781335957489635 |  | 3.4136048659693774 |  | 3.9294613079507505 | 
--------------------------------
 | -0.1436816874870397 |  | 3.057888346812081 |  | 4.08813701298765 |  | 4.753710501936774 | 
--------------------------------
 | -0.43350946223852294 |  | 2.3561752483279674 |  | 4.637494310396369 |  | 0.0 | 
--------------------------------

----------------------------





 step:3800
 | R |  | R |  | D |  | L | 
----------------------------
 | U |  | R |  | D |  | D | 
----------------------------
 | D |  | R |  | R |  | D | 
----------------------------
 | R |  | R |  | R | 
----------------------------
 | 2.2694315309198925 |  | 2.6252149912216565 |  | 2.952221655806219 |  | 2.5220957006804547 | 
--------------------------------
 | 1.8547265451266775 |  | 2.7979520412063605 |  | 3.4192388914965433 |  | 3.994871788065981 | 
--------------------------------
 | -0.1486112240575923 |  | 3.122729423705193 |  | 4.090974342596318 |  | 4.659405611584359 | 
--------------------------------
 | -0.42242811852384904 |  | 2.3929103691712386 |  | 4.665636294090769 |  | 0.0 | 
--------------------------------

----------------------------





 step:3900
 | R |  | R |  | D |  | L | 
----------------------------
 | U |  | R |  | D |  | D | 
----------------------------
 | D |  | R |  | R |  | D | 
----------------------------
 | R |  | R |  | R | 
----------------------------
 | 2.285650873183429 |  | 2.6021994983768097 |  | 2.9388257916356366 |  | 2.5323135640895504 | 
--------------------------------
 | 1.9058729884740095 |  | 2.8207971806435896 |  | 3.37149842340012 |  | 3.9362916182843244 | 
--------------------------------
 | -0.1486112240575923 |  | 3.181229606613334 |  | 3.921062824620547 |  | 4.3258266321017915 | 
--------------------------------
 | -0.42242811852384904 |  | 2.3929103691712386 |  | 4.686445967692615 |  | 0.0 | 
--------------------------------

----------------------------





 step:4000
 | R |  | R |  | D |  | L | 
----------------------------
 | U |  | R |  | D |  | D | 
----------------------------
 | D |  | R |  | R |  | D | 
----------------------------
 | R |  | R |  | R | 
----------------------------
 | 2.249942298382279 |  | 2.5741553438043123 |  | 2.8536576925548176 |  | 2.5280724651270234 | 
--------------------------------
 | 1.9734974600058297 |  | 2.8387605218392378 |  | 3.331978136650207 |  | 3.919899219632974 | 
--------------------------------
 | -0.1486112240575923 |  | 3.198171089541271 |  | 3.8410697406183547 |  | 4.597362055915224 | 
--------------------------------
 | -0.42242811852384904 |  | 2.3929103691712386 |  | 4.716571989276855 |  | 0.0 | 
--------------------------------

----------------------------





 step:4100
 | R |  | R |  | D |  | L | 
----------------------------
 | U |  | R |  | D |  | D | 
----------------------------
 | D |  | R |  | R |  | D | 
----------------------------
 | R |  | R |  | R | 
----------------------------
 | 2.254118682938152 |  | 2.563957575463326 |  | 2.8984802108406473 |  | 2.5226293913114874 | 
--------------------------------
 | 1.845030951230756 |  | 2.8301684262421776 |  | 3.259027625760046 |  | 3.887756101284385 | 
--------------------------------
 | -0.0906107138290735 |  | 3.219050472082059 |  | 3.8256210176631247 |  | 4.62585767582778 | 
--------------------------------
 | -0.3918187614535082 |  | 2.4979355768019964 |  | 4.666334021558213 |  | 0.0 | 
--------------------------------

----------------------------





 step:4200
 | R |  | R |  | D |  | L | 
----------------------------
 | U |  | R |  | D |  | D | 
----------------------------
 | D |  | R |  | R |  | D | 
----------------------------
 | R |  | R |  | R | 
----------------------------
 | 2.280096652793734 |  | 2.5628340947250887 |  | 2.910526061164713 |  | 2.5302796567424073 | 
--------------------------------
 | 1.8293725190517067 |  | 2.797007731669185 |  | 3.437173321617448 |  | 3.9175161895235124 | 
--------------------------------
 | -0.05779778503959636 |  | 3.122300478549357 |  | 3.964260237814277 |  | 4.72268632969544 | 
--------------------------------
 | -0.36789544818729564 |  | 2.508746978214801 |  | 4.604674712544526 |  | 0.0 | 
--------------------------------

----------------------------





 step:4300
 | R |  | R |  | D |  | L | 
----------------------------
 | U |  | R |  | D |  | D | 
----------------------------
 | D |  | R |  | R |  | D | 
----------------------------
 | R |  | R |  | R | 
----------------------------
 | 2.294846952408533 |  | 2.610582727066826 |  | 2.9386932712251332 |  | 2.5343616521163415 | 
--------------------------------
 | 1.8590377417275377 |  | 2.820752659868471 |  | 3.4224965742316265 |  | 3.951135086380666 | 
--------------------------------
 | -0.05779778503959636 |  | 3.161101562089526 |  | 4.003233712329802 |  | 4.734643058610507 | 
--------------------------------
 | -0.36789544818729564 |  | 2.508746978214801 |  | 4.635364542008224 |  | 0.0 | 
--------------------------------

----------------------------





 step:4400
 | R |  | R |  | D |  | L | 
----------------------------
 | U |  | R |  | D |  | D | 
----------------------------
 | D |  | R |  | R |  | D | 
----------------------------
 | R |  | R |  | R | 
----------------------------
 | 2.267685589234615 |  | 2.617614911788245 |  | 2.9567076267286825 |  | 2.547014291866991 | 
--------------------------------
 | 1.8280732103730708 |  | 2.8493470329660595 |  | 3.416608329848797 |  | 3.92799750962139 | 
--------------------------------
 | -0.03528918308289623 |  | 3.1238040937028027 |  | 4.076294209779394 |  | 4.454053575132283 | 
--------------------------------
 | -0.34256976251937654 |  | 2.5401584631633636 |  | 4.608506250521682 |  | 0.0 | 
--------------------------------

----------------------------





 step:4500
 | R |  | R |  | D |  | L | 
----------------------------
 | U |  | R |  | D |  | D | 
----------------------------
 | D |  | R |  | R |  | D | 
----------------------------
 | R |  | R |  | R | 
----------------------------
 | 2.2955282286326217 |  | 2.605672041695138 |  | 2.945572852695414 |  | 2.552102169887071 | 
--------------------------------
 | 1.8766146548838845 |  | 2.8922769909376442 |  | 3.5188372352550927 |  | 3.937645009426949 | 
--------------------------------
 | -0.03528918308289623 |  | 3.153504338874 |  | 4.010930491761269 |  | 4.587771059284745 | 
--------------------------------
 | -0.34256976251937654 |  | 2.5401584631633636 |  | 4.624009403001023 |  | 0.0 | 
--------------------------------

----------------------------





 step:4600
 | R |  | R |  | D |  | L | 
----------------------------
 | U |  | R |  | D |  | D | 
----------------------------
 | D |  | R |  | R |  | D | 
----------------------------
 | R |  | R |  | R | 
----------------------------
 | 2.3066347669493537 |  | 2.625575734935165 |  | 2.930066073966142 |  | 2.542427303617948 | 
--------------------------------
 | 1.8884703784617818 |  | 2.887682625901573 |  | 3.2632562789022463 |  | 3.950434211689313 | 
--------------------------------
 | -0.040749655146587085 |  | 3.1747934875728117 |  | 3.889485984571322 |  | 4.575268994015025 | 
--------------------------------
 | -0.3299955149320485 |  | 2.5731128623077337 |  | 4.660134279971377 |  | 0.0 | 
--------------------------------

----------------------------





 step:4700
 | R |  | R |  | D |  | L | 
----------------------------
 | U |  | R |  | D |  | D | 
----------------------------
 | D |  | R |  | R |  | D | 
----------------------------
 | R |  | R |  | R | 
----------------------------
 | 2.2508880446496398 |  | 2.5472274032513416 |  | 2.862802775124169 |  | 2.5486661696865642 | 
--------------------------------
 | 1.9325505683397801 |  | 2.8177589419666305 |  | 3.3417176834748705 |  | 3.9839480811744905 | 
--------------------------------
 | -0.040749655146587085 |  | 3.2062455134476697 |  | 4.012955493773023 |  | 4.66029819912978 | 
--------------------------------
 | -0.3299955149320485 |  | 2.5731128623077337 |  | 4.658027459171573 |  | 0.0 | 
--------------------------------

----------------------------





 step:4800
 | R |  | R |  | D |  | L | 
----------------------------
 | U |  | R |  | D |  | D | 
----------------------------
 | D |  | R |  | R |  | D | 
----------------------------
 | R |  | R |  | R | 
----------------------------
 | 2.2213332073497774 |  | 2.592190370517068 |  | 2.9880011097949626 |  | 2.5550376908312677 | 
--------------------------------
 | 1.9187489266124547 |  | 2.8124298161248005 |  | 3.469643030166058 |  | 3.902822315273502 | 
--------------------------------
 | -0.04587458131243222 |  | 3.173868163866955 |  | 4.0310508387753945 |  | 4.637236095047366 | 
--------------------------------
 | -0.31649447686346294 |  | 2.6674746749140112 |  | 4.592569849100079 |  | 0.0 | 
--------------------------------

----------------------------





 step:4900
 | R |  | R |  | D |  | L | 
----------------------------
 | U |  | R |  | D |  | D | 
----------------------------
 | D |  | R |  | R |  | D | 
----------------------------
 | R |  | R |  | R | 
----------------------------
 | 2.2759114493372303 |  | 2.6118349776044245 |  | 2.9287682300444695 |  | 2.5761326979187587 | 
--------------------------------
 | 1.9708340645032496 |  | 2.7074982618408385 |  | 3.2549557123758954 |  | 3.9208643731108834 | 
--------------------------------
 | -0.050653990269725904 |  | 3.1161833379761843 |  | 3.8713826309499386 |  | 4.598133848802746 | 
--------------------------------
 | -0.3011309751263997 |  | 2.751054898945296 |  | 4.53434339726199 |  | 0.0 | 
--------------------------------

----------------------------


exploited:30027  explored:7415
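
The two counters above say how often the agent took the greedy move (exploited) versus a random move (explored). The sketch below is a minimal illustration of epsilon-greedy selection with such counters, not the repository's actual code. The helper name `choose_action`, the `counts` dictionary, and the value `epsilon = 0.2` are assumptions: the exploration rate is only inferred from the printed ratio, 7415 / (30027 + 7415) ≈ 0.198.

```python
import random


def choose_action(q_values, epsilon, rng, counts):
    """Pick an action from a row like {'U': 0.0, 'R': 0.4} (hypothetical helper).

    With probability epsilon a random action is taken (explore),
    otherwise the action with the highest value is taken (exploit).
    """
    if rng.random() < epsilon:
        counts["explored"] += 1
        return rng.choice(sorted(q_values))
    counts["exploited"] += 1
    return max(q_values, key=q_values.get)


# Exploration share implied by the counters printed in the log above.
explored, exploited = 7415, 30027
implied_epsilon = explored / (explored + exploited)
print(round(implied_epsilon, 3))  # 0.198

# Assumed setting: epsilon = 0.2, inferred from the ratio, not confirmed.
rng = random.Random(0)
counts = {"exploited": 0, "explored": 0}
q_row = {"U": 0.0, "L": 0.0, "D": 0.0, "R": 0.0}
for _ in range(10_000):
    choose_action(q_row, 0.2, rng, counts)
print(counts)
```

With a fixed exploration rate, the explored share should stay close to epsilon however long training runs, which is consistent with the roughly 20% seen in the log.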
