
# POLICY-ITERATION

## AIMA3e

function POLICY-ITERATION(mdp) returns a policy
inputs: mdp, an MDP with states S, actions A(s), transition model P(s′ | s, a)
local variables: U, a vector of utilities for states in S, initially zero
        π, a policy vector indexed by state, initially random

repeat
   U ← POLICY-EVALUATION(π, U, mdp)
   unchanged? ← true
   for each state s in S do
     if max<sub>a ∈ A(s)</sub> Σ<sub>s′</sub> P(s′ | s, a) U[s′] > Σ<sub>s′</sub> P(s′ | s, π[s]) U[s′] then do
       π[s] ← argmax<sub>a ∈ A(s)</sub> Σ<sub>s′</sub> P(s′ | s, a) U[s′]
       unchanged? ← false
until unchanged?
return π


Figure ?? The policy iteration algorithm for calculating an optimal policy.
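The AIMA3e pseudocode above can be sketched in Python. This is a minimal illustration, not the book's reference implementation: the dict-based MDP representation (keys `'states'`, `'actions'`, `'T'`, `'R'`), the discount factor, and the use of a fixed number of simplified-Bellman sweeps for POLICY-EVALUATION are all assumptions made here for self-containment.

```python
import random

def policy_iteration(mdp, gamma=0.9, eval_sweeps=30):
    """Policy iteration, following the AIMA3e pseudocode above.

    mdp is assumed (for this sketch) to be a dict with:
      'states'  : iterable of states S
      'actions' : function s -> list of actions A(s)
      'T'       : function (s, a) -> list of (probability, next_state) pairs
      'R'       : function s -> reward for being in state s
    """
    states = list(mdp['states'])
    U = {s: 0.0 for s in states}                      # utilities, initially zero
    pi = {s: random.choice(mdp['actions'](s))         # policy, initially random
          for s in states}

    def expected_utility(s, a, U):
        # Σ_{s′} P(s′ | s, a) U[s′]
        return sum(p * U[s2] for p, s2 in mdp['T'](s, a))

    while True:
        # POLICY-EVALUATION: here a fixed number of simplified Bellman
        # sweeps (an assumption; the book also allows solving the
        # linear system for U^π exactly).
        for _ in range(eval_sweeps):
            U = {s: mdp['R'](s) + gamma * expected_utility(s, pi[s], U)
                 for s in states}
        unchanged = True
        for s in states:
            best = max(mdp['actions'](s),
                       key=lambda a: expected_utility(s, a, U))
            if expected_utility(s, best, U) > expected_utility(s, pi[s], U):
                pi[s] = best
                unchanged = False
        if unchanged:
            return pi
```

On a two-state example where moving from `A` to an absorbing rewarding state `B` is clearly best, the returned policy selects the move regardless of the random initial policy, since policy iteration converges to an optimal policy.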


## AIMA4e

function POLICY-ITERATION(mdp) returns a policy
inputs: mdp, an MDP with states S, actions A(s), transition model P(s′ | s, a)
local variables: U, a vector of utilities for states in S, initially zero
        π, a policy vector indexed by state, initially random

repeat
   U ← POLICY-EVALUATION(π, U, mdp)
   unchanged? ← true
   for each state s in S do
     a* ← argmax<sub>a ∈ A(s)</sub> Q-VALUE(mdp, s, a, U)
     if Q-VALUE(mdp, s, a*, U) > Q-VALUE(mdp, s, π[s], U) then do
       π[s] ← a* ; unchanged? ← false
until unchanged?
return π


Figure ?? The policy iteration algorithm for calculating an optimal policy.
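The AIMA4e variant differs only in routing the comparison through Q-VALUE. A Q-VALUE helper might look like the sketch below; the transition-reward signature `R(s, a, s′)` and the dict-based MDP layout are assumptions for illustration, matching the AIMA4e convention of rewards on transitions.

```python
def q_value(mdp, s, a, U, gamma=0.9):
    """Q-VALUE as used in the AIMA4e pseudocode above:
    Σ_{s′} P(s′ | s, a) [R(s, a, s′) + γ U[s′]].

    Assumes (for this sketch) that mdp['T'](s, a) yields
    (probability, next_state) pairs and mdp['R'](s, a, s2)
    is the per-transition reward.
    """
    return sum(p * (mdp['R'](s, a, s2) + gamma * U[s2])
               for p, s2 in mdp['T'](s, a))
```

With this helper, the improvement step becomes `a* = max(actions, key=lambda a: q_value(mdp, s, a, U))` followed by the strict-improvement comparison against `q_value(mdp, s, pi[s], U)`, exactly as in the pseudocode.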