
Q-LEARNING-AGENT

### AIMA3e

function Q-LEARNING-AGENT(percept) returns an action
 inputs: percept, a percept indicating the current state s' and reward signal r'
 persistent: Q, a table of action values indexed by state and action, initially zero
        Nsa, a table of frequencies for state-action pairs, initially zero
        s, a, r, the previous state, action, and reward, initially null

 if TERMINAL?(s) then Q[s, None] ← r'
 if s is not null then
    increment Nsa[s, a]
    Q[s, a] ← Q[s, a] + α(Nsa[s, a])(r + γ max_a' Q[s', a'] − Q[s, a])
 s, a, r ← s', argmax_a' f(Q[s', a'], Nsa[s', a']), r'
 return a


Figure ?? An exploratory Q-learning agent. It is an active learner that learns the value Q(s, a) of each action in each situation. It uses the same exploration function f as the exploratory ADP agent, but avoids having to learn the transition model because the Q-value of a state can be related directly to those of its neighbors.
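The pseudocode above can be sketched in Python. This is a minimal, hedged interpretation, not the book's reference implementation: the learning-rate schedule `alpha(n) = 60/(59+n)` and the optimistic exploration function `f` (return an optimistic value `R_plus` until a state-action pair has been tried `Ne` times) are assumptions filled in for illustration, and terminal states are signaled by an explicit `terminal` flag on the percept.

```python
from collections import defaultdict

class QLearningAgent:
    """A sketch of the exploratory Q-learning agent from the pseudocode above."""

    def __init__(self, actions, gamma=0.9, Ne=5, R_plus=2.0):
        self.Q = defaultdict(float)      # Q[s, a], initially zero
        self.Nsa = defaultdict(int)      # state-action visit counts, initially zero
        self.s = self.a = self.r = None  # previous state, action, and reward
        self.actions = actions
        self.gamma = gamma               # discount factor γ
        self.Ne = Ne                     # exploration threshold (assumed)
        self.R_plus = R_plus             # optimistic reward estimate (assumed)

    def alpha(self, n):
        # Assumed decaying learning rate; any schedule satisfying the usual
        # stochastic-approximation conditions would do.
        return 60.0 / (59.0 + n)

    def f(self, u, n):
        # Exploration function: optimistic until tried Ne times, then greedy.
        return self.R_plus if n < self.Ne else u

    def __call__(self, percept, terminal=False):
        s1, r1 = percept                 # current state s' and reward r'
        if terminal:
            self.Q[s1, None] = r1
        if self.s is not None:
            self.Nsa[self.s, self.a] += 1
            self.Q[self.s, self.a] += self.alpha(self.Nsa[self.s, self.a]) * (
                self.r
                + self.gamma * max(self.Q[s1, a1] for a1 in self.actions)
                - self.Q[self.s, self.a])
        self.s, self.r = s1, r1
        self.a = max(self.actions,
                     key=lambda a1: self.f(self.Q[s1, a1], self.Nsa[s1, a1]))
        return self.a
```

Driving the agent is just a matter of feeding it `(state, reward)` percepts each step; note that, as in the pseudocode, no transition model is ever learned.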