This link has been bookmarked by 1 person. It was first bookmarked on 17 Feb 2008, by Moe Mauch.
-
17 Feb 08
-
value function $V_t$
-
parameter vector $\vec{\theta}_t$
-
value function $V_t$ depends totally on $\vec{\theta}_t$
-
number of parameters (the number of components of $\vec{\theta}_t$) is much less than the number of states
-
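Here is a minimal sketch of a parameterized value function. It assumes a *linear* form $V_t(s) = \vec{\theta}_t^\top \vec{\phi}_s$. The feature map `features` and the sizes (100 states, 3 parameters) are hypothetical choices for illustration and do not come from the source. Every state shares the same few parameters, so changing a single component of $\vec{\theta}_t$ shifts the estimated value of many states at once.

```python
from typing import List

NUM_STATES = 100  # hypothetical size of the state space
NUM_PARAMS = 3    # far fewer parameters than states


def features(s: int) -> List[float]:
    """Hypothetical feature vector phi_s: bias, scaled index, squared scaled index."""
    x = s / (NUM_STATES - 1)
    return [1.0, x, x * x]


def value(theta: List[float], s: int) -> float:
    """Linear approximate value V(s) = theta . phi_s."""
    return sum(t * f for t, f in zip(theta, features(s)))


theta = [0.0, 1.0, 0.0]
before = [value(theta, s) for s in range(NUM_STATES)]
theta[0] += 0.5  # change one parameter (the bias weight) only
after = [value(theta, s) for s in range(NUM_STATES)]
changed = sum(1 for b, a in zip(before, after) if a != b)
print(changed)  # prints 100: every state's estimate moved
```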
individual backup
-
$s \mapsto v$, where $s$ is the state backed up and $v$ is the backed-up value
-
backup $s \mapsto v$ means that the estimated value for state $s$ should be more like $v$.
-
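One common way to make the estimate for $s$ "more like" the target $v$ is to take a gradient-descent step on the squared error $[v - V_t(s)]^2$. This is an assumed choice for illustration. The sketch below also assumes a linear form $V_t(s) = \vec{\theta}_t^\top \vec{\phi}_s$, whose gradient with respect to $\vec{\theta}_t$ is simply $\vec{\phi}_s$. The step size `alpha`, the feature vector, and the helper names are hypothetical.

```python
from typing import List


def linear_value(theta: List[float], phi: List[float]) -> float:
    """Approximate value theta . phi for a state with feature vector phi."""
    return sum(t * f for t, f in zip(theta, phi))


def apply_backup(theta: List[float], phi_s: List[float], v: float, alpha: float) -> List[float]:
    """Process one backup s -> v by nudging theta so that V(s) moves toward v.

    For a linear V the gradient of V(s) with respect to theta is phi_s,
    so the update is theta += alpha * (v - V(s)) * phi_s.
    """
    error = v - linear_value(theta, phi_s)
    return [t + alpha * error * f for t, f in zip(theta, phi_s)]


theta = [0.0, 0.0]
phi_s = [1.0, 0.5]  # hypothetical features of the backed-up state s
target = 2.0        # the backed-up value v
for _ in range(3):
    theta = apply_backup(theta, phi_s, target, alpha=0.5)
    print(round(linear_value(theta, phi_s), 4))  # estimates climb toward 2.0
```

Each call moves the estimate only part of the way toward $v$. Repeated backups therefore average over noisy targets instead of overwriting the estimate.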
In reinforcement learning, learning should be able to occur on-line
-
nonstationary target functions (target functions that change over time)
-
Methods that cannot easily handle such nonstationarity are less suitable for reinforcement learning
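The simplest incremental estimator, $V \leftarrow V + \alpha_n (v - V)$, shows why nonstationarity matters. The example below is a hypothetical illustration and is not a method taken from the source. The target jumps partway through the stream. A sample-average step size $\alpha_n = 1/n$ keeps weighting the stale targets equally, while a constant $\alpha$ keeps tracking the current one. The stream and the value $\alpha = 0.1$ are made up.

```python
from typing import Callable, List


def track(targets: List[float], step_size: Callable[[int], float]) -> float:
    """Run V <- V + alpha_n * (v - V) over a stream of targets.

    step_size(n) returns the step size for the n-th update (n starts at 1).
    """
    v_hat = 0.0
    for n, v in enumerate(targets, start=1):
        v_hat += step_size(n) * (v - v_hat)
    return v_hat


# The target jumps from 0 to 10 halfway through: a nonstationary target.
stream = [0.0] * 50 + [10.0] * 50
sample_average = track(stream, lambda n: 1.0 / n)  # equals the mean of all targets
constant_alpha = track(stream, lambda n: 0.1)      # stays close to the current target
print(sample_average, constant_alpha)
```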
-
target function is the true value function $V^\pi$
-
value prediction problem
-
inputs are states
-
not possible to reduce the error to zero at all states
-
where $P$ is a distribution weighting the errors of different states
-
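The error measure that $P$ weights is usually written as $\mathrm{MSE}(\vec{\theta}_t) = \sum_{s} P(s)\,[V^\pi(s) - V_t(s)]^2$. Below is a direct transcription for a small finite state set stored in dictionaries. The state names, values, and both distributions are invented for illustration.

```python
from typing import Dict


def weighted_mse(p: Dict[str, float], v_true: Dict[str, float], v_hat: Dict[str, float]) -> float:
    """MSE = sum over states s of P(s) * (V_pi(s) - V_hat(s))**2."""
    if abs(sum(p.values()) - 1.0) > 1e-9:
        raise ValueError("P must be a probability distribution over states")
    return sum(p[s] * (v_true[s] - v_hat[s]) ** 2 for s in p)


v_true = {"a": 1.0, "b": 4.0, "c": -2.0}  # hypothetical V_pi
v_hat = {"a": 1.5, "b": 3.0, "c": -2.0}   # hypothetical approximation V_t
uniform = {"a": 1 / 3, "b": 1 / 3, "c": 1 / 3}
skewed = {"a": 0.8, "b": 0.1, "c": 0.1}
print(weighted_mse(uniform, v_true, v_hat))
print(weighted_mse(skewed, v_true, v_hat))
```

The same approximation scores differently under the two weightings, because $P$ decides which states' errors count most.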
the weighting distribution is important
-
Better approximation at some states can be gained, generally, only at the expense of worse approximation at other states
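A hypothetical two-state example, not taken from the source, makes this trade-off concrete. Suppose there is a single parameter shared by both states, $V(s) = \theta$. Setting the derivative of $\sum_s P(s)\,[V^\pi(s) - \theta]^2$ to zero gives $\theta = \sum_s P(s)\,V^\pi(s) / \sum_s P(s)$. Moving weight toward one state therefore lowers its error only by raising the other's. The true values and weights below are made up.

```python
from typing import List


def best_shared_value(p: List[float], v_true: List[float]) -> float:
    """Minimizer of sum_s p[s] * (v_true[s] - theta)**2 when V(s) = theta for all s.

    Setting the derivative to zero gives the P-weighted mean of the true values.
    """
    return sum(ps * vs for ps, vs in zip(p, v_true)) / sum(p)


v_true = [0.0, 10.0]  # hypothetical true values of two states
for p in ([0.9, 0.1], [0.1, 0.9]):
    theta = best_shared_value(p, v_true)
    errors = [abs(vs - theta) for vs in v_true]
    print(p, theta, errors)
```

With weights `[0.9, 0.1]` the fit is 1.0, giving errors of 1 and 9. Reversing the weights gives 9.0, and the errors swap to 9 and 1.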
-
distribution $P$ is also usually the distribution from which the states in the training examples are drawn
-
distribution of particular interest
-
frequency with which states are encountered
-
on-policy distribution
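The on-policy distribution can be estimated empirically by following the policy and counting how often each state is visited. The sketch assumes a hypothetical five-state random walk. The policy steps left or right with equal probability, and a new episode starts in the middle state whenever the walk leaves either end. The environment, seed, and step count are all made up. For this particular walk, the long-run visit fractions work out to roughly 1/9, 2/9, 3/9, 2/9, 1/9.

```python
import random
from collections import Counter
from typing import Dict


def on_policy_frequencies(num_states: int, num_steps: int, seed: int = 0) -> Dict[int, float]:
    """Estimate state-visit frequencies under a hypothetical random-walk policy.

    The agent starts in the middle state, steps left or right with equal
    probability, and restarts in the middle whenever it walks off either end.
    """
    rng = random.Random(seed)
    start = num_states // 2
    s = start
    visits: Counter = Counter()
    for _ in range(num_steps):
        visits[s] += 1
        s += rng.choice((-1, 1))
        if s < 0 or s >= num_states:  # episode ended; begin a new one
            s = start
    return {state: visits[state] / num_steps for state in range(num_states)}


freqs = on_policy_frequencies(num_states=5, num_steps=100_000)
print(freqs)  # the middle state is visited most often, the end states least
```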
-
best predictions
-
not necessarily the best for minimizing MSE
-
not yet clear what a more useful alternative goal for value prediction might be
-