This link has been bookmarked by 1 person. It was first bookmarked on 17 Feb 2008, by Moe Mauch.
-
17 Feb 08
-
We assume that states appear in examples with the same distribution, $P$, over which we are trying to minimize the MSE.
-
$\alpha$ is a positive step-size parameter.
-
$\nabla_{\vec{\theta}_t} f(\vec{\theta}_t)$, for any function $f$, denotes the vector of partial derivatives.
-
This derivative vector is the gradient of $f$ with respect to $\vec{\theta}_t$.
-
If $v_t$ is an unbiased estimate, that is, if $E\{v_t\} = V^\pi(s_t)$, for each $t$, then $\vec{\theta}_t$ is guaranteed to converge to a local optimum under the usual stochastic approximation conditions (2.7) for decreasing the step-size parameter $\alpha$.
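The convergence statement above can be made concrete with a small sketch. It assumes a linear value function, $V_t(s) = \vec{\theta}_t^\top \vec{\phi}(s)$, whose gradient with respect to the parameters is simply the feature vector, and a step size of $1/k$ on the $k$-th example, which is one common schedule that decreases in the required way. The names `gradient_descent_step`, `fit`, `phi`, and `target` are hypothetical and chosen only for illustration; the highlighted text does not prescribe any implementation.

```python
def gradient_descent_step(theta, phi, target, alpha):
    """One update: theta <- theta + alpha * (target - V(s)) * grad V(s).

    Assumes a linear value function V(s) = theta . phi(s), so that the
    gradient of V(s) with respect to theta is the feature vector phi(s).
    """
    value = sum(w * x for w, x in zip(theta, phi))
    error = target - value
    return [w + alpha * error * x for w, x in zip(theta, phi)]


def fit(samples, n_features):
    """Process (phi, target) pairs in order with step size 1/k.

    The 1/k schedule is an illustrative assumption, not the only valid one.
    """
    theta = [0.0] * n_features
    for k, (phi, target) in enumerate(samples, start=1):
        theta = gradient_descent_step(theta, phi, target, alpha=1.0 / k)
    return theta
```

With a single constant feature, for example `fit([([1.0], 1.0), ([1.0], 2.0), ([1.0], 3.0), ([1.0], 6.0)], 1)`, the $1/k$ schedule makes the single parameter the running average of the targets seen so far, which is the simplest case of an unbiased target driving the estimate toward its expected value.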
-
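As (8.4) provides the forward view of gradient-descent TD($\lambda$), so the backward view is provided by
$$\vec{\theta}_{t+1} = \vec{\theta}_t + \alpha \delta_t \vec{e}_t,$$
where $\delta_t$ is the usual TD error,
$$\delta_t = r_{t+1} + \gamma V_t(s_{t+1}) - V_t(s_t),$$
and $\vec{e}_t$ is a column vector of eligibility traces, one for each component of $\vec{\theta}_t$, updated by
$$\vec{e}_t = \gamma \lambda \vec{e}_{t-1} + \nabla_{\vec{\theta}_t} V_t(s_t),$$
with $\vec{e}_0 = \vec{0}$.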
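A minimal sketch of the backward view may help here. It assumes the standard forms of the three quantities named above: the trace update $\vec{e}_t = \gamma \lambda \vec{e}_{t-1} + \nabla_{\vec{\theta}_t} V_t(s_t)$, the TD error $\delta_t = r_{t+1} + \gamma V_t(s_{t+1}) - V_t(s_t)$, and the parameter update $\vec{\theta}_{t+1} = \vec{\theta}_t + \alpha \delta_t \vec{e}_t$. It further assumes a linear value function, so the gradient is just the feature vector of the current state. The function name `td_lambda_episode` and the episode encoding (a list of feature vectors plus a list of rewards, with an all-zero feature vector for the terminal state) are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def td_lambda_episode(features, rewards, theta, alpha=0.1, gamma=1.0, lam=0.8):
    """Run one episode of backward-view gradient-descent TD(lambda).

    features: list of T + 1 feature vectors; features[t] describes s_t.
              A terminal state is encoded as an all-zero vector (assumption).
    rewards:  list of T rewards; rewards[t] is r_{t+1}.
    theta:    initial parameter vector; a new list is returned.
    """
    theta = [float(w) for w in theta]
    e = [0.0] * len(theta)  # eligibility traces start at zero
    for t, reward in enumerate(rewards):
        phi, phi_next = features[t], features[t + 1]
        # Decay the traces, then add the gradient of V(s_t), which is phi
        # for a linear value function.
        e = [gamma * lam * e_i + x for e_i, x in zip(e, phi)]
        # TD error computed with the current parameters.
        delta = reward + gamma * dot(theta, phi_next) - dot(theta, phi)
        # Move every parameter in proportion to its trace.
        theta = [w + alpha * delta * e_i for w, e_i in zip(theta, e)]
    return theta
```

On a two-step episode with one-hot features, setting `lam=1.0` lets the final reward adjust the parameters of both visited states, while `lam=0.0` adjusts only the parameter of the state visited immediately before the reward. That contrast is the role the eligibility traces play in the backward view.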
-