8.2 Gradient-Descent Methods - The Diigo Meta page

www.cs.ualberta.ca/...node87.html

This link has been bookmarked by 1 person. It was first bookmarked on 17 Feb 2008, by Moe Mauch.

17 Feb 08

Moe Mauch
- We assume that states appear in examples with the same distribution, $P$, over which we are trying to minimize the MSE
- $\alpha$ is a positive step-size
- $\nabla_{\vec{\theta}_t} f(\vec{\theta}_t)$, for any function $f$, denotes the vector of partial derivatives
- gradient of $f$ with respect to $\vec{\theta}_t$.
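- general gradient-descent method for state-value prediction:
  
  $\vec{\theta}_{t+1} = \vec{\theta}_t + \alpha \left[ v_t - V_t(s_t) \right] \nabla_{\vec{\theta}_t} V_t(s_t)$   (8.3)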
- If $v_t$ is an unbiased estimate, that is, if $E\{v_t\} = V^\pi(s_t)$, for each $t$, then $\vec{\theta}_t$ is guaranteed to converge to a local optimum under the usual stochastic approximation conditions (2.7) for decreasing the step-size parameter $\alpha$.
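- gradient-descent form of TD($\lambda$) uses the $\lambda$-return, $R_t^\lambda$, as its approximation to $V^\pi(s_t)$, yielding the forward-view update:
  
  $\vec{\theta}_{t+1} = \vec{\theta}_t + \alpha \left[ R_t^\lambda - V_t(s_t) \right] \nabla_{\vec{\theta}_t} V_t(s_t)$   (8.4)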
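- As (8.4) provides the forward view of gradient-descent TD($\lambda$), so the backward view is provided by
  
  $\vec{\theta}_{t+1} = \vec{\theta}_t + \alpha \delta_t \vec{e}_t$   (8.5)
  
  where $\delta_t$ is the usual TD error,
  
  $\delta_t = r_{t+1} + \gamma V_t(s_{t+1}) - V_t(s_t)$   (8.6)
  
  and $\vec{e}_t$ is a column vector of eligibility traces, one for each component of $\vec{\theta}_t$, updated by
  
  $\vec{e}_t = \gamma \lambda \vec{e}_{t-1} + \nabla_{\vec{\theta}_t} V_t(s_t)$   (8.7)
  
  with $\vec{e}_0 = \vec{0}$.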
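
The backward-view updates (8.5) to (8.7) can be sketched in a few lines of Python. This is a minimal illustration, not the textbook's own code. It assumes a linear approximator $V_t(s) = \vec{\theta}_t^T \vec{\phi}_s$, so that the gradient $\nabla_{\vec{\theta}_t} V_t(s_t)$ is just the feature vector of $s_t$. The names `td_lambda_episode`, `features`, `one_hot`, and the `(s, r, s_next, done)` transition format are hypothetical choices made for this example, and the value of a terminal state is taken to be zero.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def td_lambda_episode(transitions, features, theta,
                      alpha=0.1, gamma=1.0, lam=0.8):
    """Backward-view gradient-descent TD(lambda) over one episode.

    Assumes a linear value function V(s) = theta . features(s), so the
    gradient of V with respect to theta is features(s).

    transitions: iterable of (s, r, s_next, done) tuples (hypothetical format)
    features:    function mapping a state to a list of floats (hypothetical)
    theta:       initial parameter vector
    """
    theta = list(theta)
    e = [0.0] * len(theta)                     # e_0 = 0
    for s, r, s_next, done in transitions:
        phi = features(s)
        v = dot(theta, phi)
        # Assumption: terminal states have value zero.
        v_next = 0.0 if done else dot(theta, features(s_next))
        delta = r + gamma * v_next - v         # TD error, (8.6)
        # Trace update (8.7); the gradient is phi in the linear case.
        e = [gamma * lam * e_i + x for e_i, x in zip(e, phi)]
        # Parameter update (8.5).
        theta = [w + alpha * delta * e_i for w, e_i in zip(theta, e)]
    return theta


def one_hot(s, n=2):
    """Tabular features: one component per state."""
    return [1.0 if i == s else 0.0 for i in range(n)]


# Two-state episode: 0 -> 1 with reward 0, then 1 -> terminal with reward 1.
episode = [(0, 0.0, 1, False), (1, 1.0, None, True)]
print(td_lambda_episode(episode, one_hot, [0.0, 0.0],
                        alpha=0.5, gamma=1.0, lam=1.0))
print(td_lambda_episode(episode, one_hot, [0.0, 0.0],
                        alpha=0.5, gamma=1.0, lam=0.0))
```

With $\lambda = 1$ the trace still holds credit for state 0 when the final reward arrives, so both parameters move. With $\lambda = 0$ the trace has already decayed to zero for state 0 and only the parameter for state 1 changes.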
