8.1 Value Prediction with Function Approximation - The Diigo Meta page

www.cs.ualberta.ca/...node86.html

This link has been bookmarked by 1 person. It was first bookmarked on 17 Feb 2008, by Moe Mauch.

17 Feb 08

Moe Mauch
- value function
- parameter vector
- value function $V_t$ depends totally on $\vec{\theta}_t$
- number of parameters (the number of components of $\vec{\theta}_t$) is much less than the number of states
- individual backup
- $s \mapsto v$, where $s$ is the state backed up and $v$ is the backed-up value
- backup means that the estimated value for state $s$ should be more like $v$.
- In reinforcement learning
- learning
- occur on-line
- nonstationary target functions
- Methods that cannot easily handle such nonstationarity are less suitable for reinforcement learning
- target function is the true value function $V^\pi$
- value prediction problem
- inputs are states
- not possible to reduce the error to zero at all states
- where $P$ is a distribution weighting the errors of different states
- important
- Better approximation at some states can be gained, generally, only at the expense of worse approximation at other states
- distribution $P$ is also usually the distribution from which the states in the training examples are drawn
- distribution of particular interest
- frequency with which states are encountered
- on-policy distribution
- best predictions
- not necessarily the best for minimizing MSE
- not yet clear what a more useful alternative goal for value prediction might be
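
The highlights above describe a value estimate controlled by a small parameter vector, a backup that moves the estimate for one state toward a backed-up value, and a mean-squared error weighted by a distribution over states. A minimal Python sketch of these ideas follows. It assumes a linear approximator and a gradient-style update, neither of which is stated in the highlights. The names `weighted_mse`, `backup_step`, `features` and `step_size`, and all numbers, are hypothetical and chosen only for illustration.

```python
import numpy as np


def weighted_mse(true_values, approx_values, state_weights):
    """Sum over states of P(s) * (V_pi(s) - V_t(s))**2.

    state_weights plays the role of the distribution P over states.
    """
    true_values = np.asarray(true_values, dtype=float)
    approx_values = np.asarray(approx_values, dtype=float)
    state_weights = np.asarray(state_weights, dtype=float)
    return float(np.sum(state_weights * (true_values - approx_values) ** 2))


def backup_step(theta, state_features, target, step_size):
    """Apply one backup s -> v to a linear estimate V(s) = theta . features(s).

    Assumption: a gradient-style update, which moves the estimate for the
    backed-up state part of the way toward the target value.
    """
    estimate = theta @ state_features
    return theta + step_size * (target - estimate) * state_features


# Hypothetical problem: 4 states described by only 2 parameters,
# so the number of parameters is less than the number of states.
features = np.array([
    [1.0, 0.0],
    [1.0, 0.5],
    [0.0, 1.0],
    [0.5, 1.0],
])
theta = np.zeros(2)

# Back up state 1 toward the value 2.0.
theta = backup_step(theta, features[1], target=2.0, step_size=0.5)

# Because states share parameters, the estimates of all four states move,
# not only the estimate of the state that was backed up.
print(features @ theta)

# Error against made-up "true" values, weighted by a uniform distribution.
made_up_true_values = np.array([1.0, 2.0, 0.0, 1.0])
uniform_weights = np.full(4, 0.25)
print(weighted_mse(made_up_true_values, features @ theta, uniform_weights))
```

A single backup changes the estimates of states that were not backed up, which is one way to see why better approximation at some states generally costs accuracy at others.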
