Skip to main content

Close
Get the best research tool on the web today,and free!
Connect with people with common interests!

saved by2 people, first byMoe Mauch on 2006-11-16, last bypetrknoth on 2008-01-31

  • value function depends totally on ,
  • number of parameters (the number of
    components of ) is much less than the number of states, and changing one
    parameter changes the estimated value of many states
  • neural network and statistical methods all
    assume a static training set over which multiple passes are
    made. In reinforcement learning, however, it is important that
    learning be able to occur on-line, while interacting with the
    environment or with a model of the environment.
  • nonstationary target functions (target functions that change over time)
  • inputs are states and the target
    function is the true value function ,
  • Better approximation at some states can be
    gained, generally, only at the expense of worse approximation at other states.
  • The most sophisticated neural network and statistical methods all
    assume a static training set over which multiple passes are
    made. In reinforcement learning, however, it is important that
    learning be able to occur on-line, while interacting with the
    environment or with a model of the environment.
  • on-policy distribution
  • Stronger convergence results are available
    for the on-policy distribution than for other distributions
  • An ideal goal in terms of MSE would be to find a global optimum, a
    parameter vector for which for all possible
    . Reaching this goal is sometimes possible for simple function
    approximators such as linear ones, but is rarely possible for complex function
    approximators such as artificial neural networks and decision trees.
  • omplex function approximators may seek to converge instead to a local
    optimum
  • memory-based and
    decision-tree methods.