Reducing Policy Degradation in Neuro-Dynamic Programming
本文档由 纺织服装文库 分享于2010-10-07 10:12
We focus on neuro-dynamic programming methods to learn state-action value functions and outline some of the inherent problems to be faced, when per- forming reinforcement learning in combination with function approximation. In an attempt to overcome some of these problems, we develop a reinforcement learning method that monitors the learning process, enables the learner to re..
君,已阅读到文档的结尾了呢~~