Reducing Policy Degradation in Neuro-Dynamic Programming

Shared by 纺织服装文库 on 2010-10-07 10:12

We focus on neuro-dynamic programming methods to learn state-action value functions and outline some of the inherent problems faced when performing reinforcement learning in combination with function approximation. In an attempt to overcome some of these problems, we develop a reinforcement learning method that monitors the learning process, enables the learner to re..
Format: .pdf
Size: 85.76K
Pages: 6
Category: IT/Computers, Software Engineering
Tags: reinforcement learning value function process state-action approximation Policy Degradation NDP
System tags: neuro policy dynamic programming degradation reducing