Reinforcement Learning: An Introduction — a classic introductory textbook on reinforcement learning.
Reinforcement Learning:
An Introduction
Richard S. Sutton and Andrew G. Barto
A Bradford Book
The MIT Press
Cambridge, Massachusetts
London, England
In memory of A. Harry Klopf
● Contents
❍ Preface
❍ Series Forward
❍ Summary of Notation
● I. The Problem
❍ 1. Introduction
■ 1.1 Reinforcement Learning
■ 1.2 Examples
■ 1.3 Elements of Reinforcement Learning
■ 1.4 An Extended Example: Tic-Tac-Toe
■ 1.5 Summary
■ 1.6 History of Reinforcement Learning
■ 1.7 Bibliographical Remarks
❍ 2. Evaluative Feedback
■ 2.1 An n-Armed Bandit Problem
■ 2.2 Action-Value Methods
■ 2.3 Softmax Action Selection
■ 2.4 Evaluation Versus Instruction
■ 2.5 Incremental Implementation
■ 2.6 Tracking a Nonstationary Problem
■ 2.7 Optimistic Initial Values
■ 2.8 Reinforcement Comparison
■ 2.9 Pursuit Methods
■ 2.10 Associative Search
■ 2.11 Conclusions
■ 2.12 Bibliographical and Historical Remarks
❍ 3. The Reinforcement Learning Problem
■ 3.1 The Agent-Environment Interface
■ 3.2 Goals and Rewards
■ 3.3 Returns
■ 3.4 Unified Notation for Episodic and Continuing Tasks
■ 3.5 The Markov Property
■ 3.6 Markov Decision Processes
■ 3.7 Value Functions
■ 3.8 Optimal Value Functions
■ 3.9 Optimality and Approximation
■ 3.10 Summary
■ 3.11 Bibliographical and Historical Remarks
● II. Elementary Solution Methods
❍ 4. Dynamic Programming
■ 4.1 Policy Evaluation
■ 4.2 Policy Improvement
■ 4.3 Policy Iteration
■ 4.4 Value Iteration
■ 4.5 Asynchronous Dynamic Programming
■ 4.6 Generalized Policy Iteration
■ 4.7 Efficiency of Dynamic Programming
■ 4.8 Summary
■ 4.9 Bibliographical and Historical Remarks
❍ 5. Monte Carlo Methods
■ 5.1 Monte Carlo Policy Evaluation
■ 5.2 Monte Carlo Estimation of Action Values
■ 5.3 Monte Carlo Control
■ 5.4 On-Policy Monte Carlo Control
■ 5.5 Evaluating One Policy While Following Another
■ 5.6 Off-Policy Monte Carlo Control
■ 5.7 Incremental Implementation
■ 5.8 Summary
■ 5.9 Bibliographical and Historical Remarks
❍ 6. Temporal-Difference Learning
■ 6.1 TD Prediction
■ 6.2 Advantages of TD Prediction Methods
■ 6.3 Optimality of TD(0)
■ 6.4 Sarsa: On-Policy TD Control
■ 6.5 Q-Learning: Off-Policy TD Control
■ 6.6 Actor-Critic Methods
■ 6.7 R-Learning for Undiscounted Continuing Tasks
■ 6.8 Games, Afterstates, and Other Special Cases
■ 6.9 Summary
■ 6.10 Bibliographical and Historical Remarks
● III. A Unified View
❍ 7. Eligibility Traces
■ 7.1 n-Step TD Prediction
■ 7.2 The Forward View of TD(λ)
■ 7.3 The Backward View of TD(λ)
■ 7.4 Equivalence of Forward and Backward Views
■ 7.5 Sarsa(λ)
■ 7.6 Q(λ)
■ 7.7 Eligibility Traces for Actor-Critic Methods
■ 7.8 Replacing Traces
■ 7.9 Implementation Issues
■ 7.10 Variable λ
■ 7.11 Conclusions
■ 7.12 Bibliographical and Historical Remarks
❍ 8. Generalization and Function Approximation
■ 8.1 Value Prediction with Function Approximation
■ 8.2 Gradient-Descent Methods
■ 8.3 Linear Methods
■ 8.3.1 Coarse Coding
■ 8.3.2 Tile Coding
■ 8.3.3 Radial Basis Functions
■ 8.3.4 Kanerva Coding
■ 8.4 Control with Function Approximation
■ 8.5 Off-Policy Bootstrapping
■ 8.6 Should We Bootstrap?
■ 8.7 Summary
■ 8.8 Bibliographical and Historical Remarks
❍ 9. Planning and Learning
■ 9.1 Models and Planning
■ 9.2 Integrating Planning, Acting, and Learning
■ 9.3 When the Model Is Wrong
■ 9.4 Prioritized Sweeping
■ 9.5 Full vs. Sample Backups
■ 9.6 Trajectory Sampling
■ 9.7 Heuristic Search
■ 9.8 Summary
■ 9.9 Bibliographical and Historical Remarks
❍ 10. Dimensions of Reinforcement Learning
■ 10.1 The Unified View
■ 10.2 Other Frontier Dimensions
❍ 11. Case Studies
■ 11.1 TD-Gammon
■ 11.2 Samuel's Checkers Player
■ 11.3 The Acrobot
■ 11.4 Elevator Dispatching
■ 11.5 Dynamic Channel Allocation
■ 11.6 Job-Shop Scheduling
● Bibliography
❍ Index