An Introduction to Deep Reinforcement Learning
Vincent François-Lavet, Peter Henderson, Riashat Islam, Marc G. Bellemare and Joelle Pineau (2018), “An Introduction to Deep Reinforcement Learning”, Foundations and Trends in Machine Learning: Vol. 11, No. 3-4. DOI: 10.1561/2200000071.
Vincent François-Lavet
McGill University
vincent.francois-lavet@mcgill.ca
Peter Henderson
McGill University
peter.henderson@mail.mcgill.ca
Riashat Islam
McGill University
riashat.islam@mail.mcgill.ca
Marc G. Bellemare
Google Brain
bellemare@google.com
Joelle Pineau
Facebook, McGill University
jpineau@cs.mcgill.ca
Boston — Delft
arXiv:1811.12560v2 [cs.LG] 3 Dec 2018
Contents
1 Introduction
1.1 Motivation
1.2 Outline
2 Machine learning and deep learning
2.1 Supervised learning and the concepts of bias and overfitting
2.2 Unsupervised learning
2.3 The deep learning approach
3 Introduction to reinforcement learning
3.1 Formal framework
3.2 Different components to learn a policy
3.3 Different settings to learn a policy from data
4 Value-based methods for deep RL
4.1 Q-learning
4.2 Fitted Q-learning
4.3 Deep Q-networks
4.4 Double DQN
4.5 Dueling network architecture
4.6 Distributional DQN
4.7 Multi-step learning
4.8 Combination of all DQN improvements and variants of DQN
5 Policy gradient methods for deep RL
5.1 Stochastic Policy Gradient
5.2 Deterministic Policy Gradient
5.3 Actor-Critic Methods
5.4 Natural Policy Gradients
5.5 Trust Region Optimization
5.6 Combining policy gradient and Q-learning
6 Model-based methods for deep RL
6.1 Pure model-based methods
6.2 Integrating model-free and model-based methods
7 The concept of generalization
7.1 Feature selection
7.2 Choice of the learning algorithm and function approximator selection
7.3 Modifying the objective function
7.4 Hierarchical learning
7.5 How to obtain the best bias-overfitting tradeoff
8 Particular challenges in the online setting
8.1 Exploration/Exploitation dilemma
8.2 Managing experience replay
9 Benchmarking Deep RL
9.1 Benchmark Environments
9.2 Best practices to benchmark deep RL
9.3 Open-source software for Deep RL
10 Deep reinforcement learning beyond MDPs
10.1 Partial observability and the distribution of (related) MDPs
10.2 Transfer learning
10.3 Learning without explicit reward function
10.4 Multi-agent systems
11 Perspectives on deep reinforcement learning
11.1 Successes of deep reinforcement learning
11.2 Challenges of applying reinforcement learning to real-world problems
11.3 Relations between deep RL and neuroscience
12 Conclusion
12.1 Future development of deep RL
12.2 Applications and societal impact of deep RL
Appendices
References
ABSTRACT
Deep reinforcement learning is the combination of reinforcement learning (RL) and deep learning. This field of research has been able to solve a wide range of complex decision-making tasks that were previously out of reach for a machine. Thus, deep RL opens up many new applications in domains such as healthcare, robotics, smart grids, finance, and many more. This manuscript provides an introduction to deep reinforcement learning models, algorithms and techniques. Particular focus is on the aspects related to generalization and how deep RL can be used for practical applications. We assume the reader is familiar with basic machine learning concepts.