ReinforcementLearning-TheoryandAlgorithm.pdf (CSDN Library resource)

Topic: Reinforcement Learning

Uploaded 2021-02-04 12:25:40, 652KB PDF

Reinforcement Learning: Theory and Algorithms

Alekh Agarwal, Nan Jiang, Sham M. Kakade

October 27, 2019

WORKING DRAFT: Text not yet at the level of publication.

4 Policy Gradient Methods
  4.1 The Policy Gradient Method
    4.1.1 Optimization
  4.2 The Softmax Policy and Relative Entropy Regularization
  4.3 The Natural Policy Gradient
    4.3.1 Global Convergence and the Softmax Policy Class
    4.3.2 Function Approximation and a Connection to Transfer Learning
  4.4 Related algorithms
    4.4.1 Trust Region Policy Optimization (TRPO)
    4.4.2 Proximal Policy Optimization (PPO)
    4.4.3 Conservative Policy Iteration (CPI)
  4.5 Bibliographic Remarks

5 Value Function Approximation
  5.1 Approximate Policy Evaluation
  5.2 Approximate Policy Improvement
    5.2.1 Greedy policy improvement with $\ell_\infty$ approximation
    5.2.2 Conservative Policy Iteration

6 Strategic Exploration in RL with rich observations
  6.1 Problem setting
  6.2 Value-function approximation
  6.3 Bellman Rank
  6.4 Sample-efficient learning for CDPs with a small Bellman rank

7 Behavioral Cloning and Apprenticeship Learning
  7.1 Linear Programming Formulations
    7.1.1 The Primal LP
    7.1.2 The Dual LP
  7.2 Behavioral Cloning
    7.2.1 Behavioral Cloning via Supervised Learning
    7.2.2 Behavioral Cloning via Distribution Matching
    7.2.3 Sample Efficiency: comparing the approaches
