Truly Proximal Policy Optimization

Wang, Yuhui; He, Hao; Wen, Chao; Tan, Xiaoyang

arXiv:1903.07940 (cs)

[Submitted on 19 Mar 2019 (v1), last revised 14 Jan 2020 (this version, v2)]

Abstract: Proximal policy optimization (PPO) is one of the most successful deep reinforcement-learning methods, achieving state-of-the-art performance across a wide range of challenging tasks. However, its optimization behavior is still far from being fully understood. In this paper, we show that PPO can neither strictly restrict the likelihood ratio, as it attempts to do, nor enforce a well-defined trust-region constraint, which means that it may still be at risk of performance instability. To address this issue, we present an enhanced PPO method, named Truly PPO. Our method makes two critical improvements: 1) it adopts a new clipping function that supports a rollback behavior to restrict the difference between the new policy and the old one; 2) the triggering condition for clipping is replaced with a trust-region-based one, so that optimizing the resulting surrogate objective function guarantees monotonic improvement of the ultimate policy performance. By adhering more truly to the goal of making the algorithm proximal, that is, confining the policy within the trust region, the new algorithm appears to improve on the original PPO in both sample efficiency and performance.
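The abstract describes the two modifications only qualitatively, so the following Python sketch is illustrative rather than the authors' code; the exact rollback function and trigger rule in the paper may differ. It assumes per-sample scalar likelihood ratios r = π_θ(a|s) / π_old(a|s), scalar advantages A, a discrete action space (for the per-state KL divergence), and the hypothetical parameter names `eps`, `alpha` and `delta`. It contrasts PPO's clipped surrogate, whose gradient vanishes once the ratio leaves [1 - eps, 1 + eps], with an assumed rollback-style clipping function that has slope -alpha outside that range, and with an assumed variant whose clipping is triggered by the per-state KL divergence instead of the ratio bounds.

```python
"""Illustrative sketch, not the authors' implementation: contrasts PPO's
clipped surrogate with a rollback-style clipping function and a
trust-region (KL) triggered variant. `eps`, `alpha` and `delta` are
hypothetical parameter names chosen for this example."""
import math
from typing import Callable, Sequence


def ppo_clip(ratio: float, adv: float, eps: float = 0.2) -> float:
    # Standard PPO surrogate for one (s, a) sample: min(r*A, clip(r)*A).
    # Once the clipped branch is selected it is flat in r, so it exerts no
    # force pulling the ratio back inside [1 - eps, 1 + eps].
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * adv, clipped * adv)


def rollback_clip(ratio: float, adv: float, eps: float = 0.2,
                  alpha: float = 0.3) -> float:
    # Assumed rollback form: identity inside the range, slope -alpha outside,
    # continuous at both bounds. When min() selects this branch, the gradient
    # with respect to r points back toward the clipping range.
    if ratio <= 1.0 - eps:
        f = -alpha * ratio + (1.0 + alpha) * (1.0 - eps)
    elif ratio >= 1.0 + eps:
        f = -alpha * ratio + (1.0 + alpha) * (1.0 + eps)
    else:
        f = ratio
    return min(ratio * adv, f * adv)


def kl_categorical(p_old: Sequence[float], p_new: Sequence[float]) -> float:
    # KL(pi_old(.|s) || pi_new(.|s)) for a discrete action distribution.
    return sum(p * math.log(p / q) for p, q in zip(p_old, p_new) if p > 0.0)


def trust_region_rollback(ratio: float, adv: float,
                          p_old: Sequence[float], p_new: Sequence[float],
                          delta: float = 0.02, alpha: float = 0.3) -> float:
    # Assumed trust-region trigger: test the per-state KL divergence instead
    # of the ratio bounds. If the state is outside the KL ball and the ratio
    # has moved in the direction the advantage rewards, use a rollback term
    # whose slope in r (-alpha * A) opposes further movement.
    outside = kl_categorical(p_old, p_new) >= delta
    if outside and (ratio - 1.0) * adv > 0.0:
        return -alpha * ratio * adv
    return ratio * adv


def slope(fn: Callable[..., float], ratio: float, h: float = 1e-6,
          **kwargs) -> float:
    # Central finite difference of a surrogate with respect to the ratio.
    return (fn(ratio + h, **kwargs) - fn(ratio - h, **kwargs)) / (2.0 * h)


if __name__ == "__main__":
    # Positive-advantage sample whose ratio already exceeds 1 + eps.
    print(slope(ppo_clip, 1.5, adv=1.0))       # 0.0: no pull back
    print(slope(rollback_clip, 1.5, adv=1.0))  # about -0.3: pushes r down
    # Old vs. new action probabilities for one state; r for action 0 is 1.8.
    p_old, p_new = [0.5, 0.5], [0.9, 0.1]
    print(trust_region_rollback(1.8, 1.0, p_old, p_new))  # about -0.54
```

Under these assumptions, the first two printed slopes illustrate the abstract's point about restricting the likelihood ratio: beyond the bound, PPO's clipped objective is flat, whereas the rollback version actively pulls the ratio back. The trust-region variant tests the KL divergence per state, which requires the full action distribution for that state rather than only the probability of the sampled action.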

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Cite as:	arXiv:1903.07940 [cs.LG]
	(or arXiv:1903.07940v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1903.07940

Submission history

From: Yuhui Wang
[v1] Tue, 19 Mar 2019 11:18:29 UTC (7,807 KB)
[v2] Tue, 14 Jan 2020 03:59:49 UTC (9,190 KB)
