Regularized Behavior Value Estimation

Gulcehre, Caglar; Colmenarejo, Sergio Gómez; Wang, Ziyu; Sygnowski, Jakub; Paine, Thomas; Zolna, Konrad; Chen, Yutian; Hoffman, Matthew; Pascanu, Razvan; de Freitas, Nando

arXiv:2103.09575 (cs)

[Submitted on 17 Mar 2021]

Abstract: Offline reinforcement learning restricts the learning process to rely only on logged data without access to an environment. While this enables real-world applications, it also poses unique challenges. One important challenge is dealing with errors caused by the overestimation of values for state-action pairs not well-covered by the training data. Due to bootstrapping, these errors get amplified during training and can lead to divergence, thereby crippling learning. To overcome this challenge, we introduce Regularized Behavior Value Estimation (R-BVE). Unlike most approaches, which use policy improvement during training, R-BVE estimates the value of the behavior policy during training and only performs policy improvement at deployment time. Further, R-BVE uses a ranking regularisation term that favours actions in the dataset that lead to successful outcomes. We provide ample empirical evidence of R-BVE's effectiveness, including state-of-the-art performance on the RL Unplugged ATARI dataset. We also test R-BVE on new datasets, from bsuite and a challenging DeepMind Lab task, and show that R-BVE outperforms other state-of-the-art discrete control offline RL methods.
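
The abstract describes two mechanisms: behavior value estimation during training, where the bootstrap target uses the next action actually logged in the dataset rather than a maximum over actions, and policy improvement only at deployment, by acting greedily on the learned values. What follows is a minimal tabular sketch in Python under stated assumptions, not the authors' implementation. The abstract only says the ranking regulariser favours dataset actions that lead to successful outcomes; the squared-hinge form of the penalty, the `margin` and `alpha` parameters, and the `successful` flag marking transitions from high-return episodes are hypothetical choices made for illustration, as are all function names.

```python
from typing import List, Tuple

# (state, action, reward, next_state, next_action, done). next_action is the
# action actually logged in the dataset at next_state.
Transition = Tuple[int, int, float, int, int, bool]
QTable = List[List[float]]


def ranking_grad(q_row: List[float], a_data: int, margin: float) -> List[float]:
    """Gradient w.r.t. q_row of a hypothetical squared-hinge ranking penalty:
    sum over a != a_data of max(Q(s,a) - Q(s,a_data) + margin, 0)^2."""
    grad = [0.0] * len(q_row)
    for a, q in enumerate(q_row):
        if a == a_data:
            continue
        h = max(q - q_row[a_data] + margin, 0.0)
        grad[a] += 2.0 * h         # descent pushes the competing action down
        grad[a_data] -= 2.0 * h    # descent pushes the logged action up
    return grad


def rbve_update(Q: QTable, tr: Transition, lr: float = 0.1, gamma: float = 0.99,
                alpha: float = 0.1, margin: float = 0.1,
                successful: bool = False) -> None:
    """One in-place tabular step on TD loss + alpha * ranking penalty."""
    s, a, r, s_next, a_next, done = tr
    # Behavior value (SARSA-style) target: bootstrap from the logged next
    # action instead of max_a Q(s_next, a), so no policy improvement here.
    target = r if done else r + gamma * Q[s_next][a_next]
    td_error = target - Q[s][a]
    # Assumption: the ranking term is applied only to "successful" transitions.
    if successful:
        grad = ranking_grad(Q[s], a, margin)
    else:
        grad = [0.0] * len(Q[s])
    # Both gradients were evaluated at the pre-update values; apply them now.
    Q[s][a] += lr * td_error
    for b in range(len(Q[s])):
        Q[s][b] -= lr * alpha * grad[b]


def greedy_action(Q: QTable, s: int) -> int:
    """Deployment-time policy improvement: act greedily on the learned values."""
    return max(range(len(Q[s])), key=lambda b: Q[s][b])
```

For example, applying `rbve_update` repeatedly to a single successful transition moves the logged action's value toward its return while the ranking term pushes the unlogged actions below it, so `greedy_action` selects the logged action. Bootstrapping from the logged next action rather than the maximum keeps the training target away from the values of actions the data never covers, which is the overestimation problem the abstract describes.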

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2103.09575 [cs.LG]
	(or arXiv:2103.09575v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2103.09575

Submission history

From: Çağlar Gülçehre
[v1] Wed, 17 Mar 2021 11:34:54 UTC (3,844 KB)
