Adaptive Trade-Offs in Off-Policy Learning

Rowland, Mark; Dabney, Will; Munos, Rémi

arXiv:1910.07478 (cs)

[Submitted on 16 Oct 2019 (v1), last revised 30 Jul 2020 (this version, v2)]

Abstract: A great variety of off-policy learning algorithms exist in the literature, and new breakthroughs in this area continue to be made, improving theoretical understanding and yielding state-of-the-art reinforcement learning algorithms. In this paper, we take a unifying view of this space of algorithms and consider how they trade off three fundamental quantities: update variance, fixed-point bias, and contraction rate. This leads to new perspectives on existing methods and also naturally yields novel algorithms for off-policy evaluation and control. We develop one such algorithm, C-trace, and demonstrate that it makes these trade-offs more efficiently than existing methods and that it can be scaled to yield state-of-the-art performance in large-scale environments.
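The three quantities named in the abstract can be made concrete with a toy calculation at a single state. The sketch below is a generic illustration, not C-trace and not the paper's analysis. It assumes truncated importance weights c(a) = min(c_bar, pi(a)/mu(a)), of the kind used in Retrace-style and V-trace-style updates, and reports the mean and variance of the trace when actions are drawn from the behaviour policy mu, together with the total-variation distance between pi and the normalised policy min(c_bar * mu, pi), whose value a V-trace-style update converges to when the same c_bar truncates the importance ratio (in Retrace the fixed point stays the target policy's value). The helper names `truncated_trace_stats` and `TraceStats`, and the use of the mean trace as a rough proxy for contraction speed, are hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class TraceStats:
    mean_trace: float      # E_mu[c]: rough proxy (assumption) for how much multi-step correction survives
    trace_variance: float  # Var_mu[c]: grows as the truncation level c_bar is relaxed
    fixed_point_tv: float  # TV distance between pi and the V-trace-style implied policy


def truncated_trace_stats(pi: Sequence[float], mu: Sequence[float], c_bar: float) -> TraceStats:
    """Statistics of c(a) = min(c_bar, pi(a) / mu(a)) for actions a ~ mu at one state.

    Hypothetical helper for illustration; it is not C-trace and not the paper's analysis.
    """
    if len(pi) != len(mu):
        raise ValueError("pi and mu must cover the same actions")
    if any(m <= 0.0 for m in mu):
        raise ValueError("behaviour policy mu must give every action positive probability")
    if c_bar <= 0.0:
        raise ValueError("c_bar must be positive (use math.inf for no truncation)")

    # Truncated importance weights, one per action.
    traces = [min(c_bar, p / m) for p, m in zip(pi, mu)]

    # First and second moments of the trace when the action is sampled from mu.
    mean = sum(m * c for m, c in zip(mu, traces))
    second_moment = sum(m * c * c for m, c in zip(mu, traces))
    variance = max(0.0, second_moment - mean * mean)  # clamp tiny negative rounding error

    # mu(a) * c(a) = min(c_bar * mu(a), pi(a)); normalising gives the policy whose value
    # a V-trace-style update converges to when the same truncation is applied to the ratio.
    implied = [m * c / mean for m, c in zip(mu, traces)]
    tv = 0.5 * sum(abs(p - q) for p, q in zip(pi, implied))

    return TraceStats(mean_trace=mean, trace_variance=variance, fixed_point_tv=tv)


if __name__ == "__main__":
    pi = [0.9, 0.1]  # target policy at a single state (illustrative numbers)
    mu = [0.5, 0.5]  # behaviour policy that generated the data
    for c_bar in (math.inf, 1.5, 1.0, 0.5):
        s = truncated_trace_stats(pi, mu, c_bar)
        print(f"c_bar={c_bar}: mean={s.mean_trace:.3f} "
              f"var={s.trace_variance:.3f} bias_tv={s.fixed_point_tv:.3f}")
```

In this toy example, lowering c_bar reduces the trace variance, but it also lowers the mean trace (larger traces generally give Retrace-style operators a faster contraction) and moves the V-trace-style fixed point further from pi. A single truncation level therefore moves all three quantities at once, which is the kind of trade-off the abstract describes.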

Comments:	AISTATS 2020 camera-ready version
Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:1910.07478 [cs.LG]
	(or arXiv:1910.07478v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1910.07478

Submission history

From: Mark Rowland
[v1] Wed, 16 Oct 2019 17:09:19 UTC (8,552 KB)
[v2] Thu, 30 Jul 2020 11:24:06 UTC (10,399 KB)
