Meta-Learning with Warped Gradient Descent

Flennerhag, Sebastian; Rusu, Andrei A.; Pascanu, Razvan; Visin, Francesco; Yin, Hujun; Hadsell, Raia

Computer Science > Machine Learning

arXiv:1909.00025 (cs)

[Submitted on 30 Aug 2019 (v1), last revised 18 Feb 2020 (this version, v2)]

Abstract: Learning an efficient update rule from data that promotes rapid learning of new tasks from the same distribution remains an open problem in meta-learning. Typically, previous works have approached this issue either by attempting to train a neural network that directly produces updates or by attempting to learn better initialisations or scaling factors for a gradient-based update rule. Both of these approaches pose challenges. On one hand, directly producing an update forgoes a useful inductive bias and can easily lead to non-converging behaviour. On the other hand, approaches that try to control a gradient-based update rule typically resort to computing gradients through the learning process to obtain their meta-gradients, leading to methods that cannot scale beyond few-shot task adaptation. In this work, we propose Warped Gradient Descent (WarpGrad), a method that intersects these approaches to mitigate their limitations. WarpGrad meta-learns an efficiently parameterised preconditioning matrix that facilitates gradient descent across the task distribution. Preconditioning arises by interleaving non-linear layers, referred to as warp-layers, between the layers of a task-learner. Warp-layers are meta-learned without backpropagating through the task training process in a manner similar to methods that learn to directly produce updates. WarpGrad is computationally efficient, easy to implement, and can scale to arbitrarily large meta-learning problems. We provide a geometrical interpretation of the approach and evaluate its effectiveness in a variety of settings, including few-shot, standard supervised, continual and reinforcement learning.
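
Illustrative sketch (not from the paper): the abstract states that preconditioning arises from layers interleaved between the layers of a task-learner. The toy Python below shows one way to see this under strong simplifying assumptions: a single task layer `W`, a single fixed warp `omega` that is linear (the paper's warp-layers are non-linear), and a squared-error loss. Under these assumptions, backpropagating through the warp means the error signal reaching `W` is multiplied by the transpose of `omega`. All function and variable names are hypothetical, and the meta-learning of `omega` itself is not shown.

```python
# Toy sketch, assuming a LINEAR warp held fixed during task adaptation.
# This is a simplification for illustration, not the WarpGrad algorithm.

def matvec(M, v):
    """Matrix-vector product for nested lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def outer(u, v):
    return [[a * b for b in v] for a in u]

def loss(W, omega, x, y):
    """Squared error of the warped prediction omega (W x) against y."""
    z = matvec(omega, matvec(W, x))
    return 0.5 * sum((zi - yi) ** 2 for zi, yi in zip(z, y))

def warped_grad(W, omega, x, y):
    """Analytic gradient of the loss with respect to W, omega held fixed.

    The output error dL/dz is multiplied by omega^T on its way back to W,
    so the task layer receives a linearly transformed gradient signal.
    """
    z = matvec(omega, matvec(W, x))
    err = [zi - yi for zi, yi in zip(z, y)]           # dL/dz
    return outer(matvec(transpose(omega), err), x)    # omega^T (dL/dz) x^T

def numeric_grad(W, omega, x, y, eps=1e-6):
    """Central finite differences, used only to sanity-check warped_grad."""
    G = [[0.0] * len(W[0]) for _ in W]
    for i in range(len(W)):
        for j in range(len(W[0])):
            Wp = [row[:] for row in W]
            Wm = [row[:] for row in W]
            Wp[i][j] += eps
            Wm[i][j] -= eps
            G[i][j] = (loss(Wp, omega, x, y) - loss(Wm, omega, x, y)) / (2 * eps)
    return G

def sgd_step(W, omega, x, y, lr=0.05):
    """One gradient step on the task layer only; the warp is not updated."""
    G = warped_grad(W, omega, x, y)
    return [[w - lr * g for w, g in zip(wr, gr)] for wr, gr in zip(W, G)]

if __name__ == "__main__":
    W = [[0.5, -0.2], [0.1, 0.3]]
    omega = [[2.0, 0.0], [0.5, 1.0]]
    x, y = [1.0, 2.0], [1.0, -1.0]
    print("loss before:", loss(W, omega, x, y))
    print("loss after: ", loss(sgd_step(W, omega, x, y), omega, x, y))
```

In this sketch only `W` is adapted; how the warp parameters would be meta-learned across tasks is left out, since the abstract does not specify the objective.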

Comments:	28 pages, 13 figures, 3 tables. Published as a conference paper at ICLR 2020
Subjects:	Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
Cite as:	arXiv:1909.00025 [cs.LG]
	(or arXiv:1909.00025v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1909.00025

Submission history

From: Sebastian Flennerhag
[v1] Fri, 30 Aug 2019 18:27:35 UTC (2,177 KB)
[v2] Tue, 18 Feb 2020 08:57:58 UTC (2,178 KB)
