Rethinking the Hyperparameters for Fine-tuning

Li, Hao; Chaudhari, Pratik; Yang, Hao; Lam, Michael; Ravichandran, Avinash; Bhotika, Rahul; Soatto, Stefano

arXiv:2002.11770 (cs)

[Submitted on 19 Feb 2020]

Abstract: Fine-tuning from pre-trained ImageNet models has become the de facto standard for various computer vision tasks. Current practices for fine-tuning typically involve making an ad hoc choice of hyperparameters and keeping them fixed to values normally used for training from scratch. This paper re-examines several common practices of setting hyperparameters for fine-tuning. Our findings are based on extensive empirical evaluation for fine-tuning on various transfer learning benchmarks. (1) While prior works have thoroughly investigated learning rate and batch size, momentum for fine-tuning is a relatively unexplored parameter. We find that the value of momentum also affects fine-tuning performance and connect it with previous theoretical findings. (2) Optimal hyperparameters for fine-tuning, in particular, the effective learning rate, are not only dataset dependent but also sensitive to the similarity between the source domain and target domain. This is in contrast to hyperparameters for training from scratch. (3) Reference-based regularization that keeps models close to the initial model does not necessarily apply for "dissimilar" datasets. Our findings challenge common practices of fine-tuning and encourage deep learning practitioners to rethink the hyperparameters for fine-tuning.
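
The abstract refers to an "effective learning rate" without defining it. A minimal sketch, assuming the common definition for SGD with heavy-ball momentum, lr / (1 - momentum), and the update convention v = m*v + g, w = w - lr*v: the function names below are hypothetical and are not taken from the paper. The second function checks the definition numerically by running the momentum update under a constant gradient until the per-step displacement settles.

```python
def effective_learning_rate(lr: float, momentum: float) -> float:
    """Assumed definition: lr / (1 - momentum) for SGD with heavy-ball momentum."""
    if not 0.0 <= momentum < 1.0:
        raise ValueError("momentum must be in [0, 1)")
    return lr / (1.0 - momentum)


def steady_state_step(lr: float, momentum: float, grad: float = 1.0,
                      steps: int = 500) -> float:
    """Size of one parameter step after `steps` updates under a constant gradient.

    Uses the convention v = momentum * v + grad, step = lr * v.
    """
    velocity = 0.0
    for _ in range(steps):
        velocity = momentum * velocity + grad
    return lr * velocity


if __name__ == "__main__":
    # Three (lr, momentum) pairs chosen to share the same effective learning rate.
    for lr, m in [(0.01, 0.9), (0.05, 0.5), (0.1, 0.0)]:
        print(f"lr={lr}, momentum={m}: "
              f"effective={effective_learning_rate(lr, m):.4f}, "
              f"steady-state step={steady_state_step(lr, m):.4f}")
```

Under this assumption, lowering momentum while holding the learning rate fixed also lowers the effective learning rate, so the two settings are best searched jointly rather than one at a time.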

Comments:	Published as a conference paper at ICLR 2020
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:2002.11770 [cs.CV]
	(or arXiv:2002.11770v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2002.11770

Submission history

From: Hao Li
[v1] Wed, 19 Feb 2020 18:59:52 UTC (1,332 KB)
