Efficient Device Scheduling with Multi-Job Federated Learning

Zhou, Chendi; Liu, Ji; Jia, Juncheng; Zhou, Jingbo; Zhou, Yang; Dai, Huaiyu; Dou, Dejing

arXiv:2112.05928 (cs)

[Submitted on 11 Dec 2021 (v1), last revised 15 Dec 2021 (this version, v2)]

Abstract: Recent years have witnessed a large amount of decentralized data accumulating on the (edge) devices of end-users, while aggregating this data for machine learning jobs remains difficult because of laws and regulations. Federated Learning (FL) has emerged as an effective approach to handling decentralized data without sharing the sensitive raw data, while collaboratively training global machine learning models. The servers in FL need to select (and schedule) devices during the training process. However, scheduling devices for multiple jobs with FL remains a critical and open problem. In this paper, we propose a novel multi-job FL framework that enables multiple jobs to be trained in parallel. The framework consists of a system model and two scheduling methods. In the system model, we propose a parallel training process for multiple jobs and construct a cost model based on the training time and the data fairness of the various devices during the training of diverse jobs. We propose a reinforcement learning-based method and a Bayesian optimization-based method to schedule devices for multiple jobs while minimizing the cost. We conduct extensive experiments with multiple jobs and datasets. The experimental results show that our proposed approaches significantly outperform baseline approaches in terms of training time (up to 8.67 times faster) and accuracy (up to 44.6% higher).
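To make the idea of a cost built from training time and data fairness more concrete, here is a minimal sketch. It is not the method from the paper: the abstract gives no formulas, so the weights `alpha` and `beta`, the use of the slowest device as the round time, the use of the variance of per-device selection counts as a fairness proxy, and every function name are assumptions made for illustration. The greedy selector is a simple stand-in for the reinforcement learning-based and Bayesian optimization-based schedulers, which are not reproduced here.

```python
from statistics import pvariance


def round_time(selected, device_time):
    """Assumed round time: the slowest selected device dominates."""
    return max(device_time[d] for d in selected)


def fairness(selected, counts):
    """Assumed fairness proxy: variance of per-device selection counts
    after hypothetically selecting `selected` (lower is fairer)."""
    updated = [c + (1 if d in selected else 0) for d, c in counts.items()]
    return pvariance(updated)


def cost(selected, device_time, counts, alpha=1.0, beta=1.0):
    """Hypothetical weighted cost: alpha * time + beta * fairness."""
    return (alpha * round_time(selected, device_time)
            + beta * fairness(selected, counts))


def greedy_schedule(available, device_time, counts, k, alpha=1.0, beta=1.0):
    """Greedy stand-in scheduler: repeatedly add the device that gives
    the lowest cost until k devices are chosen (ties broken by name)."""
    selected = set()
    candidates = set(available)
    while len(selected) < k and candidates:
        best = min(
            sorted(candidates),
            key=lambda d: cost(selected | {d}, device_time, counts, alpha, beta),
        )
        selected.add(best)
        candidates.remove(best)
    return selected


# Two jobs share one device pool in the same round: devices taken by
# job A are not available to job B, which is what makes this multi-job.
device_time = {"d0": 1.0, "d1": 2.0, "d2": 5.0, "d3": 1.5}
counts_a = {d: 0 for d in device_time}
counts_b = {d: 0 for d in device_time}

job_a = greedy_schedule(device_time, device_time, counts_a, k=2)
remaining = set(device_time) - job_a
job_b = greedy_schedule(remaining, device_time, counts_b, k=2)
```

In this toy setup each job keeps its own selection counts, so the fairness term is evaluated per job while the device pool is shared across jobs.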

Comments:	14 pages, 7 figures, 6 tables
Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
Cite as:	arXiv:2112.05928 [cs.DC]
	(or arXiv:2112.05928v2 [cs.DC] for this version)
	https://doi.org/10.48550/arXiv.2112.05928

Submission history

From: Ji Liu
[v1] Sat, 11 Dec 2021 08:05:11 UTC (899 KB)
[v2] Wed, 15 Dec 2021 11:40:35 UTC (897 KB)
