Distributed Training with Heterogeneous Data: Bridging Median- and Mean-Based Algorithms

Chen, Xiangyi; Chen, Tiancong; Sun, Haoran; Wu, Zhiwei Steven; Hong, Mingyi

arXiv:1906.01736 (cs)

[Submitted on 4 Jun 2019 (v1), last revised 6 Jun 2019 (this version, v2)]

Abstract: Recently, there has been growing interest in the study of median-based algorithms for distributed non-convex optimization. Two prominent such algorithms are signSGD with majority vote, an effective approach for communication reduction via 1-bit compression on the local gradients, and medianSGD, an algorithm recently proposed to ensure robustness against Byzantine workers. The convergence analyses for these algorithms critically rely on the assumption that all the distributed data are drawn iid from the same distribution. However, in applications such as Federated Learning, the data across different nodes or machines can be inherently heterogeneous, which violates the iid assumption. This work analyzes signSGD and medianSGD in distributed settings with heterogeneous data. We show that these algorithms are non-convergent whenever there is some disparity between the expected median and mean over the local gradients. To overcome this gap, we provide a novel gradient correction mechanism that perturbs the local gradients with noise, together with a series of results that provably close the gap between the mean and median of the gradients. The proposed methods largely preserve the nice properties of the original algorithms, such as the low per-iteration communication complexity of signSGD, and further enjoy global convergence to stationary solutions. Our perturbation technique can be of independent interest when one wishes to estimate the mean through a median estimator.
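The median-versus-mean gap described in the abstract can be illustrated with a small numerical sketch. It is not the paper's algorithm: the five scalar "local gradients", the Gaussian noise, the noise scale of 50, and all function names are assumptions chosen for illustration. The sketch shows a case where the majority vote over signs and the median both point opposite to the mean, and then estimates by Monte Carlo how the expected median of noise-perturbed values moves toward the mean.

```python
import random
import statistics


def sign(x):
    # Returns +1, -1, or 0 depending on the sign of x.
    return (x > 0) - (x < 0)


def majority_vote_sign(local_grads):
    # Each worker contributes only the sign of its value (1 bit);
    # the server returns the sign of the summed votes.
    return sign(sum(sign(g) for g in local_grads))


def expected_perturbed_median(local_grads, noise_scale, trials, rng):
    # Monte Carlo estimate of E[median_i(g_i + noise_i)], with
    # independent zero-mean Gaussian noise (an illustrative choice).
    total = 0.0
    for _ in range(trials):
        noisy = [g + rng.gauss(0.0, noise_scale) for g in local_grads]
        total += statistics.median(noisy)
    return total / trials


# Hypothetical heterogeneous values held by five workers.
local_grads = [-1.0, -1.0, -1.0, 4.0, 5.0]
rng = random.Random(0)

mean_grad = sum(local_grads) / len(local_grads)   # 1.2
median_grad = statistics.median(local_grads)      # -1.0
vote = majority_vote_sign(local_grads)            # -1, opposite to the mean's sign
smoothed = expected_perturbed_median(
    local_grads, noise_scale=50.0, trials=100_000, rng=rng
)

print("mean:", mean_grad)
print("median:", median_grad)
print("majority vote:", vote)
print("estimated expected median after perturbation:", smoothed)
```

In this toy setting a larger noise scale shrinks the bias of the expected median but increases the variance of each individual median, so the noise level trades off one against the other.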

Subjects:	Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Optimization and Control (math.OC); Machine Learning (stat.ML)
Cite as:	arXiv:1906.01736 [cs.LG]
	(or arXiv:1906.01736v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1906.01736

Submission history

From: Xiangyi Chen
[v1] Tue, 4 Jun 2019 21:48:50 UTC (3,538 KB)
[v2] Thu, 6 Jun 2019 08:04:54 UTC (3,538 KB)
