Search results
(1 - 1 of 1)
- Title: Efficient Distributed Algorithms: Better Theory and Communication Compression
- Creator: Li, Yao
- Date: 2022
- Collection: Electronic Theses & Dissertations
- Description:
Large-scale machine learning models are often trained by distributed algorithms over either centralized or decentralized networks. The former uses a central server to aggregate the information of local computing agents and broadcast the averaged parameters in a master-slave architecture. The latter considers a connected network formed by all agents, where information can be exchanged only with accessible neighbors, via a mixing matrix of communication weights that encodes the network's topology. Compared with centralized optimization, decentralization facilitates data privacy and removes the communication burden that model synchronization places on a single central agent, but the limited connectivity of the communication network weakens the theoretical convergence guarantees of decentralized algorithms. Gaps therefore remain between decentralized and centralized algorithms in terms of convergence conditions and rates.

In the first part of this dissertation, we consider two decentralized algorithms, EXTRA and NIDS, both of which converge linearly for strongly convex objective functions, and answer two questions about them: What are the optimal upper bounds for their stepsizes? Do decentralized algorithms require more properties of the objective functions for linear convergence than centralized ones? More specifically, we relax the conditions required for the linear convergence of both algorithms. For EXTRA, we show that the stepsize is comparable to that of centralized algorithms. For NIDS, the upper bound on the stepsize is shown to be exactly the same as the centralized one. In addition, we relax the requirements on the objective functions and the mixing matrices, and provide linear convergence results for both algorithms under the weakest conditions.

As the number of computing agents and the dimension of the model grow, the communication cost of parameter synchronization becomes the major obstacle to efficient learning. Communication compression techniques have shown great potential to accelerate distributed machine learning by mitigating this communication bottleneck. In the rest of the dissertation, we propose compressed residual communication frameworks for both centralized and decentralized optimization and design algorithms to achieve efficient communication. For centralized optimization, we propose DORE, a modified parallel stochastic gradient descent method with bidirectional residual compression, which reduces over 95% of the overall communication. Our theoretical analysis shows that the proposed strategy has superior convergence properties for both strongly convex and nonconvex objective functions.

For decentralized optimization, existing works mainly focus on smooth problems and on compressing DGD-type algorithms. The restriction to smooth objective functions and the sublinear convergence rates obtained under relatively strong assumptions limit these algorithms' applicability and practical performance. Motivated by primal-dual algorithms, we propose Prox-LEAD, a linearly convergent decentralized algorithm with compression, to tackle strongly convex problems with a nonsmooth regularizer. Our theory describes the coupled dynamics of the inexact primal and dual updates as well as the compression error, without assuming bounded gradients. The superiority of the proposed algorithm is demonstrated through comparisons with state-of-the-art algorithms in terms of convergence complexity, and through numerical experiments. Our algorithmic framework also sheds light on compressed communication in other primal-dual algorithms by reducing the impact of inexact iterations.
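As a concrete illustration of the mixing-matrix communication model described in the abstract, here is a minimal sketch of one round of decentralized averaging. The ring topology and all names are hypothetical, chosen for illustration rather than taken from the dissertation:

```python
import numpy as np

# Hypothetical setup: 5 agents on a ring, each holding a local parameter vector.
# W is a doubly stochastic mixing matrix; W[i, j] > 0 only when agents i and j
# are neighbors (or i == j), so the sparsity of W encodes the network topology.
n_agents, dim = 5, 3
W = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    W[i, i] = 0.5                      # weight on the agent's own parameters
    W[i, (i - 1) % n_agents] = 0.25    # left neighbor on the ring
    W[i, (i + 1) % n_agents] = 0.25    # right neighbor on the ring

x = np.random.randn(n_agents, dim)     # row i holds agent i's parameters

# One communication round: each agent forms a weighted average of its own
# parameters and those of its accessible neighbors only.
x_next = W @ x

# Because W is doubly stochastic, mixing preserves the network-wide average,
# and repeated rounds drive all agents toward consensus.
print(np.allclose(x_next.mean(axis=0), x.mean(axis=0)))  # True
```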
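For readers comparing the stepsize claims, the EXTRA and NIDS recursions as published in the literature (Shi, Ling, Wu, and Yin, 2015; Li, Shi, and Yan, 2019) take the following form, with stepsize \alpha, mixing matrix W, and \widetilde{W} = (I + W)/2. These are quoted from those papers, not from the dissertation itself:

```latex
% EXTRA (Shi, Ling, Wu, Yin, 2015), with \widetilde{W} = (I + W)/2:
x^{k+2} = (I + W)\,x^{k+1} - \widetilde{W}\,x^{k}
          - \alpha\left[\nabla f(x^{k+1}) - \nabla f(x^{k})\right]

% NIDS (Li, Shi, Yan, 2019):
x^{k+2} = \widetilde{W}\left[2x^{k+1} - x^{k}
          - \alpha\left(\nabla f(x^{k+1}) - \nabla f(x^{k})\right)\right]
```

Note the structural difference: NIDS applies the mixing after the gradient correction, which is consistent with the abstract's claim that its stepsize bound can match the centralized one exactly.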
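The abstract describes DORE's bidirectional residual compression only at a high level. The sketch below illustrates the general residual-compression idea, compressing the difference between a new vector and a running state that both sides track, using a hypothetical top-k compressor. It is a loose illustration under stated assumptions, not the dissertation's exact algorithm, and it shows only one direction of DORE's two-way (worker-to-server and server-to-worker) scheme:

```python
import numpy as np

def top_k(v, k):
    """Hypothetical compressor: keep only the k largest-magnitude entries."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def send_compressed_residual(vector, state, k, eta=1.0):
    """Residual-compression sketch: transmit the compressed difference between
    `vector` and a state that sender and receiver both maintain. Only `message`
    crosses the network; both sides apply the same state update, so they stay
    synchronized without any extra communication."""
    message = top_k(vector - state, k)
    new_state = state + eta * message
    return message, new_state

# Toy usage: as the gradients stabilize, the residuals being compressed shrink,
# so a fixed compression budget discards less and less information.
rng = np.random.default_rng(0)
state = np.zeros(50)
for step in range(5):
    grad = 0.9 ** step * rng.standard_normal(50)  # stand-in for a local gradient
    message, state = send_compressed_residual(grad, state, k=5)
    print(step, float(np.linalg.norm(grad - state)))
```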
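Finally, Prox-LEAD targets composite problems with a nonsmooth regularizer, which requires a proximal step. For the common choice r(x) = \lambda \|x\|_1 the proximal operator is soft-thresholding; this is a textbook fact shown purely for illustration, not a detail taken from the dissertation:

```python
import numpy as np

def prox_l1(v, lam):
    """Proximal operator of lam * ||x||_1 (soft-thresholding):
    argmin_x  0.5 * ||x - v||**2 + lam * ||x||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

# A single proximal gradient step on f(x) + lam * ||x||_1 with stepsize alpha:
#   x_next = prox_{alpha * lam}(x - alpha * grad_f(x))
x = np.array([0.8, -0.05, 0.3])
grad_f_x = np.array([0.1, 0.0, -0.2])   # stand-in for the gradient of f at x
alpha, lam = 0.5, 0.2
x_next = prox_l1(x - alpha * grad_f_x, alpha * lam)
print(x_next)   # small entries are thresholded to exactly zero
```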