Tracing Distributed Algorithms Using Replay Clocks

In this thesis, we introduce replay clocks ($\repcl$), a novel clock infrastructure that allows us to do offline analyses of distributed computations. The replay clock structure provides a methodology to \textit{replay} a computation as it happened, with the ability to represent concurrent events effectively. It builds on the structures introduced by vector clocks ($\vc$) and the Hybrid Logical Clock ($\hlc$), combining their infrastructures to provide efficient replay. With such a clock, a user can replay a computation whilst considering multiple paths of executions, and check for constraint violations and properties that potential pathways could take, especially in the presence of concurrent events. Specifically, if event $e$ must occur before $f$ then the replay clock must ensure that $e$ is replayed before $f$. On the other hand, if $e$ and $f$ could occur in any order, replay should not force an order between them. After identifying the limitations of existing clocks to provide the replay primitive, we present the $\repcl$ structure and identify an efficient representation for the same. We demonstrate that $\repcl$ can be implemented with less than four integers for 64 processes for various system parameters if clocks are synchronized within $1ms$. Furthermore, the overhead of $\repcl$ (for computing/comparing timestamps and message size) is proportional to the size of the clock. Using simulations in a custom distributed system and NS-3, a state-of-the-art network simulator, we identify the expected overhead of $\repcl$ based on the given system settings. We also identify how a user can then identify feasibility region for $\repcl$. Specifically, given the desired overhead of $\repcl$, it identifies the region where unabridged replay is possible. Using the $\repcl$, we provide a tracer for distributed computations, that allows any computation using the $\repcl$ to be replayed efficiently. The visualization allows users to analyze specific properties and constraints in an online fashion, with the ability to consider concurrent paths independently. The visualization provides per-process views and an overarching view of the whole computation based on the time recorded by the $\repcl$ for each event.

Read

In Collections: Electronic Theses & Dissertations

Copyright Status: Attribution 4.0 International

Material Type: Theses

Authors: Lagwankar, Ishaan Kiran

Thesis Advisors: Kulkarni, Sandeep S.

Committee Members: Xiao, Li
McKinley, Phillip K.

Date Published: 2024

Subjects: Computer science

Program of Study: Computer Science - Master of Science

Degree Level: Masters

Language: English

Pages: 64 pages

Permalink: https://doi.org/doi:10.25335/cbq9-be16