Certifying Safety when Implementing Consensus
Ensuring the correctness of distributed system implementations remains a challenging and largely unaddressed problem. In this paper, we present a protocol that can be used to certify the safety of consensus implementations. The protocol is efficient in both the number and the size of the additional messages it sends, and it is designed to operate correctly even when up to n-1 of the nodes in an n-node distributed system fail, assuming fail-stop failures. We also comment on how our construction might be generalized to certify other protocols and invariants.