
Monday, 13 September 2021

Keeping datacenters in sync with asynchronous replication using a durable queue

I had often wondered what eventual consistency actually implied in the context of large-scale distributed systems. Does it mean that data would eventually be replicated, achieving consistency at a later point in time? Or that all best attempts would be made to keep disjoint data storage systems in sync before eventually giving up, amounting in effect to a quasi-consistency scheme?

Over the years, having worked with AWS and third-party clouds, and after conceiving a potential working scheme for consumers of such distributed cloud vendors, I began to understand that eventual consistency more closely mirrored what the theory describes, and bore less resemblance to the cynical view I had first carried of a best-effort, possibly vain attempt to keep data in sync.

Here is a relatively simplistic, high-level depiction of a backend architecture that could keep geographically distributed, trans-continental (just indulging my artistic freedom now) data centers in sync. You can think of the asynchronous queue forwarder as a process or service that collects the writes to be made to the individual data centers, routing them indirectly through a queuing system that supports multiple concurrent consumers (such as Kafka or SQS); the consumers, in this case, would be the data centers that need to be kept in sync. The durability guarantees of the queue should be sufficient to ensure that data written to it is never lost.
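
To make this a little more concrete, here is a minimal sketch of what the forwarder and a data-center consumer could look like if Kafka were the durable queue, using the kafka-python client. The broker address, the "replicated-writes" topic, the consumer group ids and the apply_write_locally function are all hypothetical placeholders for illustration, not a prescription of how it must be wired.

import json

from kafka import KafkaConsumer, KafkaProducer


def apply_write_locally(write):
    """Hypothetical hook: apply the write to this data center's own storage."""
    print("applying", write)


# Forwarder side: publish each write with acks="all" so it isn't considered
# durable until the topic's in-sync replicas have all stored it.
producer = KafkaProducer(
    bootstrap_servers="queue.internal:9092",   # assumed broker address
    acks="all",
    retries=5,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("replicated-writes", {"key": "user:42", "op": "PUT", "value": "..."})
producer.flush()  # block until the broker has acknowledged the buffered writes

# Data-center side: each data center joins with its own consumer group, so every
# data center independently receives, and applies, every write.
consumer = KafkaConsumer(
    "replicated-writes",
    bootstrap_servers="queue.internal:9092",
    group_id="dc-eu-west-1",            # a different group id per data center
    enable_auto_commit=False,
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
for message in consumer:
    apply_write_locally(message.value)  # apply to this data center's store
    consumer.commit()                   # commit the offset only after the local apply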

The queue forwarder service could be a standalone micro-service, or a daemon running locally inside your application container or pod. I have more to say on the micro-service vs. daemon design, so stay tuned for a follow-on post. At this point, though, I don't necessarily see why one design should be better than the other.





Thursday, 24 December 2020

Replication for Beginners

Replication helps improve read throughput by adding data redundancy, even though it comes at the expense of having to manage consistency between the different replicas.


Replication strategies - 


Leader-based replication - clients write to a leader, and followers eventually catch up with the leader.


A more durable approach is for the client's write to block until the write has propagated to at least one follower in addition to the leader.


Purely synchronous replication, most durable - the client writes to the leader and blocks until all followers have received a copy of the write from the leader.


Semi-synchronous replication, less durable than purely synchronous but more available - the client writes to the leader and blocks until some number of followers, "w", have received a copy of the write.
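
To make the trade-off concrete, here is a small Python sketch of that semi-synchronous write path, assuming hypothetical leader and follower objects that each expose an apply(write) method (that interface is my own placeholder, not from any particular system). Setting w equal to the number of followers recovers the purely synchronous case, while a smaller w trades some durability for availability and latency.

from concurrent.futures import ThreadPoolExecutor, as_completed
from concurrent.futures import TimeoutError as FuturesTimeout


def replicate_semi_sync(write, leader, followers, w, timeout=5.0):
    """Apply `write` on the leader, then block until at least `w` followers ack."""
    leader.apply(write)                            # leader persists the write first
    pool = ThreadPoolExecutor(max_workers=max(len(followers), 1))
    futures = [pool.submit(f.apply, write) for f in followers]
    acks = 0
    try:
        for fut in as_completed(futures, timeout=timeout):
            if fut.exception() is None:
                acks += 1                          # one more replica holds the write
            if acks >= w:
                break                              # durable enough for this policy
    except FuturesTimeout:
        pass                                       # some followers were too slow
    pool.shutdown(wait=False)                      # don't block on the stragglers
    return acks >= w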



How does a follower catch up with the leader?


  1. When a write happens, one of the parties, either the client or the server (leader or cluster manager), generates a sequence id for the write that is either auto-incrementing or corresponds to the timestamp at which the write was received from the client.

    In 1, we mentioned that the leader or the cluster manager does this sequence id generation, and there are merits to either party discharging the responsibility. In typical distributed systems that do not have a dedicated cluster manager, the leader could assume it; in others, a cluster manager may be a better bet to introduce greater separation and distribution of concerns. There are challenges with both approaches. When the leader generates sequence ids, the reliability of sequence id generation depends tightly on the reliability of leader election and on a newly elected leader catching up with the former leader's write log. When a cluster manager is introduced, it is recommended either to run it on a high-performance (and costly) instance to reduce the likelihood of it failing, or to improve the reliability of the setup with a secondary cluster manager that catches up with the primary one from time to time, so that writes stay durable at all times.


  2. Going back to 1, after the leader has received a write, it typically appends it to a write log (similar to the write-ahead log in databases) first, before writing it to its own index and subsequently to disk. Note that individual implementations may differ based on the challenges involved in keeping writes durable in their environments or application scenarios. For example, some systems may treat a write to the leader as completed as soon as a memory-based index has been updated, while others may wait until the write has also reached a storage system or a write block on disk.


  3. Once the write has completed on the leader, the writes to the followers happen, at a very simplistic level, based on either a push or a pull model.

    In push-based replication models, the leader typically writes to a durable and robust queue that guarantees writes to it are never lost and that reads from it never fail to be delivered to any of the consumers registered as subscribers. If the robustness of a separate queue isn't needed, writes can also propagate from the leader to the followers in a peering-based, star-based, or ring-based topology. Note that replication from the leader to the followers may itself be architected in a multitude of ways depending on the specific application scenarios and conditions.

    In pull-based replication models, followers poll the leader for all writes that happened on the leader since a given sequence id passed to it by the follower, and the changes received from the leader are then applied to the followers' own indexes and storage systems. A toy sketch of this sequence-id-driven, pull-based catch-up appears right after this list.
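
Here is the toy sketch promised above, tying the three steps together for the pull model: a hypothetical Leader that assigns auto-incrementing sequence ids and appends to a write log before updating its index, and a Follower that catches up by asking for everything after the last sequence id it has applied. Everything is in-memory and purely illustrative; a real system would persist the log before acknowledging the write.

import itertools
import threading


class Leader:
    """Toy leader: assigns auto-incrementing sequence ids and keeps a write log."""

    def __init__(self):
        self._seq = itertools.count(1)      # auto-incrementing sequence ids (step 1)
        self._log = []                      # ordered write log of (seq_id, key, value)
        self._index = {}                    # memory-based index of latest values
        self._lock = threading.Lock()

    def write(self, key, value):
        with self._lock:
            seq_id = next(self._seq)
            self._log.append((seq_id, key, value))  # append to the write log first (step 2)
            self._index[key] = value                # then apply to the index
            return seq_id

    def changes_since(self, seq_id):
        """Everything newer than `seq_id` - what a polling follower asks for."""
        return [entry for entry in self._log if entry[0] > seq_id]


class Follower:
    """Toy follower that catches up by polling the leader (pull model, step 3)."""

    def __init__(self):
        self._index = {}
        self._last_seen = 0                 # highest sequence id applied locally

    def catch_up(self, leader):
        for seq_id, key, value in leader.changes_since(self._last_seen):
            self._index[key] = value        # apply the change to the local index
            self._last_seen = seq_id


leader, follower = Leader(), Follower()
leader.write("user:1", "alice")
leader.write("user:2", "bob")
follower.catch_up(leader)                   # follower's index now mirrors the leader's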