Sunday, 23 October 2016

Key insight- Consistent hashing

I had glossed over the overview of Consistent Hashing many times in the past, and as much as I knew it was a highly important distributed system hashing technique, I never really appreciated the myriad of applications that it underlay.

Today, I started reading Facebook's paper on scaling memcached for supporting its highly scalable and distributed system architecture. I read the term Consistent Hashing once again and this time, I took the onus to read it out in depth to understand the specifics as much as the high level overview I had already read many times in the past. To those of you reading this, I will try to keep it as technical as possible with key mathematical insights from related sources I have already read. Additionally, I will provide references to the original posts as always so that you can catch a glimpse of the material from the biblical sources themselves. Let's get started now.

Most of you know that naive hashing techniques hash a key, say 'k' to a bucketing system with 'n' buckets, 0,1,2,3,...,n-1 using the formula:

hash(k) mod (n)

While this is a good, simple technique for single system, in-memory hashtables, it doesn't quite scale well in multiple node, distributed system architectures. Let me illustrate with an example.

Suppose you add a new bucket, 'n' to the system. You now have n+1 buckets in your system and this requires you to re-hash each of your already present K keys using the new formula:

hash(k) mod (n+1)

Mathematically speaking, this will move n/(n+1) keys in your existing system to a new bucket. This actually involves an awful lot of network transfers and the communication overhead involved is significantly high for most real-time production applications. When you think about it, if this kind of addition of buckets or removal of buckets happens frequently enough, as is the case with most distributed systems that require frequent horizontal scaling in the form of addition of new nodes or removal of existing nodes in the event of node failures, such re-hashing and consequent movement of keys to new buckets has the potential to bring down whole back-end content servers by bombarding them with great number of write requests within a short interval of time.

Such was the need to revisit hashing from a new perspective and, in 1997, David Karger published a new paper on Consistent Hashing which still retains wide applicability in most real-time, massive, distributed hashing systems today.

With consistent hashing, the idea is to keep hash value of the key nearly the same and largely independent of the number of buckets/nodes in your distributed system. The simplest and earliest implementations of consistent hashing hashed both keys and buckets using the same hash function that ensured both were normalized to a [0,1) interval.  

Suppose hash of your key is 0.6 and you have you have three buckets with hash values, 0.2, 0.5 and 0.8, respectively. You pick the bucket with hash value that is closest to the hash value of your key in a clockwise direction, i.e., 0.8. Let me quickly illustrate this idea:

In a clockwise direction starting at 0 corresponding to 00:00 hours, the above hash values appear in order, 0.2, 0.5, 0.6 and 0.8 respectively. This would map your key in bucket corresponding to hash value 0.8 as you pick the nearest bucket in the clockwise direction.

If you have understood so much, it can be anticipated with reasonable common sense that you will be concerned by many apparent limitations with this approach, more particularly, concerns about its rather non-uniform mapping from keys to buckets. This is understandable and valid and you will see it being addressed shortly, later in this post but for the time being, I will continue to present some key insights about this approach that will make it seemingly more attractive to the naive hashing technique that was discussed earlier. In this technique, if, say, a new bucket, n is added to the system that has n buckets, 1,2,3,4,....,n-1, only 1/(n+1) keys need to be moved to the new node while the remaining keys continue to stay mapped to their original buckets.

Let me make sure you understand this with an illustration. Any time you add a new bucket to this system, it is trivial to note that only a subset of the keys from the bucket preceding this one in the clockwise direction are going to be mapped to it. As a result, it is rather straight-forward and simple to derive the potential set of keys that require movement as a result of the underlying change in the hashing system itself.

Now, I will proceed further to discuss how a more sophisticated variant of consistent hashing addresses the issue concerning the non-uniform key distribution among the existing buckets. This new design is called consistent hashing with virtual nodes.

The idea behind this new approach is to assign to each bucket, a set of disjoint key ranges or partitions as they are usually called as opposed to a single one as we see in the primordial implementation above. To visualize this partition assignment to buckets, check this link on how consistent hashing is implemented in Cassandra, Cassandra Consistent Hashing.

As you can see it's relatively simple to view this organization as a pre-configured mapping from key partitions to nodes themselves. Such table-like mapping may be conceived at application start up and stored for lookup later in cluster leaders, node managers, master nodes or such other distributed system entities that are entrusted with the responsibility to manage a set of machines. 

I hope this post managed to educate you about consistent hashing or otherwise at least, arouse an interest to learn more about hashing systems in general. 

Related sources:






Sunday, 9 October 2016

Enumerate Primes from Elements of programming interviews

I read the solution to this problem about a year back for the first time and while one time was enough to come to terms with the reality of its ingenuity, one time was never enough to appreciate how it really works.

I would have solved it roughly 5 times over a span of a year and today, I note how a subtle idea can come between you thinking you got it right and actually getting it right. When it comes to problem solving, there are questions you cannot solve, you can solve and you think you can solve.

This is likely to be one of those problems you believe you can solve and you also get the output on your console which if not noted carefully and not rigorously tested would pass of as right and which only when seen through the lenses of authors of the great book like EPI would show as invalid for the first time.

Aziz, Lee and Prakash state that once a number has been identified as prime, it suffices to mark as non-prime, all of its multiples starting its square. The number, say i itself is in index, (i - 3)/2 of a lookup list, isPrime. In other words, if index is i, the value at index i would be 2 * i +3. Now, the square of (2 * i + 3) would be  (4 * i * i + 12 * i + 9) and this square would be present in (4 * i * i + 12 * i + 9 - 3) /2= 2 * i * i + 6 * i + 3

Trust me the genius of these 3 authors would preclude them from providing these step-by-step explanations but it took me good time to decode all these identities for myself.

Now, just tell yourself time and again that the lookup list isPrime only indicates if an odd number starting with 3 is prime because even numbers after 2 are never prime.

if(isPrime.get(i) == Boolean.TRUE) {
/* add number at index i to prime numbers result */
int p  = 2 * i +  3;
primes.add(p);

for (long j = (2 * i * i + 6 * i + 3) ; j  < size ; j += p)  //How obvious is this?{
isPrime.set((int) j , Boolean.FALSE);
}
}

If you read my comment on the code snippet, that's exactly the little subtlety which comes between the way of your getting this one and thinking you got this one.

Interestingly enough, this works. What he is implying is that starting with the index of the prime number square, it suffices to mark all indices at steps of the prime number itself and this works.

E.g. p = 3
p^2 = 9

p_i = 0
p^2_i = 3
p^2_i  + 3 = 6 corresponds to 2 * 6 + 3 = 15
p^2_i + 3 + 3 = 9 corresponds to 2 * 9 + 3 = 21

If you see carefully, he is wisely avoiding all even multiples between 9 and 15 and 15 and 21 and so on. If you test this identity for any other prime, you will see the brilliance of this idea.


  

Thursday, 6 October 2016

Python safeguards


  • Never create a resource with open() or with open() inside run method of a worker thread. Create a resource in the main thread and pass the reference to the resource to the worker instead.
  • with open() is better than open() for writers because it closes the writer automatically when the control exits its scope. This flushes all write buffer to the disk.
  • If open() is used for write(), explicitly call f.close() or f.flush() as you would normally do in Java.