Sunday 23 October 2016

Key insight: Consistent Hashing

I had glossed over overviews of consistent hashing many times in the past, and as much as I knew it was an important distributed-systems hashing technique, I never really appreciated the myriad applications it underlies.

Today, I started reading Facebook's paper on scaling memcached to support its highly scalable, distributed architecture. I came across the term consistent hashing once again, and this time I took it upon myself to study it in depth, to understand the specifics as well as the high-level overview I had already read many times in the past. For those of you reading this, I will try to keep it as technical as possible, with key mathematical insights from related sources I have already read, so that you can catch a glimpse of the material from the original sources themselves. Let's get started.

Most of you know that naive hashing techniques hash a key, say 'k', to a bucketing system with 'n' buckets, 0, 1, 2, ..., n-1, using the formula:

hash(k) mod (n)

While this is a good, simple technique for single-system, in-memory hash tables, it doesn't scale well in multi-node, distributed architectures. Let me illustrate with an example.

Suppose you add a new bucket, 'n', to the system. You now have n+1 buckets, and this requires you to re-hash each of your existing K keys using the new formula:

hash(k) mod (n+1)

Mathematically speaking, this moves an expected n/(n+1) fraction of the keys in your existing system to a new bucket. That involves an awful lot of network transfers, and the communication overhead is prohibitively high for most real-time production applications. If this kind of addition or removal of buckets happens frequently enough, as is the case with most distributed systems that scale horizontally by adding new nodes and lose existing nodes to failures, such re-hashing and the consequent movement of keys can bring down whole back-end content servers by bombarding them with a huge number of write requests within a short interval of time.
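To see the scale of the reshuffle, here is a throwaway sketch (not from the memcached paper; the md5-based hash is just a stand-in for any uniform hash) that counts how many keys land in a different bucket when n grows to n+1:

```python
import hashlib

def bucket(key, n):
    # Stand-in hash: md5 is stable across runs, unlike Python's built-in hash().
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % n

n = 10
keys = [f"key-{i}" for i in range(10000)]
moved = sum(1 for k in keys if bucket(k, n) != bucket(k, n + 1))
print(f"{moved / len(keys):.0%} of keys moved")  # roughly n/(n+1), i.e. about 10/11
```

Almost every key ends up on a different node, which is exactly the bombardment described above.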

Such was the need to revisit hashing from a new perspective, and in 1997, David Karger and his co-authors published a paper on consistent hashing that retains wide applicability in most real-time, massive, distributed hashing systems today.

With consistent hashing, the idea is to keep the hash value of a key largely independent of the number of buckets/nodes in your distributed system. The simplest and earliest implementations hashed both keys and buckets using the same hash function, normalizing both to the interval [0, 1), which is treated as a circle, or ring.

Suppose the hash of your key is 0.6 and you have three buckets with hash values 0.2, 0.5 and 0.8, respectively. You pick the bucket whose hash value is closest to the hash value of your key in the clockwise direction, i.e., 0.8. Let me quickly illustrate this idea:

Moving clockwise from 0, which corresponds to 00:00 hours on a clock face, the above hash values appear in the order 0.2, 0.5, 0.6 and 0.8. Your key maps to the bucket with hash value 0.8, since you pick the nearest bucket in the clockwise direction.
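The clockwise lookup can be sketched with a sorted list of bucket positions and a binary search (the bucket hashes 0.2, 0.5, 0.8 are taken from the example above; wrapping past 1.0 returns to the start of the ring):

```python
import bisect

bucket_positions = [0.2, 0.5, 0.8]  # sorted hash values of the buckets

def find_bucket(key_hash):
    # First bucket at or after the key's hash; wrap around past the end.
    i = bisect.bisect_left(bucket_positions, key_hash)
    return bucket_positions[i % len(bucket_positions)]

print(find_bucket(0.6))  # 0.8, as in the example
print(find_bucket(0.9))  # wraps around to 0.2
```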

If you have followed so far, you are probably concerned about some apparent limitations of this approach, particularly its rather non-uniform mapping of keys to buckets. That concern is understandable and valid, and I will address it shortly in this post. For the time being, let me present a key property that makes this approach more attractive than the naive hashing technique discussed earlier: if a new bucket is added to a system that already has n buckets, 0, 1, 2, ..., n-1, only about 1/(n+1) of the keys need to move to the new bucket, while the remaining keys stay mapped to their original buckets.

Let me make sure you understand this with an illustration. Any time you add a new bucket to this system, only a subset of the keys held by the next bucket in the clockwise direction, the new bucket's successor on the ring, will map to it. As a result, it is straightforward to determine the set of keys that must move as a result of the change to the hashing system.

Now, I will proceed to discuss how a more sophisticated variant of consistent hashing addresses the non-uniform distribution of keys among buckets. This design is called consistent hashing with virtual nodes.

The idea behind this new approach is to assign each bucket a set of disjoint key ranges, or partitions as they are usually called, as opposed to the single range in the basic implementation above. To visualize this partition assignment to buckets, check this link on how consistent hashing is implemented in Cassandra: Cassandra Consistent Hashing.

As you can see, it is relatively simple to view this organization as a pre-configured mapping from key partitions to nodes. Such a table-like mapping may be built at application start-up and stored for later lookup in cluster leaders, node managers, master nodes, or other distributed-system entities entrusted with managing a set of machines.
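Here is a minimal sketch of virtual nodes (the node names and the count of 100 virtual positions per node are my own assumptions, not taken from any particular system): each physical node is hashed onto the ring many times, so each owns many small partitions and the load evens out.

```python
import bisect
import hashlib

VNODES = 100  # virtual positions per physical node (tunable; 100 is an assumption)

def h(s):
    # Normalize an md5 digest to [0, 1).
    return int(hashlib.md5(s.encode()).hexdigest(), 16) / 16**32

ring = []  # sorted list of (position, node) pairs
for node in ["node-a", "node-b", "node-c"]:  # hypothetical node names
    for v in range(VNODES):
        ring.append((h(f"{node}#{v}"), node))
ring.sort()

def lookup(key):
    # Nearest virtual position clockwise from the key's hash.
    i = bisect.bisect_left(ring, (h(key),))
    return ring[i % len(ring)][1]

print(lookup("some-key"))
```

With enough virtual positions per node, keys spread roughly evenly across the three physical nodes, which is exactly the non-uniformity fix discussed above.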

I hope this post managed to teach you something about consistent hashing, or at least aroused an interest in learning more about hashing systems in general.

Sunday 9 October 2016

Enumerate Primes from Elements of Programming Interviews

I read the solution to this problem for the first time about a year back, and while one reading was enough to come to terms with its ingenuity, one reading was never enough to appreciate how it really works.

I must have solved it roughly five times over the span of a year, and today I note how a subtle idea can stand between thinking you got it right and actually getting it right. When it comes to problem solving, there are questions you cannot solve, questions you can solve, and questions you think you can solve.

This is likely one of those problems you believe you can solve. You get output on your console which, if not checked carefully and rigorously tested, would pass off as correct, and only when seen through the lens of the authors of a great book like EPI does it show up as invalid for the first time.

Aziz, Lee and Prakash state that once a number has been identified as prime, it suffices to mark all of its multiples as non-prime, starting from its square. A number p sits at index (p - 3)/2 of a lookup list, isPrime; in other words, the value at index i is 2*i + 3. Now, the square of (2*i + 3) is 4*i*i + 12*i + 9, and this square sits at index (4*i*i + 12*i + 9 - 3)/2 = 2*i*i + 6*i + 3.
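These identities are easy to check numerically; here is a throwaway Python sanity check (not part of the sieve itself):

```python
# isPrime[i] represents the odd number 2*i + 3, so number p sits at index (p - 3) // 2.
for i in range(100):
    p = 2 * i + 3
    assert (p - 3) // 2 == i                           # index of p
    assert (p * p - 3) // 2 == 2 * i * i + 6 * i + 3   # index of p's square
print("identities hold")
```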

Trust me, the authors omit these step-by-step derivations, and it took me a good while to decode all these identities for myself.

Now, keep reminding yourself that the lookup list isPrime only tracks whether the odd numbers starting at 3 are prime, because no even number after 2 is prime.

if (isPrime.get(i) == Boolean.TRUE) {
    // Add the number at index i to the prime numbers result.
    int p = 2 * i + 3;
    primes.add(p);

    // Start at the index of p * p and step by p. How obvious is this?
    for (long j = 2L * i * i + 6 * i + 3; j < size; j += p) {
        isPrime.set((int) j, Boolean.FALSE);
    }
}

The comment in that code snippet marks exactly the little subtlety that stands between thinking you got this one and actually getting it.

Interestingly enough, this works. The claim is that starting at the index of the prime's square, it suffices to mark every index at steps of the prime itself. Since index j holds the number 2*j + 3, stepping the index by p advances the number by 2*p, so consecutive marked numbers differ by twice the prime.

E.g. p = 3, so p^2 = 9.

Index of p: (3 - 3)/2 = 0.
Index of p^2: (9 - 3)/2 = 3.
Index 3 + 3 = 6 corresponds to 2 * 6 + 3 = 15.
Index 6 + 3 = 9 corresponds to 2 * 9 + 3 = 21.

If you look carefully, the scheme wisely skips all the even multiples between 9 and 15, between 15 and 21, and so on. Test this identity for any other prime and you will see the brilliance of the idea.
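To convince yourself end to end, here is the whole odd-only sieve as a self-contained sketch in Python (my own translation, using the same index arithmetic as the Java snippet above):

```python
def enumerate_primes(n):
    """Primes up to n. is_prime[i] says whether the odd number 2*i + 3 is prime."""
    if n < 2:
        return []
    size = (n - 3) // 2 + 1 if n >= 3 else 0
    is_prime = [True] * size
    primes = [2]
    for i in range(size):
        if is_prime[i]:
            p = 2 * i + 3
            primes.append(p)
            # Index of p*p is 2*i*i + 6*i + 3; stepping the index by p
            # advances the number by 2*p, skipping even multiples.
            for j in range(2 * i * i + 6 * i + 3, size, p):
                is_prime[j] = False
    return primes

print(enumerate_primes(30))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```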


  

Thursday 6 October 2016

Python safeguards


  • Never create a resource with open() (or with open()) inside the run method of a worker thread. Create the resource in the main thread and pass a reference to it to the worker instead.
  • with open() is better than a bare open() for writers because it closes the file automatically when control exits its scope, which flushes the write buffer to disk.
  • If open() is used for write(), explicitly call f.close() or f.flush(), as you would normally do in Java.
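A minimal illustration of the second point (the temp file and its name are arbitrary, just for demonstration):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "out.txt")

# 'with' closes (and therefore flushes) the file when the block exits,
# even if an exception is raised inside it.
with open(path, "w") as f:
    f.write("hello")

with open(path) as f:
    print(f.read())  # hello
```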

Friday 9 September 2016

Long.parseLong(String val, 16) gives incorrect Hex value

I recently noted a bug where Java's Long.parseLong function sometimes returns an incorrect value for a hexadecimal string when run on different machines. I presume this is likely due to inconsistencies between different Java versions; however, this needs to be verified.

Interestingly enough, I also verified that the alternative Long.toHexString function gives the correct hexadecimal value.

Just a heads up!

Friday 6 May 2016

Brian Kernighan's technique to count set bits

Brian Kernighan's iterative technique for counting the set bits in an integer is described in the link below. Here is a little help to see how it works.

Let's start with an example.
Say for x = 9, you want to count the number of set bits (9 = 1001 in base 2).
Initialize a count variable to 0, then:

while (x > 0) {
    x = x & (x - 1);   // 9 & 8 = 8, then 8 & 7 = 0
    count++;
}

The number of set bits is the final value of the "count" variable. To see why this works, look at the stopping condition: the loop ends when x = 0, and x & (x - 1) clears exactly the lowest set bit of x on each iteration. So x becomes zero during the iteration in which only one bit is left set, i.e., when x has become a power of 2, and the loop runs exactly once per set bit.
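The loop above in runnable form (a Python sketch of the same technique):

```python
def count_set_bits(x):
    count = 0
    while x > 0:
        x &= x - 1  # clears the lowest set bit
        count += 1
    return count

print(count_set_bits(9))  # 2, since 9 = 0b1001: 9 -> 8 -> 0
```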



Wednesday 4 May 2016

WORKING!!! A decent hack for an unresolvable Gmail issue!

It has been a while, and I have something interesting to share here.

About 6 months back, one of my Gmail accounts fell victim to a series of spoofing attacks that continues to this day. For starters, spoofing attacks are not absolutely worrisome; your account does not get compromised. What happens is that an unknown attacker starts spamming people with your email address in the from or sender field, due to which you receive a great many email bounces every day and also get your account blocked from composing further email that day. This is because Gmail has an outbound mail threshold of 500 per day, after which any further outbound activity from your mailbox is temporarily blocked.

What is interesting about spoofing is that there is no known way to stop the attack once it has been initiated against your account. As for me, I tried everything possible, from filing a bug to reporting it to Gmail, and I finally understood that I would simply have to let it go. That was until my account itself got suspended by Gmail, at which point I realized I had to either shut down my account or find a workaround.

If you ever find yourself in a similar predicament, here is a simple thing you can do to get around the account suspension. Gmail lets you set a default email address for all your outbound mail, and if you create a new sender account and configure it as the default address for your outgoing mail, you are in essence letting this account absorb all the unwanted outgoing spam that would otherwise have gone from your mailbox. Of course, this is not a one-time job; you will need to repeat the exercise when your designated sender account inevitably gets suspended too at some point. But at least you have something to begin with.

I have been following the changes for about a week now, and I am excited to report that it has been working so far :) :)