Wednesday, November 24, 2010

Consistent Hashing

Hashing is a common method of mapping a key to a location. This is useful for many things, but in more relevant terms, this can be used to map keys to a server with great effect. Simple hashing use a Key mod N algorithm, where K is the number of keys and N is the number of slots or servers. This ensures that keys are mapped evenly across N slots. The problem with this algorithm is that adding or removing a slot or server would require a complete rehash of all the keys. And in case of huge data set, it is ideally not feasable to rehash and re-distribute the keys.

Consistent Hashing is a specific implementation of hashing that is well suited for many of today’s web-scale load balancing problems. Consistent Hashing is used particularly because it provides a solution for the typical “K mod N” method of distributing keys across a series of servers. It does this by allowing servers to be added or removed without significantly upsetting the distribution of keys, nor does it require that all keys be rehashed to accommodate the change in the number of servers. When using consistent hashing, only K/N keys need to be remapped on average.

Implementing Consistent Hashing is done by mapping keys and servers onto edge of a circle. All servers are mapped on to a series of angles around a circle. Each key is also hashed onto the circle. Each hashed server contains all keys between itself and the next clock-wise server hashed onto the circle. The bucket where each item should be stored is chosen by selecting the next highest angle which an available bucket maps to. So, each bucket contains resources mapping to an angle between it and the next smallest angle. If a bucket becomes unavailable, the keys being mapped to that bucket get mapped to the next highest bucket (or the next bucket in the circle). So, only keys which were in the bucket which became unavailable is lost. Similarly when a bucket is added, the keys between the new bucket and the next smallest bucket is mapped to the new bucket. Keys which should be associated with the new bucket and were stored previously will become unavailable.

As shown in Figure 1, Keys 1, 2, 3 and 4 map to slots A, B and C. To find which slot a key goes in, we move around the circle until we find a slot. So here key 1 goes into slot A, 2 goes into slot B and 3 goes into slot C, key 4 goes into slot A again. If C is removed, key 3 would belong to slot A.

If another slot D is added as shown in Figure 2, it will take keys 3 and 4 and only leave key 1 belonging to A.

Fig 1: Keys distribution with Consistent hashing
Fig 2: Keys re-distribution with Consistent hashing

No comments:

Post a Comment