Consistent hashing

In computer science, consistent hashing is a special kind of hashing such that when a hash table is resized, only K/n keys need to be remapped on average, where K is the number of keys and n is the number of slots. In contrast, in most traditional hash tables a change in the number of array slots causes nearly all keys to be remapped, because the mapping between keys and slots is defined by a modular operation.

Consistent hashing achieves some of the goals of rendezvous hashing (also called HRW hashing), which is more general, since consistent hashing has been shown to be a special case of rendezvous hashing. Rendezvous hashing was first described in 1996, while consistent hashing appeared in 1997. The two techniques use different algorithms.

The term "consistent hashing" was introduced by Karger et al. at MIT for use in distributed caching. Their 1997 academic paper introduced the term as a way of distributing requests among a changing population of Web servers: each slot is represented by a node in a distributed system, and the addition (joins) and removal (leaves/failures) of nodes requires only K/n items to be reshuffled when the number of slots/nodes changes. The authors mention linear hashing and its ability to handle sequential addition and removal of nodes, while consistent hashing allows buckets to be added and removed in arbitrary order. Teradata used this technique in their distributed database, released in 1986, although they did not use the term; Teradata still uses the concept of a hash table to fulfill exactly this purpose. Akamai Technologies was founded in 1998 by the scientists Daniel Lewin and F. Thomson Leighton (co-authors of the article coining "consistent hashing") to apply this algorithm, which gave birth to the content delivery network industry.

Consistent hashing has also been used to reduce the impact of partial system failures in large Web applications, allowing for robust caches without incurring the system-wide fallout of a failure. The concept also applies to the design of distributed hash tables (DHTs). DHTs use consistent hashing to partition a keyspace among a distributed set of nodes, and additionally provide an overlay network that connects the nodes such that the node responsible for any key can be efficiently located. Rendezvous hashing, designed at the same time as consistent hashing, achieves the same goals using the very different Highest Random Weight (HRW) algorithm.

Running collections of caching machines exposes a limitation of naive placement. A common way of load balancing n cache machines is to put object o in cache machine number hash(o) mod n. But this does not work if a cache machine is added or removed, because n changes and every object is hashed to a new location.
This can be disastrous, since the originating content servers are flooded with requests from the cache machines; consistent hashing is therefore needed to avoid swamping the servers.
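To make the contrast concrete, the following Python sketch compares hash(o) mod n placement with a bare-bones consistent-hash ring. It is only an illustration, not the construction from the Karger et al. paper: the HashRing class, the stable_hash helper, and the cache names are assumptions made for this example, and the virtual nodes ("replicas") that practical implementations add for better load balance are omitted.

```python
import hashlib
from bisect import bisect_left


def stable_hash(value: str) -> int:
    """Map a string to an integer ring position (MD5 here; any uniform hash works)."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


class HashRing:
    """Bare-bones consistent-hash ring without virtual nodes.

    Nodes and keys are hashed into the same space; a key is served by the
    first node clockwise from it, i.e. the first node whose position is
    >= the key's position, wrapping around the end of the ring.
    """

    def __init__(self, nodes=()):
        self._ring = []  # sorted list of (position, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        self._ring.append((stable_hash(node), node))
        self._ring.sort()

    def remove_node(self, node: str) -> None:
        self._ring = [(p, n) for p, n in self._ring if n != node]

    def get_node(self, key: str) -> str:
        positions = [p for p, _ in self._ring]
        index = bisect_left(positions, stable_hash(key)) % len(self._ring)
        return self._ring[index][1]


if __name__ == "__main__":
    keys = [f"object-{i}" for i in range(10_000)]

    # Naive placement: hash(o) mod n.  Going from 4 to 5 caches changes n,
    # so almost every key lands on a different machine.
    before = {k: stable_hash(k) % 4 for k in keys}
    after = {k: stable_hash(k) % 5 for k in keys}
    moved = sum(before[k] != after[k] for k in keys)
    print(f"mod-n placement: {moved}/{len(keys)} keys moved")

    # Consistent hashing: only keys on the arc claimed by the new node move,
    # about K/n of them in expectation (virtual nodes would reduce variance).
    ring = HashRing([f"cache-{i}" for i in range(4)])
    before = {k: ring.get_node(k) for k in keys}
    ring.add_node("cache-4")
    after = {k: ring.get_node(k) for k in keys}
    moved = sum(before[k] != after[k] for k in keys)
    print(f"consistent hashing: {moved}/{len(keys)} keys moved")
```

In this sketch, adding a fifth cache under mod-n placement relocates most of the keys (roughly 80% of them), while the ring relocates only the keys that fall on the arc claimed by the new node, on the order of K/n.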

[ "Perfect hash function", "Double hashing" ]
Parent Topic
Child Topic
    No Parent Topic