Consistent hashing is a distributed systems technique that solves the problem of data redistribution when nodes are added or removed from a system. Traditional hashing (like key % num_servers) has a major flaw: when the number of servers changes, nearly all keys get redistributed to different servers, causing massive data movement.
Consistent hashing arranges both nodes and keys on a conceptual circle (or "ring"). Each node and key is assigned a position on this ring using a hash function. To find which node handles a key, you move clockwise around the ring from the key's position until you encounter the first node.
The genius of this approach is in its stability. When you add a new node, it only affects the keys that fall between the new node and the previous node clockwise from it. Similarly, removing a node only redistributes its keys to the next node clockwise. This means that most keys stay with their original nodes, minimizing data movement.
Virtual nodes are a crucial enhancement to basic consistent hashing. Instead of placing each physical server at just one position on the ring, we create multiple "virtual" copies of each server at different positions. This dramatically improves load distribution because it reduces the chance that one server gets a disproportionately large segment of the ring.
For example, if server A and server B are placed next to each other on the ring, server A might handle 90% of the keys while server B handles only 10%. With virtual nodes, we might place "server-A-1", "server-A-2", "server-A-3" and "server-B-1", "server-B-2", "server-B-3" at different positions, creating smaller, more evenly distributed segments.
1import hashlib
2
3def simple_hash_ring_demo():
4 # Simple demonstration of consistent hashing concept
5 nodes = ['server1', 'server2', 'server3']
6
7 # Hash nodes to positions on the ring
8 node_positions = {}
9 for node in nodes:
10 hash_val = int(hashlib.md5(node.encode()).hexdigest(), 16)
11 node_positions[hash_val] = node
12
13 # Sort positions for clockwise traversal
14 sorted_positions = sorted(node_positions.keys())
15 print(f"Nodes on ring: {sorted_positions[:3]}...") # Show first 3 positions
16
17 # Find node for a key
18 key = "user123"
19 key_hash = int(hashlib.md5(key.encode()).hexdigest(), 16)
20
21 # Find first node clockwise from key position
22 for pos in sorted_positions:
23 if pos >= key_hash:
24 assigned_node = node_positions[pos]
25 break
26 else:
27 # Wrap around to first node
28 assigned_node = node_positions[sorted_positions[0]]
29
30 print(f"Key '{key}' assigned to: {assigned_node}")
31
32simple_hash_ring_demo()