Consistent Hashing Ring

mediumPython

Lesson

Understanding Consistent Hashing

Consistent hashing is a distributed systems technique that solves the problem of data redistribution when nodes are added or removed from a system. Traditional hashing (like key % num_servers) has a major flaw: when the number of servers changes, nearly all keys get redistributed to different servers, causing massive data movement.

Consistent hashing arranges both nodes and keys on a conceptual circle (or "ring"). Each node and key is assigned a position on this ring using a hash function. To find which node handles a key, you move clockwise around the ring from the key's position until you encounter the first node.

The genius of this approach is in its stability. When you add a new node, it only affects the keys that fall between the new node and the previous node clockwise from it. Similarly, removing a node only redistributes its keys to the next node clockwise. This means that most keys stay with their original nodes, minimizing data movement.

Virtual nodes are a crucial enhancement to basic consistent hashing. Instead of placing each physical server at just one position on the ring, we create multiple "virtual" copies of each server at different positions. This dramatically improves load distribution because it reduces the chance that one server gets a disproportionately large segment of the ring.

For example, if server A and server B are placed next to each other on the ring, server A might handle 90% of the keys while server B handles only 10%. With virtual nodes, we might place "server-A-1", "server-A-2", "server-A-3" and "server-B-1", "server-B-2", "server-B-3" at different positions, creating smaller, more evenly distributed segments.

Example
1import hashlib 2 3def simple_hash_ring_demo(): 4 # Simple demonstration of consistent hashing concept 5 nodes = ['server1', 'server2', 'server3'] 6 7 # Hash nodes to positions on the ring 8 node_positions = {} 9 for node in nodes: 10 hash_val = int(hashlib.md5(node.encode()).hexdigest(), 16) 11 node_positions[hash_val] = node 12 13 # Sort positions for clockwise traversal 14 sorted_positions = sorted(node_positions.keys()) 15 print(f"Nodes on ring: {sorted_positions[:3]}...") # Show first 3 positions 16 17 # Find node for a key 18 key = "user123" 19 key_hash = int(hashlib.md5(key.encode()).hexdigest(), 16) 20 21 # Find first node clockwise from key position 22 for pos in sorted_positions: 23 if pos >= key_hash: 24 assigned_node = node_positions[pos] 25 break 26 else: 27 # Wrap around to first node 28 assigned_node = node_positions[sorted_positions[0]] 29 30 print(f"Key '{key}' assigned to: {assigned_node}") 31 32simple_hash_ring_demo()
L6Hash each node to get its position on the ring
L15Find the first node clockwise from the key's hash position
L20Handle wraparound - if no node found, use the first node on the ring

Key Takeaways

  • •Consistent hashing minimizes key redistribution when nodes are added/removed by using a ring structure
  • •Virtual nodes improve load distribution by giving each physical node multiple positions on the ring
  • •The technique is widely used in distributed databases and caching systems like Amazon DynamoDB and Cassandra
Loading...