The token bucket algorithm is one of the most popular techniques for rate limiting in software systems. It provides a flexible way to control the rate of requests while allowing for short bursts of activity.
Imagine a physical bucket that can hold a maximum number of tokens. Tokens are added to the bucket at a constant rate (the refill rate). When a request comes in, it must "spend" a token to be processed. If no tokens are available, the request is denied.
This algorithm has several key advantages:
Burst Handling: Unlike simple rate limiting that strictly enforces "X requests per second," token buckets allow bursts up to the bucket capacity. If your bucket holds 10 tokens and no requests have been made recently, a client can immediately make 10 requests.
Smooth Rate Control: The constant refill rate ensures that over time, the average request rate doesn't exceed your desired limit, even with occasional bursts.
Memory Efficiency: The algorithm only needs to track a few variables: current token count, last refill time, and configuration parameters.
The trickiest part of implementing a token bucket is handling time correctly. You need to calculate how much time has elapsed since the last refill and convert that to tokens. For example, if your refill rate is 2 tokens per second and 1.5 seconds have passed, you should add 3 tokens to the bucket.
Another important detail is ensuring tokens never exceed the maximum capacity. When adding tokens, always use the minimum of (current + new tokens) and the maximum bucket size.
Token buckets are widely used in API gateways, web servers, and networking equipment because they provide predictable rate limiting while maintaining good user experience through burst allowances.
1import time
2
3class SimpleRateLimiter:
4 def __init__(self, max_requests, time_window):
5 self.max_requests = max_requests
6 self.time_window = time_window
7 self.requests = []
8
9 def allow_request(self):
10 now = time.time()
11 # Remove old requests outside the time window
12 self.requests = [req_time for req_time in self.requests
13 if now - req_time < self.time_window]
14
15 if len(self.requests) < self.max_requests:
16 self.requests.append(now)
17 return True
18 return False