Signa - Become a Better Engineer

The token bucket algorithm is one of the most popular techniques for rate limiting in software systems. It provides a flexible way to control the rate of requests while allowing for short bursts of activity.

How Token Buckets Work

Imagine a physical bucket that can hold a maximum number of tokens. Tokens are added to the bucket at a constant rate (the refill rate). When a request comes in, it must "spend" a token to be processed. If no tokens are available, the request is denied.

This algorithm has several key advantages:

Burst Handling: Unlike simple rate limiting that strictly enforces "X requests per second," token buckets allow bursts up to the bucket capacity. If your bucket holds 10 tokens and no requests have been made recently, a client can immediately make 10 requests.

Smooth Rate Control: The constant refill rate ensures that over time, the average request rate doesn't exceed your desired limit, even with occasional bursts.

Memory Efficiency: The algorithm only needs to track a few variables: current token count, last refill time, and configuration parameters.

Key Implementation Considerations

The trickiest part of implementing a token bucket is handling time correctly. You need to calculate how much time has elapsed since the last refill and convert that to tokens. For example, if your refill rate is 2 tokens per second and 1.5 seconds have passed, you should add 3 tokens to the bucket.

Another important detail is ensuring tokens never exceed the maximum capacity. When adding tokens, always use the minimum of (current + new tokens) and the maximum bucket size.

Token buckets are widely used in API gateways, web servers, and networking equipment because they provide predictable rate limiting while maintaining good user experience through burst allowances.

Token Bucket Rate Limiter

Lesson

Understanding the Token Bucket Algorithm

How Token Buckets Work

Key Implementation Considerations

Key Takeaways