Network calls and external API requests can fail for many reasons: temporary network issues, server overload, or rate limiting. Rather than giving up immediately, a resilient system retries with exponential backoff so that transient failures are handled gracefully.
Exponential backoff is a strategy in which the delay between retry attempts grows exponentially. Instead of retrying immediately or at fixed intervals, you wait progressively longer, typically doubling the delay each time: 1 second, then 2 seconds, then 4 seconds, and so on. This avoids overwhelming an already struggling service while giving it time to recover.
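As a worked illustration, the doubling schedule is delay = base × 2^attempt, where attempt counts from zero. The minimal sketch below computes it in Python. `backoff_delays` is a hypothetical helper, and its optional `max_delay` cap is an assumption added for illustration (many implementations cap the delay so it cannot grow without bound); it is not part of the scheme described above.

```python
def backoff_delays(attempts, base=1.0, max_delay=None):
    """Return the un-jittered delay (in seconds) before each retry.

    delay = base * 2**attempt, with attempt counting from 0.
    `max_delay` is an illustrative assumption, not part of the basic scheme.
    """
    delays = []
    for attempt in range(attempts):
        delay = base * 2 ** attempt
        if max_delay is not None:
            # Optional cap so very late retries don't wait absurdly long
            delay = min(delay, max_delay)
        delays.append(delay)
    return delays


print(backoff_delays(5))                 # [1.0, 2.0, 4.0, 8.0, 16.0]
print(backoff_delays(5, max_delay=5.0))  # [1.0, 2.0, 4.0, 5.0, 5.0]
```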
Jitter adds randomness to retry delays to prevent the "thundering herd" problem. When many clients fail at the same moment and retry with identical delays, their retries arrive in synchronized waves that keep overloading the server. Scaling each delay by a random factor (for example, between 0.8 and 1.2, i.e. ±20%) spreads the retries out over time instead of letting them arrive together.
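To make this concrete, here is a minimal sketch of two common ways to jitter a delay. `proportional_jitter` scales the delay by a random factor in a ±20% band by default; `full_jitter` instead draws uniformly from zero up to the full delay, the "full jitter" strategy described in the AWS Architecture Blog post "Exponential Backoff And Jitter". Both function names and the `rng` parameter (which lets a seeded generator be passed in for repeatable runs) are hypothetical, chosen for illustration.

```python
import random


def proportional_jitter(delay, spread=0.2, rng=random):
    """Scale delay by a random factor in [1 - spread, 1 + spread] (±20% by default)."""
    return delay * rng.uniform(1 - spread, 1 + spread)


def full_jitter(delay, rng=random):
    """'Full jitter' variant: pick a delay uniformly from [0, delay]."""
    return rng.uniform(0, delay)


rng = random.Random(42)  # seeded only so the demo prints the same values each run
for attempt in range(4):
    base = 2 ** attempt  # un-jittered exponential delay: 1, 2, 4, 8 seconds
    print(f"attempt {attempt}: base {base}s, "
          f"proportional {proportional_jitter(base, rng=rng):.2f}s, "
          f"full {full_jitter(base, rng=rng):.2f}s")
```

Proportional jitter keeps each wait close to the exponential schedule, while full jitter spreads clients over a wider window at the cost of sometimes retrying almost immediately.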
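The key components of effective retry logic are:
- A limit on the number of attempts, so a persistent failure eventually reaches the caller.
- Delays that grow exponentially between attempts.
- Random jitter applied to each delay.
- Retrying only errors that are likely to be transient (here, ConnectionError and TimeoutError) and re-raising the last error once the attempts run out.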
This pattern is ubiquitous in distributed systems, cloud services, and API integrations where network reliability cannot be guaranteed.
import time
import random


def retry_with_backoff(func, max_attempts=3):
    """Call func(), retrying transient failures with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    for attempt in range(max_attempts):
        try:
            return func()  # Try the operation
        except (ConnectionError, TimeoutError):
            if attempt == max_attempts - 1:
                raise  # Last attempt: re-raise the original error

            # Exponential delay: 1, 2, 4 seconds...
            delay = 2 ** attempt
            # Add ±20% jitter so clients don't retry in lockstep (thundering herd)
            jittered_delay = delay * random.uniform(0.8, 1.2)

            print(f"Attempt {attempt + 1} failed, retrying in {jittered_delay:.2f}s")
            time.sleep(jittered_delay)
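To see this behaviour end to end without actually waiting, here is a self-contained sketch that repackages the same retry logic as a decorator. The `retry` decorator, its `retry_on` and `sleep` parameters, and the `fetch_data` function are hypothetical names for illustration; injecting `sleep` lets the demo record the delays instead of blocking.

```python
import functools
import random
import time


def retry(max_attempts=3, base_delay=1.0,
          retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Decorator: retry on `retry_on` errors with exponential backoff and ±20% jitter.

    `retry_on` and `sleep` are illustrative parameters; `sleep` can be replaced
    (e.g. in tests) so that no real waiting happens.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return func(*args, **kwargs)
                except retry_on:
                    if attempt == max_attempts - 1:
                        raise  # Out of attempts: surface the last error
                    delay = base_delay * 2 ** attempt * random.uniform(0.8, 1.2)
                    sleep(delay)
        return wrapper
    return decorator


# Hypothetical flaky operation: fails twice, then succeeds.
calls = {"count": 0}
recorded_delays = []


@retry(max_attempts=4, sleep=recorded_delays.append)  # record delays instead of sleeping
def fetch_data():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated network blip")
    return "payload"


print(fetch_data())          # payload
print(calls["count"])        # 3
print(len(recorded_delays))  # 2
```

Only the listed exception types trigger a retry; anything else (for example a `ValueError` from bad input) propagates immediately, since retrying a non-transient error only wastes time.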