Definition
Retry logic is a fault-tolerance pattern that automatically reattempts failed API requests when transient errors occur (network timeouts, temporary service unavailability, rate limiting). Retries should use exponential backoff with jitter to avoid overwhelming recovering services, and must distinguish retryable errors (5xx, timeouts, 429 rate limiting) from non-retryable errors (other 4xx client errors).
Key principles:
- Idempotency - Only retry idempotent operations (GET, PUT, DELETE) or use idempotency keys for POST
- Exponential Backoff - Increase delays between retries (1s, 2s, 4s, 8s)
- Jitter - Add randomness to backoff to prevent thundering herd
- Max Retries - Limit attempts (typically 3-5) to fail fast
- Selective Retry - Only retry errors that might succeed on subsequent attempts
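The idempotency principle for POST requests can be sketched as follows. This is a minimal illustration under stated assumptions rather than any particular API's contract: `withIdempotencyKey` is a hypothetical helper, and the `Idempotency-Key` header name is a common convention that only works if the server deduplicates on it.

```javascript
const { randomUUID } = require('node:crypto');

// Hypothetical helper: returns a copy of the request options with an
// idempotency key attached.
// Assumption: the server deduplicates requests by the `Idempotency-Key` header.
function withIdempotencyKey(options = {}, key = randomUUID()) {
  return {
    ...options,
    headers: { ...(options.headers || {}), 'Idempotency-Key': key },
  };
}

// Generate the key once per logical operation and reuse the same options
// for every retry, so the server can recognize repeated attempts.
const postOptions = withIdempotencyKey({
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ amount: 100 }),
});
```

The key must stay the same across attempts of one operation. Generating a fresh key on each retry would defeat the purpose.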
Example
AWS SDK Retry Strategy:
The AWS SDK for JavaScript (v2) retries failed requests automatically and lets you tune the retry count and delay:
const AWS = require('aws-sdk'); // AWS SDK for JavaScript v2

// Automatic retries with exponential backoff
const s3 = new AWS.S3({
  maxRetries: 3,
  retryDelayOptions: {
    base: 100, // Base delay in ms (ignored when customBackoff is supplied)
    customBackoff: (retryCount) => {
      // Exponential backoff with jitter
      const delay = Math.pow(2, retryCount) * 100;
      const jitter = Math.random() * 100;
      return delay + jitter;
    }
  }
});

// Retry behavior:
// - 500/503 errors: Retry with backoff
// - 429 throttling: Retry with backoff
// - Network timeouts: Retry
// - 400 client errors: Don't retry
Code Example
class RetryableClient {
  constructor(config = {}) {
    // Use ?? rather than || so that an explicit 0 (e.g. maxRetries: 0) is respected
    this.maxRetries = config.maxRetries ?? 3;
    this.baseDelay = config.baseDelay ?? 1000; // 1 second
    this.maxDelay = config.maxDelay ?? 30000; // 30 seconds
  }

  async request(url, options = {}) {
    let lastError;
    // One initial attempt plus up to maxRetries retries
    for (let attempt = 0; attempt <= this.maxRetries; attempt++) {
      try {
        const response = await fetch(url, options);
        // Success - return immediately
        if (response.ok) {
          return await response.json();
        }
        // Check if error is retryable
        if (!this.isRetryable(response.status)) {
          // The error body may not be JSON, so fall back to an empty object
          const body = await response.json().catch(() => ({}));
          throw new Error(`Non-retryable error ${response.status}: ${body.message ?? response.statusText}`);
        }
        // Rate limited - respect Retry-After
        if (response.status === 429) {
          lastError = new Error('Rate limited (429)');
          if (attempt < this.maxRetries) {
            // Retry-After may be an HTTP date instead of seconds;
            // fall back to computed backoff when it is missing or not numeric
            const retryAfter = Number.parseInt(response.headers.get('Retry-After'), 10);
            const delay = Number.isNaN(retryAfter) ? this.calculateDelay(attempt) : retryAfter * 1000;
            console.warn(`Rate limited. Retrying after ${delay}ms`);
            await this.sleep(delay);
          }
          continue;
        }
        // Server error - retry with backoff
        if (response.status >= 500) {
          lastError = new Error(`Server error ${response.status}`);
          if (attempt < this.maxRetries) {
            const delay = this.calculateDelay(attempt);
            console.warn(`Server error. Retry ${attempt + 1}/${this.maxRetries} after ${delay}ms`);
            await this.sleep(delay);
            continue;
          }
        }
      } catch (error) {
        // Network errors (TypeError from fetch) and aborted requests are retryable.
        // Note: AbortError is also raised when the caller cancels the request.
        if (error.name === 'TypeError' || error.name === 'AbortError') {
          lastError = error;
          if (attempt < this.maxRetries) {
            const delay = this.calculateDelay(attempt);
            console.warn(`Network error. Retry ${attempt + 1}/${this.maxRetries} after ${delay}ms`);
            await this.sleep(delay);
            continue;
          }
        }
        throw error;
      }
    }
    throw new Error(`Max retries (${this.maxRetries}) exceeded: ${lastError.message}`);
  }

  isRetryable(status) {
    // Retry rate limiting (429) and server errors (5xx);
    // everything else, including other 4xx client errors, is not retried
    return status === 429 || status >= 500;
  }

  calculateDelay(attempt) {
    // Exponential backoff: 1s, 2s, 4s, 8s
    const exponentialDelay = Math.pow(2, attempt) * this.baseDelay;
    // Add jitter (0-25% of delay) to prevent thundering herd
    const jitter = Math.random() * 0.25 * exponentialDelay;
    // Cap at maxDelay
    return Math.min(exponentialDelay + jitter, this.maxDelay);
  }

  sleep(ms) {
    return new Promise(resolve => setTimeout(resolve, ms));
  }
}
// Usage (top-level await requires an ES module; otherwise wrap this in an async function)
const client = new RetryableClient({
  maxRetries: 3,
  baseDelay: 1000,
  maxDelay: 30000
});

try {
  const data = await client.request('https://api.example.com/data', {
    method: 'GET',
    headers: { 'Authorization': 'Bearer token' }
  });
  console.log('Success:', data);
} catch (error) {
  console.error('Failed after retries:', error);
}
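The client above passes `options` straight to `fetch`, so an attempt only times out if the caller supplies an abort signal. A per-attempt timeout could be sketched as below. `fetchWithTimeout` and its parameters are hypothetical names, and the injectable `fetchImpl` argument exists only to make the helper testable without a network.

```javascript
// Hypothetical helper: runs a single fetch attempt with its own timeout.
// Assumption: any `signal` already present in `options` is replaced.
async function fetchWithTimeout(url, options = {}, timeoutMs = 5000, fetchImpl = fetch) {
  const controller = new AbortController();
  // Abort this attempt if it has not settled within timeoutMs
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    // The signal covers this attempt only; each retry gets a fresh controller
    return await fetchImpl(url, { ...options, signal: controller.signal });
  } finally {
    clearTimeout(timer); // avoid a stray timer once the attempt settles
  }
}
```

With the standard `fetch`, an aborted attempt rejects with an error named `AbortError`, which `RetryableClient` already treats as retryable. One way to wire this in would be to call `fetchWithTimeout(url, options, timeoutMs)` in place of `fetch(url, options)` inside `request`.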
Diagram
sequenceDiagram
    participant Client
    participant API
    participant Monitor
    Note over Client: Attempt 1
    Client->>API: GET /data
    API-->>Client: 503 Service Unavailable
    Client->>Client: Wait 1s + jitter (exponential backoff)
    Note over Client: Attempt 2
    Client->>API: GET /data
    API-->>Client: 500 Internal Server Error
    Client->>Client: Wait 2s + jitter
    Note over Client: Attempt 3
    Client->>API: GET /data
    API-->>Client: 429 Rate Limited (Retry-After: 5)
    Client->>Client: Wait 5s (honor Retry-After)
    Note over Client: Attempt 4
    Client->>API: GET /data
    API-->>Client: 200 OK + data
    Client->>Monitor: Log success after 3 retries
Best Practices
1. Only Retry Idempotent Operations - GET, PUT, and DELETE are safe to retry. For POST, use idempotency keys to prevent duplicate operations.
2. Use Exponential Backoff with Jitter - Increase delays exponentially (1s, 2s, 4s, 8s) and add random jitter (0-25%) to prevent synchronized retries.
3. Respect Retry-After Headers - When a 429 or 503 response includes a Retry-After header, honor it instead of the computed backoff. The value may be a number of seconds or an HTTP date.
4. Limit Max Retries - Set reasonable limits (3-5 retries) to fail fast and avoid wasting resources on permanently broken requests.
5. Classify Errors Correctly - Do not retry 4xx client errors, except 429 (and 408 Request Timeout, which many clients also treat as transient). Retry 5xx server errors and network timeouts, within the retry limit.
6. Implement Circuit Breakers - After a run of consecutive failures, stop sending requests and enter a cooldown period to prevent cascading failures.
7. Log Retry Attempts - Record retry counts, delays, and final outcomes for observability and debugging.
8. Set Timeout Per Attempt - Give each attempt its own timeout so that a single hung request cannot stall the whole operation, and keep the total time across all attempts within a reasonable limit.
9. Use Idempotency Keys - For non-idempotent operations (POST), send the same idempotency key on every attempt so the server can discard duplicates instead of repeating side effects.
10. Monitor Retry Rates - Track retry success rates, average attempts, and patterns to identify systemic issues.
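The circuit breaker from practice 6 is not part of the client shown earlier. A minimal sketch could look like this. The class name, option names, and default thresholds are illustrative assumptions, and the half-open state is simplified: once the cooldown has elapsed, requests are allowed again until the next failure is recorded.

```javascript
// Minimal circuit breaker sketch. Names and thresholds are illustrative.
class CircuitBreaker {
  constructor({ failureThreshold = 5, cooldownMs = 30000, now = Date.now } = {}) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.now = now;       // injectable clock, useful for testing
    this.failures = 0;    // consecutive failures seen so far
    this.openedAt = null; // time the circuit opened, or null while closed
  }

  // True if a request may be sent right now.
  canRequest() {
    if (this.openedAt === null) return true;
    // Once the cooldown has elapsed, allow a trial request (half-open)
    return this.now() - this.openedAt >= this.cooldownMs;
  }

  recordSuccess() {
    this.failures = 0;
    this.openedAt = null; // close the circuit
  }

  recordFailure() {
    this.failures += 1;
    if (this.failures >= this.failureThreshold) {
      this.openedAt = this.now(); // open (or re-open) the circuit
    }
  }
}

// Usage sketch: consult the breaker before each attempt.
const breaker = new CircuitBreaker({ failureThreshold: 3, cooldownMs: 10000 });
if (breaker.canRequest()) {
  // Send the request here, then call breaker.recordSuccess()
  // or breaker.recordFailure() depending on the outcome.
}
```

Keeping the breaker separate from the retry loop lets several requests to the same service share one breaker, so a burst of failures in one caller can pause retries for all of them.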
Common Mistakes
1. Retrying Non-Idempotent Operations Without Keys - Retrying POST requests without idempotency keys can create duplicate resources or side effects (e.g., charging a customer twice).
2. Fixed Delay Between Retries - Constant delays (e.g., always 1s) keep steady pressure on a recovering service. Use exponential backoff instead.
3. No Jitter - Without jitter, clients that failed together retry together after an outage, causing a thundering herd.
4. Retrying Client Errors - Retrying 400/404 errors wastes resources, since they cannot succeed until the request itself is fixed.
5. Infinite Retries - Not capping retry attempts leads to resource leaks and delays in surfacing failures to users.
6. Ignoring Rate Limit Headers - Not respecting Retry-After headers causes aggressive retries that worsen rate limiting.
7. No Timeout Per Retry - Without per-attempt timeouts, a single hung attempt can block the client indefinitely.
8. Not Logging Retries - Without retry metrics, intermittent issues are hard to diagnose and retry strategies are hard to tune.
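Mistake 6 is easier to avoid with a small parser for the header. Retry-After can carry either a number of seconds or an HTTP date, so a sketch that handles both forms might look like this. `parseRetryAfter` is a hypothetical helper name, and returning `null` for a missing or unparseable value is a design assumption that lets the caller fall back to its own backoff.

```javascript
// Hypothetical helper: converts a Retry-After header value to a delay in ms.
// Returns null when the header is missing or unparseable, so the caller can
// fall back to its own backoff calculation.
function parseRetryAfter(value, now = Date.now()) {
  if (value == null) return null;
  const trimmed = String(value).trim();

  // Form 1: delay in seconds, e.g. "120"
  if (/^\d+$/.test(trimmed)) {
    return Number(trimmed) * 1000;
  }

  // Form 2: HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
  const dateMs = Date.parse(trimmed);
  if (Number.isNaN(dateMs)) return null;
  return Math.max(0, dateMs - now); // a date in the past means "retry now"
}
```

A caller would typically pass `response.headers.get('Retry-After')` and may still want to cap the result, for example against a maximum delay, since the server-supplied value can be arbitrarily large.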