Retry Logic

Definition

Retry logic is a fault-tolerance pattern that automatically reattempts failed API requests when transient errors occur (network timeouts, temporary service unavailability, rate limiting). Retries should use exponential backoff with jitter to avoid overwhelming recovering services, and must distinguish between retryable errors (5xx, timeouts, 429 rate limiting) and non-retryable errors (other 4xx client errors).

Key principles:

Idempotency - Only retry idempotent operations (GET, PUT, DELETE) or use idempotency keys for POST
Exponential Backoff - Increase delays between retries (1s, 2s, 4s, 8s)
Jitter - Add randomness to backoff to prevent thundering herd
Max Retries - Limit attempts (typically 3-5) to fail fast
Selective Retry - Only retry errors that might succeed on subsequent attempts
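
The principles above can be combined into a single retry decision. The following is a minimal sketch under stated assumptions, not a definitive implementation: the `shouldRetry` helper and its parameter names are hypothetical, and `Idempotency-Key` is a common header convention that not every API supports.

```javascript
// Hypothetical helper: decides whether one more attempt is allowed.
// HEAD and OPTIONS are included because safe methods are also idempotent.
const IDEMPOTENT_METHODS = new Set(['GET', 'HEAD', 'OPTIONS', 'PUT', 'DELETE']);

function shouldRetry({ method, status, attempt, maxRetries = 3, headers = {} }) {
  // Max Retries: stop once the retry budget is used up
  if (attempt >= maxRetries) return false;

  // Selective Retry: only rate limiting (429) and server errors (5xx)
  const transient = status === 429 || status >= 500;
  if (!transient) return false;

  // Idempotency: non-idempotent methods need an idempotency key
  const hasKey = Object.keys(headers).some(
    (name) => name.toLowerCase() === 'idempotency-key'
  );
  return IDEMPOTENT_METHODS.has(method.toUpperCase()) || hasKey;
}

console.log(shouldRetry({ method: 'GET', status: 503, attempt: 0 }));  // true
console.log(shouldRetry({ method: 'POST', status: 503, attempt: 0 })); // false
```

Backoff and jitter are left out here on purpose; they determine how long to wait once this check has allowed another attempt.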

Example

AWS SDK Retry Strategy:

The AWS SDK for JavaScript (v2) retries failed requests automatically and lets callers tune the retry count and backoff:

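// Automatic retries with exponential backoff (AWS SDK for JavaScript v2)
const AWS = require('aws-sdk');

const s3 = new AWS.S3({
  maxRetries: 3,
  retryDelayOptions: {
    // The SDK ignores the `base` option when customBackoff is supplied,
    // so the 100 ms base delay is applied inside the function instead.
    customBackoff: (retryCount) => {
      // Exponential backoff with jitter
      const delay = Math.pow(2, retryCount) * 100;
      const jitter = Math.random() * 100;
      return delay + jitter;
    }
  }
});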

// Retry behavior:
// - 500/503 errors: Retry with backoff
// - 429 throttling: Retry with backoff
// - Network timeouts: Retry
// - 400 client errors: Don't retry

Code Example

class RetryableClient {
  constructor(config = {}) {
    // Use ?? rather than || so that an explicit 0 (e.g. maxRetries: 0) is honored
    this.maxRetries = config.maxRetries ?? 3;
    this.baseDelay = config.baseDelay ?? 1000; // 1 second
    this.maxDelay = config.maxDelay ?? 30000;  // 30 seconds
  }

  async request(url, options = {}) {
    let lastError;

    for (let attempt = 0; attempt <= this.maxRetries; attempt++) {
      try {
        const response = await fetch(url, options);

        // Success - return immediately
        if (response.ok) {
          return await response.json();
        }

        // Check if error is retryable
        if (!this.isRetryable(response.status)) {
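          // Error bodies are not always JSON; fall back to the status text
          const error = await response.json().catch(() => ({}));
          throw new Error(`Non-retryable error ${response.status}: ${error.message ?? response.statusText}`);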
        }

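        // Rate limited - respect Retry-After when it is a number of seconds
        if (response.status === 429) {
          lastError = new Error('Rate limited (429)');

          // Only wait if another attempt will follow
          if (attempt < this.maxRetries) {
            const retryAfter = Number.parseInt(response.headers.get('Retry-After'), 10);
            // Retry-After may be missing or an HTTP-date; fall back to backoff then
            const delay = Number.isNaN(retryAfter) ? this.calculateDelay(attempt) : retryAfter * 1000;

            console.warn(`Rate limited. Retrying after ${delay}ms`);
            await this.sleep(delay);
          }
          continue;
        }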

        // Server error - retry with backoff
        if (response.status >= 500) {
          lastError = new Error(`Server error ${response.status}`);
          
          if (attempt < this.maxRetries) {
            const delay = this.calculateDelay(attempt);
            console.warn(`Server error. Retry ${attempt + 1}/${this.maxRetries} after ${delay}ms`);
            await this.sleep(delay);
            continue;
          }
        }

      } catch (error) {
        // Network errors are retryable
        if (error.name === 'TypeError' || error.name === 'AbortError') {
          lastError = error;
          
          if (attempt < this.maxRetries) {
            const delay = this.calculateDelay(attempt);
            console.warn(`Network error. Retry ${attempt + 1}/${this.maxRetries} after ${delay}ms`);
            await this.sleep(delay);
            continue;
          }
        }
        
        throw error;
      }
    }

    // lastError can be unset if every attempt ended without recording one
    throw new Error(`Max retries (${this.maxRetries}) exceeded: ${lastError?.message ?? 'unknown error'}`);
  }

  isRetryable(status) {
    // Retry rate limiting (429) and server errors (5xx); everything else,
    // including all other 4xx client errors, is not retried
    return status === 429 || status >= 500;
  }

  calculateDelay(attempt) {
    // Exponential backoff: 1s, 2s, 4s, 8s
    const exponentialDelay = Math.pow(2, attempt) * this.baseDelay;
    
    // Add jitter (0-25% of delay) to prevent thundering herd
    const jitter = Math.random() * 0.25 * exponentialDelay;
    
    // Cap at maxDelay
    return Math.min(exponentialDelay + jitter, this.maxDelay);
  }

  sleep(ms) {
    return new Promise(resolve => setTimeout(resolve, ms));
  }
}

// Usage
const client = new RetryableClient({
  maxRetries: 3,
  baseDelay: 1000,
  maxDelay: 30000
});

try {
  const data = await client.request('https://api.example.com/data', {
    method: 'GET',
    headers: { 'Authorization': 'Bearer token' }
  });
  console.log('Success:', data);
} catch (error) {
  console.error('Failed after retries:', error);
}
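
The class above does not bound how long a single attempt may take. A minimal sketch of a per-attempt timeout follows; the `withTimeout` helper and its parameter names are hypothetical, and it assumes a runtime that provides `AbortController` (modern browsers and recent Node.js versions).

```javascript
// Hypothetical helper: runs one attempt with its own deadline.
// `attemptFn` receives an AbortSignal and must return a promise.
async function withTimeout(attemptFn, timeoutMs) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await attemptFn(controller.signal);
  } finally {
    clearTimeout(timer); // always clear so the timer cannot fire later
  }
}

// Usage with fetch: the signal cancels the request when the deadline passes.
// withTimeout((signal) => fetch('https://api.example.com/data', { signal }), 5000);
```

When the deadline passes, `fetch` rejects with an `AbortError`, which the `catch` block in `RetryableClient` already treats as retryable.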

Diagram

sequenceDiagram
    participant Client
    participant API
    participant Monitor

    Note over Client: Attempt 1
    Client->>API: GET /data
    API-->>Client: 503 Service Unavailable
    Client->>Client: Wait 1s (exponential backoff)

    Note over Client: Attempt 2
    Client->>API: GET /data
    API-->>Client: 500 Internal Server Error
    Client->>Client: Wait 2s + jitter

    Note over Client: Attempt 3
    Client->>API: GET /data
    API-->>Client: 429 Too Many Requests<br/>Retry-After: 5
    Client->>Client: Wait 5s (honor Retry-After)

    Note over Client: Attempt 4
    Client->>API: GET /data
    API-->>Client: 200 OK + data
    Client->>Monitor: Log success after 3 retries (4 attempts)
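
The third attempt in the diagram waits for the time given in Retry-After. That header can carry either a number of seconds or an HTTP-date, so a client needs to handle both forms. The sketch below is one way to do that; `parseRetryAfter` is a hypothetical helper name, and returning `null` for a missing or unparseable value is an assumption chosen so the caller can fall back to its own backoff.

```javascript
// Hypothetical helper: converts a Retry-After value into milliseconds.
// Returns null when the header is missing or cannot be parsed.
function parseRetryAfter(value, now = Date.now()) {
  if (value == null || value.trim() === '') return null;
  const text = value.trim();

  // Form 1: delay in seconds, e.g. "5"
  if (/^\d+$/.test(text)) {
    return Number(text) * 1000;
  }

  // Form 2: HTTP-date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
  const date = Date.parse(text);
  if (Number.isNaN(date)) return null;
  return Math.max(0, date - now); // a date in the past means "retry now"
}

console.log(parseRetryAfter('5')); // 5000
```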

Best Practices

1. Only Retry Idempotent Operations - GET, PUT, and DELETE are safe to retry. For POST, use idempotency keys to prevent duplicate operations.

2. Use Exponential Backoff with Jitter - Increase delays exponentially (1s, 2s, 4s, 8s) and add random jitter (0-25%) to prevent synchronized retries.

3. Respect Retry-After Headers - When a 429 or 503 response includes a Retry-After header, wait for the time it specifies instead of using the backoff calculation.

4. Limit Max Retries - Set a reasonable limit (3-5 retries) to fail fast and avoid wasting resources on permanently broken requests.

5. Classify Errors Correctly - Do not retry 4xx client errors (except 429). Retry 5xx server errors and network timeouts, which are usually transient.

6. Implement Circuit Breakers - After a run of consecutive failures, stop sending requests and enter a cooldown period to prevent cascading failures.

7. Log Retry Attempts - Record retry counts, delays, and final outcomes for observability and debugging.

8. Set Timeout Per Attempt - Give each attempt its own timeout so that one slow attempt cannot stall the request, and keep the total time across all attempts within a reasonable limit.

9. Use Idempotency Keys - For non-idempotent operations (POST), include idempotency keys to safely retry without duplicating side effects.

10. Monitor Retry Rates - Track retry success rates, average attempts, and patterns to identify systemic issues.
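
Practice 6 is the one item in this list that the earlier code does not touch on. The following is a minimal sketch of a circuit breaker, assuming a simple consecutive-failure count and a fixed cooldown; the class name, option names, and default thresholds are all hypothetical, and production breakers usually add a half-open state that limits how many trial requests are let through.

```javascript
// Hypothetical minimal circuit breaker; names and thresholds are illustrative.
class CircuitBreaker {
  constructor({ failureThreshold = 5, cooldownMs = 30000, now = Date.now } = {}) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.now = now;       // injectable clock, useful for tests
    this.failures = 0;    // consecutive failures seen so far
    this.openedAt = null; // timestamp when the circuit opened, or null
  }

  // True when a request may be sent (circuit closed, or cooldown elapsed)
  allowRequest() {
    if (this.openedAt === null) return true;
    return this.now() - this.openedAt >= this.cooldownMs;
  }

  recordSuccess() {
    this.failures = 0;
    this.openedAt = null;
  }

  recordFailure() {
    this.failures += 1;
    if (this.failures >= this.failureThreshold) {
      this.openedAt = this.now(); // open (or re-open) the circuit
    }
  }
}

const breaker = new CircuitBreaker({ failureThreshold: 3, cooldownMs: 10000 });
console.log(breaker.allowRequest()); // true: no failures recorded yet
```

A retrying client would call `allowRequest()` before each attempt and fail fast when it returns false, then report the outcome with `recordSuccess()` or `recordFailure()`.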

Common Mistakes

1. Retrying Non-Idempotent Operations Without Keys - Retrying POST requests without idempotency keys can create duplicate resources (e.g., double charging).

2. Fixed Delay Between Retries - Using constant delays (e.g., always 1s) can overwhelm recovering services. Use exponential backoff instead.

3. No Jitter - Without jitter, all clients retry simultaneously after outages, causing thundering herd problems.

4. Retrying Client Errors - Retrying 400/404 errors wastes resources since they will never succeed without fixing the request.

5. Infinite Retries - Not capping retry attempts leads to resource leaks and delays in surfacing failures to users.

6. Ignoring Rate Limit Headers - Not respecting Retry-After headers causes aggressive retries that worsen rate limiting.

7. No Timeout Per Retry - Without per-attempt timeouts, a single slow retry can block the client indefinitely.

8. Not Logging Retries - Missing retry metrics makes it impossible to diagnose intermittent issues or optimize retry strategies.
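
Mistakes 2 and 3 are easiest to see with numbers. The sketch below compares plain exponential backoff with the "full jitter" variant, which picks a random delay between zero and the exponential ceiling. This is a different variant from the 0-25% additive jitter used in `RetryableClient`; the function names, defaults, and the injectable `random` parameter are assumptions for illustration.

```javascript
// Hypothetical helpers comparing backoff with and without jitter.
function backoffNoJitter(attempt, baseMs = 1000, capMs = 30000) {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Full jitter: a random delay between 0 and the exponential ceiling.
// `random` is injectable so the behavior can be tested deterministically.
function backoffFullJitter(attempt, baseMs = 1000, capMs = 30000, random = Math.random) {
  return Math.floor(random() * backoffNoJitter(attempt, baseMs, capMs));
}

// Ten clients that all failed at the same moment, on their third retry:
const withoutJitter = Array.from({ length: 10 }, () => backoffNoJitter(2));
const withJitter = Array.from({ length: 10 }, () => backoffFullJitter(2));

console.log(new Set(withoutJitter).size); // 1: every client waits exactly 4000 ms
console.log(withJitter);                  // values spread between 0 and 3999 ms
```

Without jitter, all ten clients hit the recovering service in the same instant; with jitter, their retries are spread across the whole four-second window.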

Standards & RFCs

1) RFC 7231 - [HTTP/1.1](https://reference.apios.info/terms/http-1-1/) Semantics and Content (idempotent methods)

2) RFC 6585 - Additional HTTP Status Codes (429 Too Many Requests, [Rate Limiting](https://reference.apios.info/terms/rate-limiting/))

3) RFC 7231 - Retry-After header

4) AWS Architecture Blog - Exponential Backoff and Jitter