Error Handling

Definition

Error Handling in APIs encompasses strategies for detecting failures, communicating them to clients, logging diagnostic information, and implementing recovery mechanisms. Effective error handling distinguishes between transient errors (temporary network issues) and permanent failures (invalid requests), provides actionable error messages, and maintains system stability during partial outages.

Key components include:

Detection - Identifying errors at application, network, and infrastructure layers
Classification - Distinguishing client errors (4xx) from server errors (5xx)
Communication - Returning structured, actionable error responses
Logging - Recording errors with context for debugging
Recovery - Implementing retries, fallbacks, circuit breakers
Monitoring - Tracking error rates, patterns, and trends

Example

Stripe API Error Handling:

Stripe returns structured error objects with consistent formats:

{
  "error": {
    "type": "card_error",
    "code": "card_declined",
    "decline_code": "insufficient_funds",
    "message": "Your card has insufficient funds.",
    "param": "payment_method",
    "request_id": "req_abc123"
  }
}

Features:

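type: Error category (card_error, api_error, invalid_request_error)
code: Machine-readable error code for programmatic handling
decline_code: For card errors, the card issuer’s reason for the decline
message: Human-readable description
param: Which field caused the error
request_id: Unique request identifier to quote in support inquiries (Stripe also returns it in the Request-Id response header)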

Client behavior:

4xx errors (except 429) → Don’t retry; fix the request
5xx errors → Retry with exponential backoff
429 errors → Wait for the duration in the Retry-After header, then retry
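These client rules can be condensed into a single decision function. The following is a minimal sketch; retryDecision and parseRetryAfter are hypothetical helper names. The function takes the HTTP status (or null when no response arrived) and the raw Retry-After value, which can be either a number of seconds or an HTTP-date.

```javascript
// Hypothetical helper: classify a failed request as retryable or not.
// status: HTTP status code, or null when no response arrived (network failure).
// retryAfter: raw Retry-After header value, or null if absent.
function retryDecision(status, retryAfter = null, now = Date.now()) {
  // No response at all: treat as transient
  if (status === null) return { retry: true, delayMs: null };

  // 429: retry after the server-specified delay (null = caller picks a backoff)
  if (status === 429) {
    return { retry: true, delayMs: parseRetryAfter(retryAfter, now) };
  }

  // Other 4xx: the request itself is wrong, so the same request will fail again
  if (status >= 400 && status < 500) return { retry: false, delayMs: null };

  // 5xx: possibly transient; the caller applies exponential backoff
  if (status >= 500) return { retry: true, delayMs: null };

  // Anything below 400 is not an error in this scheme
  return { retry: false, delayMs: null };
}

// Retry-After is either delay-seconds ("120") or an HTTP-date.
// Returns the wait in milliseconds, or null if the value is missing or unparseable.
function parseRetryAfter(value, now = Date.now()) {
  if (value == null) return null;
  const trimmed = value.trim();
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000;
  const date = Date.parse(trimmed);
  return Number.isNaN(date) ? null : Math.max(0, date - now);
}
```

For example, retryDecision(429, '120') returns { retry: true, delayMs: 120000 }, and retryDecision(404) returns { retry: false, delayMs: null }.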

Code Example

// Comprehensive error handling pattern
class APIClient {
  constructor(baseURL, apiKey) {
    this.baseURL = baseURL;
    this.apiKey = apiKey;
  }

  async request(endpoint, options = {}) {
    const maxRetries = 3;
    let lastError;

    for (let attempt = 0; attempt <= maxRetries; attempt++) {
      try {
        const response = await fetch(`${this.baseURL}${endpoint}`, {
          ...options,
          headers: {
            'Authorization': `Bearer ${this.apiKey}`,
            'Content-Type': 'application/json',
            ...options.headers
          }
        });

        // Success (204 No Content has no body to parse)
        if (response.ok) {
          return response.status === 204 ? null : await response.json();
        }

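        // Parse error response; gateways and proxies may return a non-JSON
        // body (e.g. an HTML 502 page), so fall back to an empty object
        const errorData = await response.json().catch(() => ({}));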

        // Client errors (4xx) - don't retry
        if (response.status >= 400 && response.status < 500) {
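          if (response.status === 429) {
            // Rate limited: Retry-After holds either delay-seconds or an HTTP-date
            const retryAfter = response.headers.get('Retry-After');
            let delay = 60000; // fallback when the header is missing or unparseable
            if (retryAfter) {
              const seconds = Number(retryAfter);
              const date = Date.parse(retryAfter);
              if (Number.isFinite(seconds)) {
                delay = seconds * 1000;
              } else if (!Number.isNaN(date)) {
                delay = Math.max(0, date - Date.now());
              }
            }

            if (attempt < maxRetries) {
              await this.sleep(delay);
              continue;
            }
          }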

          // Other 4xx - don't retry
          throw new APIClientError(
            errorData.error?.message || 'Client error',
            response.status,
            errorData.error?.code,
            errorData.error?.param
          );
        }

        // Server errors (5xx) - retry with backoff
        if (response.status >= 500) {
          lastError = new APIServerError(
            errorData.error?.message || 'Server error',
            response.status,
            errorData.error?.code
          );

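          if (attempt < maxRetries) {
            // Exponential backoff (1s, 2s, 4s) plus up to 1s of random jitter
            const backoff = Math.pow(2, attempt) * 1000 + Math.random() * 1000;
            await this.sleep(backoff);
            continue;
          }

          throw lastError;
        }

        // Any other status (e.g. an unfollowed 3xx redirect) is unexpected; fail fast
        throw new Error(`Unexpected HTTP status ${response.status}`);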

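      } catch (error) {
        // fetch() rejects with a TypeError when no response arrives (DNS failure,
        // connection reset, offline). The message differs between runtimes
        // (e.g. "Failed to fetch" in Chrome, "Load failed" in Safari), so
        // match on the error type rather than the message text
        if (error instanceof TypeError) {
          lastError = new NetworkError('Network request failed', error);

          if (attempt < maxRetries) {
            // Exponential backoff plus up to 1s of random jitter
            const backoff = Math.pow(2, attempt) * 1000 + Math.random() * 1000;
            await this.sleep(backoff);
            continue;
          }

          // Retries exhausted: surface the classified NetworkError, not the raw TypeError
          throw lastError;
        }

        // Already-classified API errors and anything unexpected propagate as-is
        throw error;
      }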
    }

    throw lastError;
  }

  sleep(ms) {
    return new Promise(resolve => setTimeout(resolve, ms));
  }
}

// Custom error classes
class APIClientError extends Error {
  constructor(message, status, code, param) {
    super(message);
    this.name = 'APIClientError';
    this.status = status;
    this.code = code;
    this.param = param;
    this.retryable = false;
  }
}

class APIServerError extends Error {
  constructor(message, status, code) {
    super(message);
    this.name = 'APIServerError';
    this.status = status;
    this.code = code;
    this.retryable = true;
  }
}

class NetworkError extends Error {
  constructor(message, cause) {
    super(message);
    this.name = 'NetworkError';
    this.cause = cause;
    this.retryable = true;
  }
}

// Usage (top-level await requires an ES module; otherwise wrap in an async function)
const client = new APIClient('https://api.example.com', 'sk_live_...');

try {
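  const payment = await client.request('/v1/payments', {
    method: 'POST',
    // The client retries on 5xx and network errors; an idempotency key
    // (supported by APIs such as Stripe) keeps a retried POST from creating
    // a duplicate payment. crypto.randomUUID() exists in modern browsers and
    // recent Node.js versions
    headers: { 'Idempotency-Key': crypto.randomUUID() },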
    body: JSON.stringify({
      amount: 1000,
      currency: 'usd'
    })
  });
  console.log('Payment created:', payment.id);
} catch (error) {
  if (error instanceof APIClientError) {
    console.error(`Client error (${error.status}): ${error.message}`);
    if (error.param) {
      console.error(`Invalid parameter: ${error.param}`);
    }
  } else if (error instanceof APIServerError) {
    console.error(`Server error (${error.status}): ${error.message}`);
    // Alert operations team
  } else if (error instanceof NetworkError) {
    console.error('Network error:', error.message);
    // Show offline UI
  } else {
    // Unclassified error (e.g. a bug or an unexpected status): don't swallow it
    throw error;
  }
}

Diagram

graph TB
    A[API Request] --> B{Response Status}
    B -->|2xx Success| C[Return Data]
    B -->|4xx Client Error| D{Error Type}
    B -->|5xx Server Error| E[Retry Logic]
    B -->|Network Error| E

    D -->|400/422| F["Validation Error<br/>Don't Retry"]
    D -->|401/403| G["Auth Error<br/>Refresh Token"]
    D -->|404| H["Not Found<br/>Don't Retry"]
    D -->|429| I["Rate Limit<br/>Wait Retry-After"]

    E -->|Attempt 1| J[Backoff 1s]
    J -->|Attempt 2| K[Backoff 2s]
    K -->|Attempt 3| L[Backoff 4s]
    L -->|Max Retries| M[Throw Error]

    F --> N[Log Error]
    G --> O[Update Auth]
    H --> N
    I --> P[Sleep]
    P --> A
    M --> N

    N --> Q[Error Monitoring]
    Q --> R[Alert if threshold exceeded]

    style C fill:#90EE90
    style F fill:#FFD700
    style G fill:#FFD700
    style H fill:#FFD700
    style I fill:#FFA500
    style M fill:#FF6B6B
    style R fill:#FF6B6B

Best Practices

1. Use Standard HTTP Status Codes: Return the status code that matches the failure, such as 400 (bad request), 401 (unauthorized: missing or invalid credentials), 403 (forbidden), 404 (not found), 429 (rate limited), 500 (internal server error), and 503 (service unavailable).

2. Return Structured Error Responses: Always return errors in a consistent JSON format with type, code, message, and contextual fields.

3. Distinguish Retryable vs Non-Retryable Errors: Client errors (4xx except 429) should not be retried. Server errors (5xx) and network errors can be retried with exponential backoff, provided the request is idempotent or carries an idempotency key.

4. Include Request IDs: Return a unique request ID in every error response so clients can quote it and operators can correlate it with server logs.

5. Implement Circuit Breakers: After N consecutive failures, stop sending requests to the failing service for a cooldown period to prevent cascading failures.

6. Log Errors with Context: Include request/response data, user context, timestamps, and stack traces in error logs, with sensitive data sanitized.

7. Respect Rate Limit Headers: When receiving a 429, wait for the duration given in the Retry-After header; fall back to exponential backoff when the header is absent.

8. Provide Actionable Error Messages: Error messages should explain what went wrong and how to fix it. Avoid generic “Something went wrong” messages.

9. Monitor Error Rates: Track error rates per endpoint, status code, and error type. Alert when thresholds are exceeded.

10. Implement Graceful Degradation: When dependent services fail, degrade functionality rather than failing entirely (e.g., serve cached data or disable non-critical features).
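Best practice 5 describes a circuit breaker. The following is a minimal, hypothetical version with the usual three states (CLOSED, OPEN, HALF_OPEN). The thresholds are placeholders, the clock can be injected for testing, and this sketch does not limit how many trial requests pass while the breaker is HALF_OPEN.

```javascript
// Hypothetical minimal circuit breaker.
// CLOSED: requests flow; consecutive failures are counted.
// OPEN: requests fail fast until cooldownMs has elapsed.
// HALF_OPEN: trial requests are allowed; success closes, failure re-opens.
// (This sketch does not cap the number of concurrent trial requests.)
class CircuitBreaker {
  constructor({ failureThreshold = 5, cooldownMs = 30000, now = () => Date.now() } = {}) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.now = now; // injectable clock, useful in tests
    this.state = 'CLOSED';
    this.failures = 0;
    this.openedAt = 0;
  }

  // Ask before each request; false means "fail fast, don't call the service"
  canRequest() {
    if (this.state === 'OPEN' && this.now() - this.openedAt >= this.cooldownMs) {
      this.state = 'HALF_OPEN'; // cooldown over: let a trial request through
    }
    return this.state !== 'OPEN';
  }

  recordSuccess() {
    this.state = 'CLOSED';
    this.failures = 0;
  }

  recordFailure() {
    this.failures += 1;
    // A failed trial, or too many consecutive failures, opens the circuit
    if (this.state === 'HALF_OPEN' || this.failures >= this.failureThreshold) {
      this.state = 'OPEN';
      this.openedAt = this.now();
    }
  }
}

// Usage sketch: wrap calls to a dependency (treating 5xx as failure is an assumption)
async function guardedFetch(breaker, url) {
  if (!breaker.canRequest()) throw new Error('Circuit open: failing fast');
  try {
    const response = await fetch(url);
    if (response.status >= 500) throw new Error(`Upstream error ${response.status}`);
    breaker.recordSuccess();
    return response;
  } catch (error) {
    breaker.recordFailure();
    throw error;
  }
}
```

Separating canRequest from recordSuccess/recordFailure keeps the state machine synchronous and easy to test, while the async wrapper decides what counts as a failure.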

Security Notes

CRITICAL: Never expose internal error details in production. Sanitize all error responses to prevent information disclosure attacks.

Information Disclosure Prevention:

No stack traces: Never include stack traces in production responses
No database details: Sanitize database queries, table names, column names from error messages
Hide file paths: Don’t expose server filesystem paths or internal directory structure
Generic server errors: Use “Internal server error” for 5xx errors; log details internally only
No internal service names: Don’t reveal microservice architecture or internal service names
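One way to apply these rules on the server is to route every thrown error through a single translator. It logs full details internally and returns only a generic body for anything that is not a deliberate client error. The following is a minimal sketch: PublicError, its expose flag, and toErrorResponse are hypothetical names, and the request ID is assumed to be assigned earlier in the request pipeline.

```javascript
// Hypothetical error type for failures that are safe to describe to the client
class PublicError extends Error {
  constructor(status, code, message) {
    super(message);
    this.name = 'PublicError';
    this.status = status; // expected to be a 4xx code
    this.code = code;
    this.expose = true; // message was written for clients, not copied from internals
  }
}

// Translate any thrown error into { status, body } for the client.
// `log` defaults to console.error; pass a structured logger instead.
function toErrorResponse(error, requestId, log = console.error) {
  // Full details (message, stack) stay in server-side logs, keyed by request ID
  log(JSON.stringify({
    level: 'error',
    requestId,
    name: error.name,
    message: error.message,
    stack: error.stack,
  }));

  if (error.expose && error.status >= 400 && error.status < 500) {
    return {
      status: error.status,
      body: { error: { code: error.code, message: error.message, request_id: requestId } },
    };
  }

  // Everything else becomes a generic 500: no stack trace, SQL, file paths,
  // or internal service names reach the client
  return {
    status: 500,
    body: {
      error: { code: 'internal_error', message: 'Internal server error', request_id: requestId },
    },
  };
}
```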

Error Response Sanitization:

Escape error output: HTML-encode error messages before rendering them in a page to prevent XSS
Don’t echo user input: If showing “invalid X”, don’t display the user’s input directly
Consistent error format: Use standardized error responses to minimize information leakage
Exclude implementation details: Don’t reveal libraries, versions, or frameworks used

Authentication & Enumeration Prevention:

Identical responses for credentials: Return the same status, message, and similar response time for a wrong password and a non-existent user
Prevent user enumeration: Don’t reveal which emails are registered via error messages
Rate limit auth failures: Limit failed login attempts to prevent brute force
No information on failure: “Login failed” not “Invalid email” or “Wrong password”
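A login handler can guarantee identical failure responses by sharing one response object across both failure paths. The following sketch relies on several assumptions: users is a Map standing in for a user store, verifyPassword stands in for a real password-hash check such as bcrypt or Argon2, and DUMMY_HASH is a placeholder that ensures unknown emails still trigger the hash check.

```javascript
// One frozen response shared by every failure path, so "unknown email" and
// "wrong password" are indistinguishable to the client
const LOGIN_FAILED = Object.freeze({
  status: 401,
  body: Object.freeze({ error: Object.freeze({ code: 'login_failed', message: 'Login failed.' }) }),
});

// Placeholder hash (assumption): checked when the email is unknown so both
// failure paths run the password check and take similar time
const DUMMY_HASH = 'placeholder-hash';

// users: Map of email -> { id, passwordHash } (stand-in for a user store)
// verifyPassword(hash, password): stand-in for a real hash check (bcrypt, Argon2)
function login(users, verifyPassword, email, password) {
  const user = users.get(email);
  // Always run the check, even for unknown users
  const passwordOk = verifyPassword(user ? user.passwordHash : DUMMY_HASH, password);
  if (!user || !passwordOk) return LOGIN_FAILED;
  return { status: 200, body: { userId: user.id } };
}
```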

Attack Detection & Monitoring:

Separate security logging: Log authentication failures, injection attempts, enumeration attacks
Monitor error patterns: Track suspicious error patterns (credential stuffing, path traversal attempts)
Alert on anomalies: Abnormal error rates could indicate attacks
Security context: Include user ID, IP address, timestamp in security error logs
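To alert on abnormal error rates, you need a rate computed over a recent window. The following is a hypothetical sliding-window monitor. The window, threshold, and minimum sample size are placeholders, and using shift() keeps the code short at the cost of O(n) pruning.

```javascript
// Hypothetical sliding-window error-rate monitor. Assumes outcomes are recorded
// in timestamp order; shift() keeps the sketch short but prunes in O(n).
class ErrorRateMonitor {
  constructor({ windowMs = 60000, threshold = 0.05, minRequests = 20 } = {}) {
    this.windowMs = windowMs; // look-back window
    this.threshold = threshold; // alert above this error ratio
    this.minRequests = minRequests; // ignore windows with too few samples
    this.events = []; // [{ at, isError }], oldest first
  }

  record(isError, at = Date.now()) {
    this.events.push({ at, isError });
    this.prune(at);
  }

  // Drop events that have fallen out of the window ending at `now`
  prune(now) {
    const cutoff = now - this.windowMs;
    while (this.events.length > 0 && this.events[0].at <= cutoff) {
      this.events.shift();
    }
  }

  errorRate(now = Date.now()) {
    this.prune(now);
    if (this.events.length === 0) return 0;
    const errors = this.events.filter((e) => e.isError).length;
    return errors / this.events.length;
  }

  shouldAlert(now = Date.now()) {
    const rate = this.errorRate(now); // prunes first
    return this.events.length >= this.minRequests && rate > this.threshold;
  }
}
```

The minRequests guard keeps a single failed request at 3 a.m. from firing an alert just because it is 100% of a tiny sample.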

Logging Best Practices:

Log internally only: Store detailed errors in server logs, not client responses
Sanitize logs: Don’t log passwords, API keys, or PII
Structured logging: Use consistent format for error logs (timestamp, severity, error code, context)
Retention policy: Define how long error logs are retained for audit/forensics
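The sanitization and structured-logging rules can be combined in a small helper that masks sensitive fields before an entry is serialized. This is a sketch: the key list is an assumption to extend for your own payloads, and the helper handles plain JSON-like data only (no circular references).

```javascript
// Keys whose values never reach the logs (an assumed list; extend as needed)
const SENSITIVE_KEYS = new Set([
  'password', 'authorization', 'api_key', 'apikey', 'token',
  'card_number', 'cvc', 'email',
]);

// Recursively copy JSON-like data, masking sensitive values.
// Keys are matched case-insensitively; circular references are not handled.
function redact(value) {
  if (Array.isArray(value)) return value.map(redact);
  if (value !== null && typeof value === 'object') {
    const out = {};
    for (const [key, v] of Object.entries(value)) {
      out[key] = SENSITIVE_KEYS.has(key.toLowerCase()) ? '[REDACTED]' : redact(v);
    }
    return out;
  }
  return value;
}

// One structured log line: timestamp, severity, error code, redacted context
function logError(error, context) {
  console.error(JSON.stringify({
    timestamp: new Date().toISOString(),
    severity: 'ERROR',
    code: error.code ?? 'unknown',
    message: error.message,
    context: redact(context),
  }));
}
```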

Common Mistakes

1. Exposing Stack Traces in Production: Returning full stack traces reveals internal implementation details and file paths, aiding attackers.

2. Retrying Non-Retryable Errors: Retrying 400/404 errors wastes resources and delays failure reporting to users, because the same request will fail the same way.

3. No Exponential Backoff: Fixed retry intervals keep load on a recovering service, and clients retrying in lockstep hit it in synchronized bursts. Always use exponential backoff with jitter.

4. Generic Error Messages: “Error occurred” provides no actionable information. Specify what failed and how to fix it.

5. Not Logging Error Context: Logging only error messages, without request IDs, user context, or timestamps, makes failures hard to trace and reproduce.

6. Ignoring Rate Limit Headers: Not respecting Retry-After headers leads to aggressive retries that worsen rate limiting.

7. No Circuit Breaker: Continuously retrying a failing service causes cascading failures and resource exhaustion.

8. Inconsistent Error Formats: Different endpoints returning different error structures breaks client error-handling logic.
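Mistakes 2, 3, and 6 can all be addressed with a retry wrapper whose delays grow exponentially and are randomized. The following sketch uses the "full jitter" variant: each delay is drawn uniformly between zero and a capped exponential ceiling. The base, cap, and retry count are placeholders, and withRetries and isRetryable are hypothetical names.

```javascript
// Exponential backoff with "full jitter": the delay for attempt n is drawn
// uniformly from [0, min(capMs, baseMs * 2^n)). Randomizing spreads retries from
// many clients over time instead of having them all retry in lockstep.
function backoffDelay(attempt, { baseMs = 1000, capMs = 30000, random = Math.random } = {}) {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Hypothetical generic retry wrapper using the delay above.
// isRetryable lets the caller skip retries for 4xx (except 429) errors.
async function withRetries(fn, { maxRetries = 3, isRetryable = () => true } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      if (attempt >= maxRetries || !isRetryable(error)) throw error;
      await new Promise((resolve) => setTimeout(resolve, backoffDelay(attempt)));
    }
  }
}
```

The cap prevents late attempts from waiting unreasonably long. Injecting random makes the delay deterministic in tests.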

Standards & RFCs

1) RFC 7807 - Problem Details for HTTP APIs (structured errors; obsoleted by RFC 9457)

2) RFC 6585 - Additional HTTP Status Codes (defines 429 Too Many Requests)

3) RFC 7231 - [HTTP/1.1](https://reference.apios.info/terms/http-1-1/) Semantics (status codes; obsoleted by RFC 9110)

4) OpenTelemetry (OTel) - Distributed tracing for error diagnosis
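RFC 7807 standardizes the error body that best practice 2 calls for; its successor, RFC 9457, keeps the same core members. The following is a minimal sketch of building such a body and sending it from a Node.js http.ServerResponse. The example.com type URI and the request_id extension member are illustrative placeholders.

```javascript
// Build an RFC 7807 problem-details object. type, title, status, detail, and
// instance are the members RFC 7807 defines; any other keys passed in are kept
// as extension members.
function problemDetails({ type = 'about:blank', title, status, detail, instance, ...extensions }) {
  const problem = { type, title, status };
  if (detail !== undefined) problem.detail = detail;
  if (instance !== undefined) problem.instance = instance;
  return { ...problem, ...extensions };
}

// Send it from a Node.js http.ServerResponse with the RFC 7807 media type
function sendProblem(res, problem) {
  res.writeHead(problem.status, { 'Content-Type': 'application/problem+json' });
  res.end(JSON.stringify(problem));
}

// Example: a validation failure with a hypothetical type URI and request_id extension
const example = problemDetails({
  type: 'https://example.com/problems/invalid-amount',
  title: 'Invalid amount',
  status: 422,
  detail: 'amount must be a positive integer (in cents)',
  instance: '/v1/payments',
  request_id: 'req_abc123',
});
```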