Definition
It’s 3 AM and your production system is down. Customers are complaining, your team is scrambling, and you need to figure out what happened - fast. You open your logs and see… nothing useful. Just a wall of unstructured text with no correlation between entries, no context about what was happening when things broke. This nightmare scenario is why logging matters: logs are your system’s memory, and good logging practices can mean the difference between a 5-minute fix and a 5-hour investigation.
Logging is the practice of recording discrete events that happen in your application - requests received, errors caught, state changes, performance data. Unlike metrics (which aggregate data over time) or traces (which follow request paths), logs capture specific moments with rich context. A log entry might say “User user@example.com failed login attempt from IP 192.168.1.1 at 2024-01-15T10:30:00Z - reason: invalid password.”
Modern logging has evolved from simple text files to structured data (JSON), centralized log aggregation (ELK, Splunk, Datadog), and intelligent analysis. Structured logging means each log entry is a parseable object with fields like timestamp, level, service, correlation_id, and message - making logs searchable, filterable, and analyzable at scale. When done right, logs answer the question “what happened?” with precision.
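To make the contrast concrete, here is a minimal sketch in plain TypeScript (the order fields are hypothetical) showing the same event logged both ways:

const orderId = 'ord_123';        // illustrative values
const reason = 'card_declined';
const amountCents = 4999;

// Unstructured: readable by a human, but hard for machines to query at scale
console.log(`Payment failed for order ${orderId}: ${reason}`);

// Structured: every field is independently searchable and filterable
console.log(JSON.stringify({
  timestamp: new Date().toISOString(),
  level: 'error',
  event: 'payment_failed',
  orderId,
  reason,
  amountCents
}));

With the structured form, a query like “all payment_failed events where amountCents > 5000” becomes trivial in any log aggregator.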
Example
Debugging a Production Incident: Netflix experiences slow video starts. Logs show a spike in “connection timeout” entries from a specific region, all with the same upstream service. Cross-referencing correlation IDs reveals the CDN in that region is failing. Total time to diagnosis: 5 minutes, because the logs told the story.
Security Audit Trail: A bank needs to prove what happened during a disputed transaction. Logs show: user authenticated at 10:00 AM, viewed account at 10:01, initiated transfer at 10:02, confirmed with 2FA at 10:02:30, transfer completed at 10:02:35. Each entry has timestamps, user IDs, IP addresses, and transaction details - a complete audit trail.
API Rate Limiting Investigation: A customer complains they’re being rate limited unfairly. Logs show their API key made 10,000 requests in the last minute, all from the same IP, 90% returning the same cached response - suggesting a misconfigured retry loop. The logs proved the rate limiting was correct.
Performance Regression Detection: After a deployment, logs show increased “database query slow” warnings. Filtering by correlation ID reveals a new endpoint is executing N+1 queries. The log entries include query timing, SQL statements, and the request that triggered them - enough to identify and fix the regression.
Third-Party Integration Troubleshooting: Your Stripe webhook isn’t working. Logs show the webhook arriving, signature validation passing, then a timeout calling your inventory service. Without logs, you’d blame Stripe; with logs, you found your own bug.
Analogy
The Flight Recorder (Black Box): Aircraft have flight recorders that capture everything happening during a flight. After an incident, investigators analyze the recordings to understand what happened. Your application logs serve the same purpose - they’re the black box for your software.
The Security Camera System: Security cameras don’t prevent theft, but they let you see exactly what happened, when, and by whom. Logs are your application’s security cameras - recording events for later investigation.
The Ship’s Log: Historically, ships kept detailed logs of their voyages - weather, position, incidents, decisions. If something went wrong, the log told the story. The captain didn’t log “sailed somewhere” - they logged “departed London at 0800, heading SW at 12 knots, crew of 45, cargo: 200 barrels of rum.”
The Medical Record: Doctors don’t just remember your history - they write it down. Every visit, test result, and prescription is logged. Years later, a new doctor can understand your complete health history. Application logs work the same way for your software’s “health.”
Code Example
// Structured logging with correlation and context
import winston from 'winston';
import { Request, Response, NextFunction } from 'express';
import { v4 as uuid } from 'uuid';

// Let TypeScript know about the fields the middleware attaches to each request
declare global {
  namespace Express {
    interface Request {
      correlationId: string;
      logger: winston.Logger;
    }
  }
}

// Assumed data-access layer, declared so the example compiles
declare const db: {
  users: { create(data: { email: string; name: string }): Promise<{ id: string }> };
};

// Configure structured JSON logging
const logger = winston.createLogger({
  level: process.env.LOG_LEVEL || 'info',
  format: winston.format.combine(
    winston.format.timestamp(),
    winston.format.json()
  ),
  defaultMeta: {
    service: 'user-api',
    version: process.env.APP_VERSION,
    environment: process.env.NODE_ENV
  },
  transports: [
    new winston.transports.Console(),
    new winston.transports.File({ filename: 'logs/error.log', level: 'error' }),
    new winston.transports.File({ filename: 'logs/combined.log' })
  ]
});

// Request logging middleware
function requestLogger(req: Request, res: Response, next: NextFunction) {
  // Reuse the caller's correlation ID if one was sent; otherwise mint a new one
  const correlationId = (req.headers['x-correlation-id'] as string) || uuid();
  const startTime = Date.now();

  // Attach to request for use in handlers
  req.correlationId = correlationId;
  req.logger = logger.child({ correlationId });

  // Log request start
  req.logger.info('Request started', {
    method: req.method,
    path: req.path,
    query: req.query,
    userAgent: req.headers['user-agent'],
    ip: req.ip
  });

  // Log response on finish
  res.on('finish', () => {
    const duration = Date.now() - startTime;
    const level = res.statusCode >= 500 ? 'error' :
                  res.statusCode >= 400 ? 'warn' : 'info';
    req.logger.log(level, 'Request completed', {
      method: req.method,
      path: req.path,
      statusCode: res.statusCode,
      durationMs: duration,
      contentLength: res.get('content-length')
    });
  });

  next();
}

// Application logging with context
async function createUser(req: Request, res: Response) {
  const { email, name } = req.body;
  req.logger.info('Creating user', { email });
  try {
    const user = await db.users.create({ email, name });
    req.logger.info('User created successfully', {
      userId: user.id,
      email
    });
    res.status(201).json(user);
  } catch (error: any) {
    req.logger.error('Failed to create user', {
      email,
      error: error.message,
      stack: error.stack,
      code: error.code
    });
    res.status(500).json({ error: 'Failed to create user' });
  }
}

// Log levels for different purposes (winston's default "npm" levels):
//   logger.debug('Detailed debugging info', { rawData });        // development only
//   logger.info('Normal operations', { action: 'user_login' });  // business events
//   logger.warn('Potential problems', { memoryUsage: '85%' });   // warnings
//   logger.error('Errors occurred', { error, stack });           // errors
// winston has no `fatal` level by default; use `error` for critical failures
// or define custom levels (see the sketch after the output example below)

// Output example (JSON structured log):
// {
//   "timestamp": "2024-01-15T10:30:00.000Z",
//   "level": "info",
//   "service": "user-api",
//   "version": "1.2.3",
//   "environment": "production",
//   "correlationId": "abc-123-xyz",
//   "message": "User created successfully",
//   "userId": "user_456",
//   "email": "user@example.com"
// }
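One caveat from the levels above: winston's default npm levels stop at error, so a fatal level only exists if you define it. A minimal sketch using winston's documented custom levels option:

import winston from 'winston';

// Lower number = higher severity, per winston's convention
const levels = { fatal: 0, error: 1, warn: 2, info: 3, debug: 4 };

const critLogger = winston.createLogger({
  levels,
  level: 'info',  // fatal (0) and error (1) still pass this threshold
  format: winston.format.combine(winston.format.timestamp(), winston.format.json()),
  transports: [new winston.transports.Console()]
});

// The generic log() call accepts any configured level; at runtime winston
// also attaches a critLogger.fatal() convenience method (untyped in TypeScript)
critLogger.log('fatal', 'System critical', { service: 'down' });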
Diagram
flowchart LR
    subgraph Applications["Applications"]
        A1[Web API]
        A2[Worker]
        A3[Mobile Backend]
    end
    subgraph LogPipeline["Log Pipeline"]
        B[Log Shipper<br/>Filebeat/Fluentd]
        C[Message Queue<br/>Kafka]
        D[Log Processor<br/>Logstash]
    end
    subgraph Storage["Storage & Analysis"]
        E[(Elasticsearch<br/>Index)]
        F[Kibana<br/>Dashboard]
        G[Alerts<br/>PagerDuty]
    end
    A1 --> B
    A2 --> B
    A3 --> B
    B --> C
    C --> D
    D --> E
    E --> F
    E --> G
    style A1 fill:#93c5fd
    style A2 fill:#93c5fd
    style A3 fill:#93c5fd
    style E fill:#86efac
    style F fill:#fcd34d
Security Notes
CRITICAL: Logs are a common vector for sensitive-data exposure. Never log passwords, tokens, PII, or API keys.
Sensitive Data Redaction (see the sketch after this list):
- Never log credentials: Exclude passwords, API keys, tokens from logs
- Redact PII: Mask credit cards, SSNs, email addresses, phone numbers
- Redact headers: Exclude Authorization, Cookie, or Set-Cookie headers
- Redact request body: Exclude sensitive fields from request/response logs
- Redact URLs: Exclude sensitive query parameters from logged URLs
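Redaction is easiest to enforce centrally, before entries reach any transport. A minimal sketch using a custom winston format; the key list is an assumption you would extend for your own payloads:

import winston from 'winston';

// Keys to mask wherever they appear in log metadata (assumed list; extend as needed)
const SENSITIVE_KEYS = new Set([
  'password', 'token', 'apiKey', 'authorization', 'cookie', 'ssn', 'cardNumber'
]);

// Rebuild nested values with sensitive fields masked
function redactValue(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(redactValue);
  if (value !== null && typeof value === 'object') {
    return Object.fromEntries(
      Object.entries(value as Record<string, unknown>).map(([k, v]) =>
        SENSITIVE_KEYS.has(k) ? [k, '[REDACTED]'] : [k, redactValue(v)]
      )
    );
  }
  return value;
}

// winston.format() wraps a transform that runs on every entry; mutating the
// top-level info in place preserves winston's internal symbol properties
const redact = winston.format((info) => {
  for (const key of Object.keys(info)) {
    info[key] = SENSITIVE_KEYS.has(key) ? '[REDACTED]' : redactValue(info[key]);
  }
  return info;
});

const safeLogger = winston.createLogger({
  format: winston.format.combine(redact(), winston.format.timestamp(), winston.format.json()),
  transports: [new winston.transports.Console()]
});

// The password is masked before it reaches any transport
safeLogger.info('Login attempt', { email: 'user@example.com', password: 'hunter2' });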
Log Injection Prevention:
- Sanitize input: Strip or encode user-supplied values before logging (see the sketch after this list)
- Prevent CRLF injection: Don’t allow carriage-return or line-feed characters in log entries, or attackers can forge fake entries
- Escape special characters: Escape characters that downstream parsers or log viewers might interpret
- Structured logging: Use JSON or structured format to prevent injection
- Log parser security: Keep log collection and indexing tools patched; attacker-controlled input that gets logged can exploit a vulnerable processor (Log4Shell is the canonical example)
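For text-format sinks, a minimal sanitizer sketch (the regex and example input are illustrative); with structured JSON logging, the serializer's escaping already prevents most line-forging:

// Strip control characters (including \r and \n) from user-supplied values
// so malicious input can't forge extra entries in a line-oriented log
function sanitizeForLog(input: string): string {
  return input.replace(/[\x00-\x1f\x7f]/g, ' ');
}

// Without sanitization, this username would inject a convincing fake entry
const username = 'alice\nlevel=info message="admin login ok"';
console.log(`login_failed user=${sanitizeForLog(username)}`);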
Log Access Control:
- Restrict log access: Only authorized personnel can view logs
- Encryption at rest: Encrypt logs on disk
- Encryption in transit: Encrypt logs in transit to log storage
- Access logging: Log who accessed what logs and when
- Multi-factor auth: Require MFA for log access
Audit & Retention:
- Security events: Log authentication failures, authorization denials, anomalies
- Retention policy: Define how long logs are kept and delete them once that period ends
- Immutability: Prevent log tampering or deletion
- Backup: Maintain backups of important logs
- Monitoring: Alert on suspicious log activity
Performance & Scalability:
- Asynchronous logging: Log asynchronously to avoid blocking requests
- Log levels: Use appropriate log levels (INFO, WARN, ERROR)
- Sampling: Sample high-volume logs to avoid overwhelming storage (see the sketch after this list)
- Compression: Compress logs to reduce storage requirements
- Monitoring: Monitor log storage to prevent disk exhaustion
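A minimal sampling sketch implemented as a winston filter format (returning false from a custom format drops the entry); the 1% rate and the exempt levels are assumptions to tune:

import winston from 'winston';

const SAMPLE_RATE = 0.01; // keep ~1% of low-severity entries (tune per volume and budget)

// Always keep warn/error; pass only a random sample of info/debug through
const sample = winston.format((info) => {
  const alwaysKeep = info.level === 'warn' || info.level === 'error';
  return (alwaysKeep || Math.random() < SAMPLE_RATE) ? info : false;
});

const sampledLogger = winston.createLogger({
  level: 'debug',
  format: winston.format.combine(sample(), winston.format.timestamp(), winston.format.json()),
  transports: [new winston.transports.Console()]
});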
Compliance:
- Regulatory requirements: Meet GDPR, HIPAA, PCI-DSS logging requirements
- Log comprehensiveness: Log sufficient detail for forensics and audits
- Tamper evidence: Ensure logs cannot be tampered with
- Data minimization: Only log necessary information
Best Practices
- Use structured logging (JSON) - Makes logs searchable, parseable, and analyzable at scale
- Include correlation IDs - Every request should have an ID that follows it through all services (a propagation sketch follows this list)
- Use appropriate log levels - DEBUG for development, INFO for normal operations, WARN for problems, ERROR for failures
- Add context to every log - Include relevant data like user IDs, request IDs, timing information
- Centralize logs - Aggregate logs from all services into one searchable system (ELK, Splunk, Datadog)
- Redact sensitive data - Never log passwords, tokens, PII, or payment information
- Log at service boundaries - Entry and exit points of your service are critical logging locations
- Include timestamps with timezone - Use ISO 8601 format with UTC for consistency
- Don’t log too much - Excessive logging impacts performance and costs money; log what matters
- Make logs actionable - A log entry should help someone understand what happened and what to do
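Correlation IDs only pay off when they are propagated on outgoing calls. A minimal sketch using Node 18+'s global fetch, reusing the req.correlationId and req.logger attached by the middleware in the Code Example above (INVENTORY_URL and the endpoint are hypothetical):

import { Request } from 'express';

// Forward the incoming correlation ID to a downstream service so both
// services' log entries share one searchable ID
async function reserveInventory(req: Request, sku: string): Promise<void> {
  const response = await fetch(`${process.env.INVENTORY_URL}/reserve`, {
    method: 'POST',
    headers: {
      'content-type': 'application/json',
      'x-correlation-id': req.correlationId  // same ID the middleware attached
    },
    body: JSON.stringify({ sku })
  });
  if (!response.ok) {
    req.logger.warn('Inventory reservation failed', { sku, statusCode: response.status });
  }
}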
Common Mistakes
Unstructured text logs: console.log("error: " + error) becomes impossible to search at scale. Use structured JSON.
Missing correlation IDs: Without correlation IDs, you can’t trace a request across services. Every log entry needs one.
Logging sensitive data: Passwords, tokens, and PII in logs are security and compliance nightmares.
Wrong log levels: Using ERROR for everything (or DEBUG in production) makes important logs impossible to find.
No context: “Error occurred” tells you nothing. “Failed to process payment for order_id=123, user_id=456, error=card_declined” tells you everything.
Log volume extremes: Logging nothing leaves you blind; logging everything is expensive and noisy. Find the balance.
Not logging errors with stack traces: When you catch an exception, log the full stack trace - you’ll need it for debugging.
Inconsistent log formats: Different formats across services make centralized analysis difficult. Standardize on a format.