Kafka

Protocols & Transport Security Notes Jan 6, 2025 JAVASCRIPT

Definition

Imagine a massive newspaper printing press that never throws away old editions. Every article (event) ever published is stored on the press’s master roll, organized by section (topic). Readers (consumers) can subscribe to specific sections and read at their own pace - whether they want today’s news, last week’s, or articles from six months ago. That’s essentially what Apache Kafka does for data in modern applications.

Kafka is a distributed event streaming platform - a fancy way of saying it’s a super-powered message queue designed to handle enormous amounts of data flowing through a system in real-time. Unlike traditional message queues that delete messages once they’re read, Kafka keeps messages around for a configurable period (days, weeks, or even forever). This means multiple systems can read the same data, new systems can “replay” historical events, and you never lose data even if a consumer goes down temporarily.
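
Because messages aren't deleted on read, a brand-new service can point at a topic and replay everything still retained. A minimal sketch with the kafkajs client (broker address, topic name, and group id are illustrative):

// New consumer group replaying a topic from the earliest retained offset
const { Kafka } = require('kafkajs');

const kafka = new Kafka({
  clientId: 'replay-example',
  brokers: ['kafka1:9092']
});

const consumer = kafka.consumer({ groupId: 'new-analytics-service' });

async function replay() {
  await consumer.connect();
  // fromBeginning: a group with no committed offsets starts at the
  // oldest retained message, so historical events are replayed
  await consumer.subscribe({ topic: 'orders', fromBeginning: true });
  await consumer.run({
    eachMessage: async ({ topic, partition, message }) => {
      console.log(`${topic}[${partition}] @ ${message.offset}:`, message.value.toString());
    }
  });
}

replay().catch(console.error);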

What makes Kafka special is its ability to handle millions of messages per second while keeping them organized and durable. It’s used by companies like Netflix, Uber, and LinkedIn to process everything from user clicks to financial transactions. Think of it as the central nervous system of data-intensive applications - every significant event flows through Kafka, and any system that cares about that event can subscribe to it and react accordingly.

Example

Real-World Scenario 1: Uber’s Ride Tracking When you request an Uber, dozens of events are published to Kafka: ride requested, driver assigned, driver location updates (every few seconds), pickup confirmed, trip started, route changes, trip completed, payment processed. Different services consume these events: the map display reads location updates, the ETA calculator reads route data, fraud detection analyzes payment events, and analytics processes everything for business insights. Kafka handles millions of these events per minute across all rides happening globally.

Real-World Scenario 2: Netflix Viewing Analytics Every time you press play, pause, rewind, or stop watching something on Netflix, an event goes to Kafka. The recommendation engine consumes these events to understand your preferences. The “Continue Watching” feature reads events to know where you left off. Content analytics uses them to understand what shows are popular. The encoding team uses viewing patterns to decide which quality levels to pre-encode. One stream of events, many consumers, each extracting different value.

Real-World Scenario 3: Bank Transaction Processing When you swipe your credit card, the transaction event goes to Kafka. The authorization service reads it immediately to approve/decline (milliseconds). The fraud detection service analyzes it against patterns (seconds). The notification service sends you an alert (seconds). The accounting system records it for your statement (minutes). The regulatory compliance system archives it (batch, overnight). Each system processes the same event at its own pace for its own purpose.

Real-World Scenario 4: E-commerce Order Flow When you click “Buy Now” on Amazon, an OrderPlaced event goes to Kafka. The inventory service reserves the items. The payment service charges your card. The warehouse service queues picking. The shipping service calculates delivery dates. The email service sends confirmation. The recommendation service notes your purchase for future suggestions. All these services are decoupled - they just listen to Kafka and do their job.
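
These flows all share one pattern: each downstream service subscribes under its own consumer group, and Kafka delivers every event to every group independently. A hedged sketch with kafkajs (topic, group ids, and the orderId field are illustrative):

// Two decoupled services consuming the same order events
const { Kafka } = require('kafkajs');

const kafka = new Kafka({ clientId: 'order-flow', brokers: ['kafka1:9092'] });

// Each groupId receives its own copy of every message and tracks
// its own offset, so services progress at independent speeds
async function startService(groupId, handle) {
  const consumer = kafka.consumer({ groupId });
  await consumer.connect();
  await consumer.subscribe({ topic: 'orders' });
  await consumer.run({
    eachMessage: async ({ message }) => handle(JSON.parse(message.value.toString()))
  });
}

startService('inventory-service', (order) => console.log('reserve items:', order.orderId)).catch(console.error);
startService('email-service', (order) => console.log('send confirmation:', order.orderId)).catch(console.error);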

Analogy

The Central Post Office for Events: Imagine a post office where letters (events) are sorted into mailboxes (topics) but never destroyed. Subscribers can pick up mail from any mailbox, read as fast or slow as they want, and even request to read letters from weeks ago. New subscribers can start reading from any point in history. The post office handles millions of letters per day and never loses one.

The Sports News Wire Service: Think of how sports news works - every score, injury, trade, and quote is published to a central wire service (like AP or Reuters). Every sports website, app, TV channel, and newspaper subscribes to this wire and processes the news according to their needs. Some react instantly (live scores), some curate (highlight shows), some archive (statistical databases). Kafka is this wire service for application events.

The City’s Water System: Water flows from a reservoir (producers) through main pipes (Kafka topics) to many buildings (consumers). Each building taps the same water supply but uses it differently - restaurants cook with it, offices cool servers, homes drink it. The water system handles massive volume, and if one building has a plumbing issue, others aren’t affected. Kafka is the data plumbing for modern applications.

The Airport Departure Board: In an airport, flight information (events) is published to a central system. The departure boards consume and display it. Airlines use it for crew scheduling. Ground services use it for gate assignments. Passengers check it on the app. One source of truth, many consumers with different needs. If a consumer system goes down, the data isn’t lost - it can catch up when it recovers.

Diagram

flowchart TB
    subgraph Producers
        P1[Order Service]
        P2[User Service]
        P3[Payment Service]
    end

    subgraph KafkaCluster["Kafka Cluster"]
        subgraph TopicOrders["Topic: orders"]
            Part0["Partition 0<br/>offset: 0,1,2..."]
            Part1["Partition 1<br/>offset: 0,1,2..."]
            Part2["Partition 2<br/>offset: 0,1,2..."]
        end
    end

    subgraph ConsumerGroupA["Consumer Group A"]
        CG1A[Consumer 1]
        CG2A[Consumer 2]
    end

    subgraph ConsumerGroupB["Consumer Group B"]
        CG1B[Consumer 1]
    end

    P1 -->|"key: user-123"| Part0
    P2 -->|"key: user-456"| Part1
    P3 -->|"key: user-789"| Part2

    Part0 --> CG1A
    Part1 --> CG1A
    Part2 --> CG2A

    Part0 --> CG1B
    Part1 --> CG1B
    Part2 --> CG1B

    subgraph Features
        F1["Messages persisted<br/>for configurable time"]
        F2["Each consumer group<br/>tracks its own offset"]
        F3["Partitions enable<br/>parallel processing"]
    end

Code Example


// Node.js Kafka producer (kafkajs)
const { Kafka } = require('kafkajs');

const kafka = new Kafka({
  clientId: 'api-service',
  brokers: ['kafka1:9092', 'kafka2:9092']
});

const producer = kafka.producer();

async function publishApiEvent() {
  await producer.connect();

  // Messages with the same key always land in the same partition,
  // which preserves per-key ordering
  await producer.send({
    topic: 'api-events',
    messages: [
      {
        key: 'user-123',
        value: JSON.stringify({
          event: 'api.request',
          endpoint: '/users/123',
          timestamp: Date.now()
        })
      }
    ]
  });

  await producer.disconnect();
}

publishApiEvent().catch(console.error);
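
The producer above assumes the api-events topic already exists. Topics can also be created programmatically with the kafkajs admin client; a sketch reusing the kafka instance from above, with partition and replication counts chosen purely for illustration:

// Creating the topic with the kafkajs admin client
const admin = kafka.admin();

async function createTopic() {
  await admin.connect();
  await admin.createTopics({
    topics: [{
      topic: 'api-events',
      numPartitions: 3,      // allows up to 3 consumers in a group to work in parallel
      replicationFactor: 2   // each partition is copied to 2 brokers
    }]
  });
  await admin.disconnect();
}

createTopic().catch(console.error);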

Security Notes

CRITICAL: Apache Kafka is a distributed message broker. Its default PLAINTEXT listener has no authentication or encryption, so securing it requires SASL authentication, TLS encryption, and ACL-based authorization.

Kafka Architecture:

  • Brokers: Cluster nodes storing messages
  • Topics: Named message streams
  • Producers: Clients sending messages
  • Consumers: Clients reading messages
  • Partitions: Ordered shards of a topic, distributed across brokers

Security Mechanisms:

  • SASL: Authentication framework (PLAIN, SCRAM, Kerberos/GSSAPI)
  • TLS: Encryption for client-broker and inter-broker traffic
  • ACLs: Access control lists per principal (user)
  • Authorization: Fine-grained permissions per topic, group, and operation
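
In kafkajs, TLS and SASL are configured on the client constructor. A minimal sketch, assuming SCRAM-SHA-512 users already exist on the brokers and credentials are injected via environment variables:

// TLS + SASL/SCRAM client configuration
const { Kafka } = require('kafkajs');

const kafka = new Kafka({
  clientId: 'secure-service',
  brokers: ['kafka1:9093'],  // port of the TLS listener (illustrative)
  ssl: true,                 // encrypt all client-broker traffic
  sasl: {
    mechanism: 'scram-sha-512',
    username: process.env.KAFKA_USERNAME,  // never hard-code credentials
    password: process.env.KAFKA_PASSWORD
  }
});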

Best Practices:

  • Enable authentication: Require SASL for all connections
  • Enable encryption: Use TLS for all communication
  • ACL enforcement: Implement strict ACL policies
  • Monitoring: Monitor broker access and operations
  • Secrets management: Secure credential storage

Standards & RFCs