Kafka Simply Explained




Real Life Use Case

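```mermaid
flowchart LR
    Hitesh -->|Book Ride| UberTopic
    Rami -->|Send Location| UberTopic
    subgraph Kafka_Broker
        Partition0
        Partition1
        Partition2
    end
    UberTopic --> Partition0
    UberTopic --> Partition1
    UberTopic --> Partition2
    subgraph Uber_Backend_Services
        MatchingService
        PricingService
        NotificationService
    end
    Kafka_Broker --> MatchingService
    Kafka_Broker --> PricingService
    Kafka_Broker --> NotificationService
```

Each backend service (matching, pricing, notification) subscribes as its own consumer group, so every service reads all partitions of the topic. Within one service, the partitions are split among that service's consumers.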

1. What is Kafka?

Apache Kafka is a distributed event-streaming system used to send, store, and process events in real time.

Core idea: Producer sends → Kafka stores → Consumer reads

```mermaid
flowchart LR
    Producer --> Topic
    subgraph Broker
        Partition1
        Partition2
        Partition3
    end
    Topic --> Partition1
    Topic --> Partition2
    Topic --> Partition3
    Partition1 --> Consumer1
    Partition2 --> Consumer2
    Partition3 --> Consumer1
```

2. Core Roles

  • User (A) — uses the app, never touches Kafka directly
  • Producer (API) — sends events to Kafka
  • Kafka — stores and distributes events
  • Consumer (B) — processes events and responds to the user

3. End-to-End Flow

```mermaid
flowchart LR
    A[User A] --> API[Producer API]
    API --> Kafka
    Kafka --> B[Consumer B]
    B --> Response[Response to A]
```

Key points:

  • A does not talk to Kafka directly
  • B reads from Kafka and sends the result back to A
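To make the flow concrete, here is a minimal in-memory sketch in Python. `ToyKafka`, `producer_api`, and `consumer_b` are hypothetical names used only for illustration; a real system would use a Kafka client library talking to a running broker. What it shows: user A only ever calls the API, Kafka just appends the event, and the response reaches A through consumer B.

```python
from typing import Callable, Dict, List


class ToyKafka:
    """In-memory stand-in for one Kafka topic (hypothetical, no real broker)."""

    def __init__(self) -> None:
        self.log: List[dict] = []

    def send(self, event: dict) -> None:
        # Kafka only appends the event; it never interprets it.
        self.log.append(event)


def producer_api(kafka: ToyKafka, ride_id: str, user_id: str, pickup: str) -> str:
    """Producer (API): the only component user A talks to."""
    kafka.send({"rideId": ride_id, "userId": user_id, "pickup": pickup})
    return "request accepted"


def consumer_b(kafka: ToyKafka, offset: int, notify: Callable[[str, str], None]) -> int:
    """Consumer (B): read new events, apply business logic, respond to the user.

    Returns the next offset to read from. Events stay in the log after reading.
    """
    for event in kafka.log[offset:]:
        notify(event["userId"], f"Driver assigned for ride {event['rideId']}")
    return len(kafka.log)


# User A -> Producer API -> Kafka -> Consumer B -> response to A
inbox: Dict[str, List[str]] = {}
kafka = ToyKafka()
print(producer_api(kafka, "R1", "A", "LocationX"))  # request accepted
offset = consumer_b(kafka, 0, lambda user, msg: inbox.setdefault(user, []).append(msg))
print(inbox)  # {'A': ['Driver assigned for ride R1']}
```

The `notify` callback stands in for whatever channel B uses to reach the user; Kafka itself never sees it.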

4. Event Structure

Kafka events can carry full data (preferred) or just a reference ID.

Full data approach:

```json
{
  "rideId": "R1",
  "userId": "A",
  "pickup": "LocationX"
}
```

ID-only approach:

```json
{
  "rideId": "R1"
}
```

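Event-driven systems usually prefer full data: each consumer can act on the event immediately without calling back to the producer's database, which keeps services decoupled (a pattern often called event-carried state transfer). ID-only events are smaller, but every consumer then has to look up the rest of the data itself.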
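A small Python sketch of both approaches. Kafka stores message values as raw bytes, and JSON is one common encoding; the `RideRequested` class and the `RIDES_DB` lookup table are hypothetical, chosen only to mirror the example events above (field names keep the camelCase of the JSON).

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RideRequested:
    """Full-data event: everything a consumer needs travels with the event."""
    rideId: str
    userId: str
    pickup: str

    def to_bytes(self) -> bytes:
        # Serialize to JSON bytes, the form a producer would hand to Kafka.
        return json.dumps(asdict(self)).encode("utf-8")

    @staticmethod
    def from_bytes(raw: bytes) -> "RideRequested":
        return RideRequested(**json.loads(raw))


full = RideRequested(rideId="R1", userId="A", pickup="LocationX")
payload = full.to_bytes()
decoded = RideRequested.from_bytes(payload)
print(decoded.pickup)  # LocationX  (no extra lookup needed)

# ID-only event: the consumer must fetch the details from somewhere else.
# RIDES_DB is a hypothetical stand-in for the producer's database.
RIDES_DB = {"R1": {"userId": "A", "pickup": "LocationX"}}
id_only = json.loads(b'{"rideId": "R1"}')
print(RIDES_DB[id_only["rideId"]]["pickup"])  # LocationX  (extra lookup)
```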

5. Topics and Partitions

```mermaid
flowchart LR
    Topic[ride_requests] --> P0[Partition 0]
    Topic --> P1[Partition 1]
    Topic --> P2[Partition 2]
```
  • Topic — a named category (e.g., ride_requests)
  • Partition — a unit of parallelism within a topic

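The partition count caps how far a topic can scale: within one consumer group, each partition is read by at most one consumer, so a topic with N partitions supports at most N consumers working in parallel per group.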
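One way to picture a topic is as several independent append-only logs. The sketch below models that in memory; `ToyTopic` is a hypothetical class, not part of any Kafka client. Note that offsets restart at 0 in every partition, which is why ordering is only defined within a single partition.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ToyTopic:
    """A topic modeled as N independent append-only logs (hypothetical, in-memory)."""
    name: str
    num_partitions: int
    partitions: List[List[str]] = field(init=False)

    def __post_init__(self) -> None:
        self.partitions = [[] for _ in range(self.num_partitions)]

    def append(self, partition: int, value: str) -> Tuple[int, int]:
        """Append to one partition and return (partition, offset).

        Offsets are counted per partition, not per topic.
        """
        log = self.partitions[partition]
        log.append(value)
        return partition, len(log) - 1


topic = ToyTopic("ride_requests", num_partitions=3)
print(topic.append(0, "ride R1"))  # (0, 0)
print(topic.append(1, "ride R2"))  # (1, 0)
print(topic.append(0, "ride R3"))  # (0, 1)
```

With three partitions here, at most three consumers in one group could read this topic in parallel.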

6. Partition Control

You control the number of partitions and the key used for routing.

```mermaid
flowchart LR
    Producer --> Key["hash(userId)"]
    Key --> P0[Partition 0]
    Key --> P1[Partition 1]
```

key = "A" // userId

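The producer's partitioner computes hash(key) modulo the number of partitions to pick the target partition (the Java client's default partitioner uses the murmur2 hash), so every message with the same key lands on the same partition and stays in order. If no key is provided, the producer spreads messages across partitions: older clients use round-robin, while newer clients fill a batch for one partition before moving to the next (the "sticky" partitioner). Either way, there is no ordering guarantee across those keyless messages.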
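The routing rule above can be sketched in a few lines of Python. This is a simplified model, not Kafka's code: `zlib.crc32` stands in for the murmur2 hash the Java client uses (so actual partition numbers will differ from a real cluster), and keyless messages use plain round-robin rather than the sticky strategy of newer clients. `NUM_PARTITIONS` and `choose_partition` are hypothetical names.

```python
import itertools
import zlib
from typing import Iterator, Optional

NUM_PARTITIONS = 3
_round_robin: Iterator[int] = itertools.cycle(range(NUM_PARTITIONS))


def choose_partition(key: Optional[str]) -> int:
    """Pick a partition for one message.

    Keyed: deterministic hash of the key modulo the partition count, so the
    same key always maps to the same partition (crc32 is a stand-in hash).
    Keyless: simple round-robin across all partitions.
    """
    if key is None:
        return next(_round_robin)
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


# Same key -> same partition every time (a set with exactly one element).
print({choose_partition("A") for _ in range(5)})
# No key -> cycles through the partitions.
print([choose_partition(None) for _ in range(4)])  # [0, 1, 2, 0]
```

Changing `NUM_PARTITIONS` changes where existing keys map, which is one reason partition counts are usually planned up front.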

7. Consumer Groups

Each partition is assigned to exactly one consumer within a group.

```mermaid
flowchart LR
    P0[Partition 0] --> B1[Consumer B1]
    P1[Partition 1] --> B2[Consumer B2]
```

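Rule: keep the number of consumers in a group less than or equal to the number of partitions. Any extra consumers receive no partitions and sit idle, although they can take over if another consumer in the group fails.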
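A minimal Python sketch of the "one consumer per partition" rule. `assign_round_robin` is a hypothetical, simplified strategy; Kafka ships its own assignors (range, round-robin, sticky), and this is not their exact code. It does show why a third consumer on a two-partition topic ends up idle.

```python
from typing import Dict, List


def assign_round_robin(partitions: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    """Give each partition to exactly one consumer in the group.

    Consumers beyond the partition count receive an empty list (idle).
    """
    if not consumers:
        return {}
    assignment: Dict[str, List[int]] = {c: [] for c in consumers}
    for i, partition in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment


print(assign_round_robin([0, 1], ["B1", "B2"]))
# {'B1': [0], 'B2': [1]}
print(assign_round_robin([0, 1], ["B1", "B2", "B3"]))
# {'B1': [0], 'B2': [1], 'B3': []}   B3 is idle
```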

8. Partition Assignment

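Kafka handles this automatically. A broker acting as the group coordinator tracks which consumers belong to the group. In the classic consumer protocol, one member (the group leader) computes the assignment using the configured partition.assignment.strategy (for example range, round-robin, or sticky), and the coordinator distributes it to the group. Whenever a consumer joins or leaves, the group rebalances.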

9. Rebalancing

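If a consumer goes down (it stops sending heartbeats within the session timeout) or leaves the group, Kafka triggers a rebalance and redistributes its partitions to the remaining consumers automatically.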

```mermaid
flowchart LR
    subgraph Before
        P0a[Partition 0] --> B1a[Consumer B1]
        P1a[Partition 1] --> B2a[Consumer B2]
    end
    subgraph After["After B2 dies"]
        P0b[Partition 0] --> B1b[Consumer B1]
        P1b[Partition 1] --> B1b
    end
```
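The before/after picture can be reproduced with a small Python simulation. It assumes a simplified failure detector: a consumer counts as alive if its last heartbeat is within `SESSION_TIMEOUT_S`, a hypothetical constant that only mirrors the idea of Kafka's `session.timeout.ms`. When membership changes, the assignment is simply recomputed over the live members.

```python
from typing import Dict, List

SESSION_TIMEOUT_S = 45.0  # illustrative value, in seconds


def live_members(last_heartbeat: Dict[str, float], now: float) -> List[str]:
    """Consumers whose last heartbeat falls within the session timeout."""
    return sorted(c for c, t in last_heartbeat.items() if now - t <= SESSION_TIMEOUT_S)


def assign(partitions: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    """Round-robin each partition to exactly one live consumer (simplified)."""
    result: Dict[str, List[int]] = {c: [] for c in consumers}
    for i, p in enumerate(sorted(partitions)):
        if consumers:
            result[consumers[i % len(consumers)]].append(p)
    return result


partitions = [0, 1]
heartbeats = {"B1": 100.0, "B2": 100.0}
before = assign(partitions, live_members(heartbeats, now=110.0))

# B1 keeps heartbeating; B2 crashes and goes silent.
heartbeats["B1"] = 150.0
after = assign(partitions, live_members(heartbeats, now=160.0))

print(before)  # {'B1': [0], 'B2': [1]}
print(after)   # {'B1': [0, 1]}
```

Real rebalances also involve committing offsets so the new owner of partition 1 resumes where B2 left off; this sketch leaves that out.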

10. Message Retention

```mermaid
flowchart LR
    Producer --> Kafka[(Kafka Log)]
    Kafka --> Consumer
    Kafka --> Retained["Retained by time/size"]
```

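Kafka does not delete messages after they are consumed. Messages are retained based on time (retention.ms, 7 days by default) or size (retention.bytes), and old data is removed one log segment at a time. Each consumer group tracks its own read position using offsets, which it commits back to Kafka (in the internal __consumer_offsets topic), so several groups can read the same data independently.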
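A toy Python model of one partition's log shows the two ideas side by side: reading never removes data, and deletion depends only on age. `ToyLog` and the `committed` dictionary are hypothetical. Real Kafka deletes whole log segments rather than individual records, so its cleanup is coarser than this.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

RETENTION_S = 7 * 24 * 3600  # 7 days, matching the example retention period


@dataclass
class ToyLog:
    """One partition's log as (offset, timestamp, value) records. Hypothetical."""
    records: List[Tuple[int, float, str]] = field(default_factory=list)
    next_offset: int = 0

    def append(self, ts: float, value: str) -> int:
        offset = self.next_offset
        self.records.append((offset, ts, value))
        self.next_offset += 1
        return offset

    def read_from(self, offset: int) -> List[str]:
        # Reading only returns records at or after `offset`; nothing is removed.
        return [v for o, _, v in self.records if o >= offset]

    def enforce_retention(self, now: float) -> None:
        # Records expire by age, whether or not anyone has consumed them.
        self.records = [r for r in self.records if now - r[1] <= RETENTION_S]


log = ToyLog()
for day, value in enumerate(["e0", "e1", "e2"]):
    log.append(ts=day * 24 * 3600, value=value)

# Each consumer group keeps its own position in the same log.
committed: Dict[str, int] = {"matching": 3, "analytics": 0}
print(log.read_from(committed["analytics"]))  # ['e0', 'e1', 'e2']

log.enforce_retention(now=8 * 24 * 3600)  # day 8: e0 (day 0) is now too old
print(log.read_from(0))  # ['e1', 'e2']
```

Offsets are not renumbered after cleanup: the surviving records keep offsets 1 and 2.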

11. What Kafka Handles

  • Message storage (disk-based)
  • Partitioning
  • Replication
  • Load balancing
  • Consumer assignment

12. What You Handle

  • Defining topics and partition count
  • Writing the producer (what to send)
  • Writing the consumer (what to do with the data)
  • All business logic

13. What Kafka Does Not Do

  • Know about the end user
  • Send data to the user directly
  • Apply business logic
  • Filter messages per user

14. How the User Gets a Response

The consumer reads from Kafka, processes the event, and delivers the response to the user through WebSocket, push notification, or API callback.

```mermaid
flowchart LR
    Kafka --> B[Consumer B]
    B --> WS[WebSocket]
    B --> Push[Push Notification]
    B --> API[API Callback]
    WS --> A[User A]
    Push --> A
    API --> A
```
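The last hop happens entirely outside Kafka. Below is a minimal Python sketch of consumer B choosing a delivery channel per user. The three channel functions are hypothetical placeholders that just record what would be sent; in a real system they would wrap a WebSocket server, a push-notification provider, or an HTTP client. Falling back to push when a user has no preference is an assumption made for the example.

```python
from typing import Callable, Dict, List

sent: List[str] = []  # records deliveries instead of really sending them


def via_websocket(user: str, msg: str) -> None:
    sent.append(f"ws:{user}:{msg}")


def via_push(user: str, msg: str) -> None:
    sent.append(f"push:{user}:{msg}")


def via_callback(user: str, msg: str) -> None:
    sent.append(f"callback:{user}:{msg}")


CHANNELS: Dict[str, Callable[[str, str], None]] = {
    "websocket": via_websocket,
    "push": via_push,
    "callback": via_callback,
}


def handle_event(event: dict, user_channel: Dict[str, str]) -> None:
    """Consumer B: apply business logic, then deliver on the user's channel."""
    msg = f"ride {event['rideId']} confirmed"
    channel = user_channel.get(event["userId"], "push")  # assumed default
    CHANNELS[channel](event["userId"], msg)


handle_event({"rideId": "R1", "userId": "A"}, {"A": "websocket"})
print(sent)  # ['ws:A:ride R1 confirmed']
```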

15. Mental Model

| Component | Role |
|-----------|------|
| Kafka | Data pipeline |
| Consumer (B) | Brain: applies business logic |
| User (A) | Client |

16. Scale Reference (Uber-like System, rough illustrative estimates)

  • Around 100K events per second
  • 300 to 1000 partitions
  • 50 to 500 consumers per service
  • 2,000 to 10,000 consumers total
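A quick back-of-the-envelope check on these figures, written as a small Python sketch. It assumes events spread evenly across partitions (real key distributions are often skewed) and treats each service as one consumer group, which means every service sees the full event rate.

```python
EVENTS_PER_SEC = 100_000


def per_partition_rate(events_per_sec: float, partitions: int) -> float:
    """Average events per second landing on each partition (even spread assumed)."""
    return events_per_sec / partitions


def partitions_per_consumer(partitions: int, consumers: int) -> float:
    """Average partitions each consumer in one group owns.

    Consumers beyond the partition count would sit idle, so they are not counted.
    """
    return partitions / min(consumers, partitions)


for partitions in (300, 1000):
    rate = per_partition_rate(EVENTS_PER_SEC, partitions)
    print(f"{partitions} partitions -> ~{rate:.0f} events/s per partition")
# 300 partitions -> ~333 events/s per partition
# 1000 partitions -> ~100 events/s per partition

for partitions, consumers in ((300, 50), (1000, 500)):
    share = partitions_per_consumer(partitions, consumers)
    print(f"{partitions} partitions / {consumers} consumers -> {share:g} partitions each")
# 300 partitions / 50 consumers -> 6 partitions each
# 1000 partitions / 500 consumers -> 2 partitions each
```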

Summary

Kafka is a distributed log system that stores and streams events. Consumers process events and decide how to act. Users never interact with Kafka directly.

Building Tech Startups. Experience in Full Stack Web Development & Data Engineering.