Kafka Simply Explained

Real-Life Use Case

```mermaid
flowchart LR
    Hitesh -->|Book Ride| UberTopic
    Rami -->|Send Location| UberTopic
    subgraph Kafka_Broker
        Partition0
        Partition1
        Partition2
    end
    UberTopic --> Partition0
    UberTopic --> Partition1
    UberTopic --> Partition2
    subgraph Uber_Backend_Services
        MatchingService
        PricingService
        NotificationService
    end
    Partition0 --> MatchingService
    Partition1 --> PricingService
    Partition2 --> NotificationService
```

1. What is Kafka?

Apache Kafka is a distributed system used to send, store, and process events in real time.

Core idea: Producer sends → Kafka stores → Consumer reads
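To make the core idea concrete, here is a toy, in-memory sketch. `ToyTopic`, `produce`, and `consume` are hypothetical names, not Kafka's client API. The only point is the shape of the idea: producing appends to a log, and consuming reads from an offset without removing anything.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTopic:
    """Hypothetical in-memory stand-in for one partition of a Kafka topic."""

    log: list[str] = field(default_factory=list)

    def produce(self, event: str) -> int:
        """Append an event to the end of the log and return its offset."""
        self.log.append(event)
        return len(self.log) - 1

    def consume(self, offset: int, max_events: int = 10) -> list[str]:
        """Return up to max_events starting at offset; nothing is removed."""
        return self.log[offset : offset + max_events]


topic = ToyTopic()
topic.produce("ride_requested:R1")
topic.produce("location_update:R1")

position = 0  # the consumer tracks its own read position (offset)
batch = topic.consume(position)
position += len(batch)
print(batch)     # ['ride_requested:R1', 'location_update:R1']
print(position)  # 2
```

Real Kafka splits each topic into partitions spread across brokers; this sketch models a single partition only.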

```mermaid
flowchart LR
    Producer --> Topic
    subgraph Broker
        Partition1
        Partition2
        Partition3
    end
    Topic --> Partition1
    Topic --> Partition2
    Topic --> Partition3
    Partition1 --> Consumer1
    Partition2 --> Consumer2
    Partition3 --> Consumer1
```

2. Core Roles

User (A) — uses the app, never touches Kafka directly
Producer (API) — sends events to Kafka
Kafka — stores and distributes events
Consumer (B) — processes events and responds to the user

3. End-to-End Flow

```mermaid
flowchart LR
    A[User A] --> API[Producer API]
    API --> Kafka
    Kafka --> B[Consumer B]
    B --> Response[Response to A]
```

Key points:

A does not talk to Kafka directly
B reads from Kafka and sends the result back to A

4. Event Structure

Kafka events can carry full data (preferred) or just a reference ID.

Full data approach:

```json
{
  "rideId": "R1",
  "userId": "A",
  "pickup": "LocationX"
}
```

ID-only approach:

```json
{
  "rideId": "R1"
}
```

Event-driven systems usually prefer full data: a consumer can act on the event directly instead of calling another service to look up the ride by its ID.
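Here is a sketch of how a producer might turn the full-data event into a Kafka record. Kafka record keys and values are byte arrays, so the event is serialized to JSON first. `RideRequested` and `to_kafka_record` are hypothetical names, and keying by `userId` is an assumption chosen to match the key used in section 6.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RideRequested:
    """Full-data event: carries everything a consumer needs to act on it."""

    rideId: str
    userId: str
    pickup: str


def to_kafka_record(event: RideRequested) -> tuple[bytes, bytes]:
    """Serialize to (key, value) bytes; the key decides the partition."""
    key = event.userId.encode("utf-8")
    value = json.dumps(asdict(event)).encode("utf-8")
    return key, value


key, value = to_kafka_record(RideRequested(rideId="R1", userId="A", pickup="LocationX"))
print(key)    # b'A'
print(value)  # b'{"rideId": "R1", "userId": "A", "pickup": "LocationX"}'

# Consumer side: the pickup location is right in the payload, no lookup by rideId needed.
decoded = json.loads(value)
print(decoded["pickup"])  # LocationX
```

With the ID-only approach, the consumer would first have to fetch the ride by `rideId` from some other service before it could do anything useful.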

5. Topics and Partitions

```mermaid
flowchart LR
    Topic[ride_requests] --> P0[Partition 0]
    Topic --> P1[Partition 1]
    Topic --> P2[Partition 2]
```

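Topic: a named category of events (e.g., ride_requests)
Partition: an ordered, append-only log within a topic, and the unit of parallelism

The partition count caps how far a topic can scale: within one consumer group, each partition is read by at most one consumer, so a topic with 3 partitions can keep at most 3 consumers in a group busy.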

6. Partition Control

You control the number of partitions and the key used for routing.

```mermaid
flowchart LR
    Producer --> Key["hash(userId)"]
    Key --> P0[Partition 0]
    Key --> P1[Partition 1]
```

key = "A" // userId

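The producer's partitioner computes hash(key) modulo the number of partitions to decide which partition receives the message. Every message with the same key therefore lands in the same partition, and Kafka guarantees ordering only within a partition. If no key is provided, the producer spreads messages across partitions (round-robin in older clients, "sticky" batching since Kafka 2.4), so there is no ordering guarantee between those messages.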
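A minimal sketch of key-based routing, assuming 3 partitions. `choose_partition` is a hypothetical helper. Kafka's Java producer hashes the key bytes with murmur2; `zlib.crc32` is swapped in here only because it is a stable hash in the Python standard library. What matters is the property: the same key always maps to the same partition.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition; the same key always yields the same partition.

    Kafka's Java producer uses murmur2 here; crc32 is a stand-in for illustration.
    """
    return zlib.crc32(key) % num_partitions


num_partitions = 3
events = [(b"A", "ride_requested"), (b"B", "ride_requested"), (b"A", "location_update")]

# Each partition is an ordered list; appends preserve send order within it.
partitions: dict[int, list[str]] = {p: [] for p in range(num_partitions)}
for key, event in events:
    partitions[choose_partition(key, num_partitions)].append(f"{key.decode()}:{event}")

# Both of user A's events sit in one partition, in the order they were sent.
p_a = choose_partition(b"A", num_partitions)
print(partitions[p_a])
```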

7. Consumer Groups

Within a consumer group, each partition is assigned to exactly one consumer, while one consumer may own several partitions.

```mermaid
flowchart LR
    P0[Partition 0] --> B1[Consumer B1]
    P1[Partition 1] --> B2[Consumer B2]
```

Rule: the number of consumers in a group should be less than or equal to the number of partitions. Extra consumers receive no partitions and sit idle.
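A simplified sketch of splitting a group's partitions across its consumers, loosely in the spirit of a round-robin strategy. `assign_round_robin` is a hypothetical helper, not Kafka's actual assignor code. The second call shows why the rule above matters: with more consumers than partitions, one consumer gets nothing.

```python
def assign_round_robin(partitions: list[int], consumers: list[str]) -> dict[str, list[int]]:
    """Give each partition to exactly one consumer, cycling through the consumers."""
    if not consumers:
        return {}
    assignment: dict[str, list[int]] = {c: [] for c in consumers}
    for i, partition in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment


print(assign_round_robin([0, 1, 2], ["B1", "B2"]))
# {'B1': [0, 2], 'B2': [1]}
print(assign_round_robin([0, 1], ["B1", "B2", "B3"]))
# {'B1': [0], 'B2': [1], 'B3': []}  -> B3 is idle: more consumers than partitions
```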

8. Partition Assignment

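Kafka handles this automatically. One broker acts as the group coordinator: it tracks which consumers belong to the group and triggers a rebalance when one joins or leaves. The partitions are then split among the members using a configurable assignment strategy (range, round-robin, or sticky, for example).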

9. Rebalancing

If a consumer goes down (it stops sending heartbeats to the group coordinator), Kafka automatically redistributes its partitions to the remaining consumers.

```mermaid
flowchart LR
    subgraph Before
        P0a[Partition 0] --> B1a[Consumer B1]
        P1a[Partition 1] --> B2a[Consumer B2]
    end
    subgraph After [After B2 dies]
        P0b[Partition 0] --> B1b[Consumer B1]
        P1b[Partition 1] --> B1b
    end
```
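The failure case in the diagram can be sketched like this. `rebalance_after_failure` is a hypothetical helper with a deliberately simple policy: survivors keep what they already own, and the failed consumer's partitions go to whichever survivor has the fewest. Kafka's real assignment strategies are configurable and more involved.

```python
def rebalance_after_failure(
    assignment: dict[str, list[int]], failed: str
) -> dict[str, list[int]]:
    """Move the failed consumer's partitions to the least-loaded survivors.

    Survivors keep the partitions they already own, so only orphaned
    partitions change hands (a simplified, sticky-style rebalance).
    """
    survivors = {c: list(ps) for c, ps in assignment.items() if c != failed}
    if not survivors:
        raise ValueError("no consumers left in the group")
    for partition in assignment.get(failed, []):
        # Pick the survivor with the fewest partitions (ties go to the first one).
        target = min(survivors, key=lambda c: len(survivors[c]))
        survivors[target].append(partition)
    return survivors


before = {"B1": [0], "B2": [1]}
after = rebalance_after_failure(before, failed="B2")
print(after)  # {'B1': [0, 1]}
```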

10. Message Retention

```mermaid
flowchart LR
    Producer --> Kafka[(Kafka Log)]
    Kafka --> Consumer
    Kafka --> Retained["Retained by time/size"]
```

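Kafka does not delete messages after they are consumed. Messages are retained based on time (e.g., 7 days, the broker default) or size limits, so several consumer groups can read the same data independently. Each group tracks its own read position using offsets, which it commits back to Kafka.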
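A toy model of retention and offsets, assuming a single partition and time-based retention only. `RetainedLog` and its methods are hypothetical. It shows three things from the paragraph above: reading does not delete, each consumer group keeps its own offset, and expiry depends only on age. Real Kafka deletes whole log segments rather than single records, so expiry there is coarser than shown.

```python
from dataclasses import dataclass, field


@dataclass
class RetainedLog:
    """Hypothetical single-partition log with time-based retention and per-group offsets."""

    retention_seconds: float
    start_offset: int = 0  # offset of the oldest record still retained
    records: list[tuple[float, str]] = field(default_factory=list)  # (timestamp, value)
    committed: dict[str, int] = field(default_factory=dict)  # group -> next offset to read

    def append(self, timestamp: float, value: str) -> int:
        """Add a record and return its offset."""
        self.records.append((timestamp, value))
        return self.start_offset + len(self.records) - 1

    def enforce_retention(self, now: float) -> None:
        """Drop records older than the retention window, whether consumed or not."""
        while self.records and now - self.records[0][0] > self.retention_seconds:
            self.records.pop(0)
            self.start_offset += 1

    def poll(self, group: str) -> list[str]:
        """Return everything after the group's committed offset, then commit."""
        # New groups start at the oldest retained record, similar to
        # auto.offset.reset=earliest (Kafka's default is latest).
        position = max(self.committed.get(group, 0), self.start_offset)
        batch = [value for _, value in self.records[position - self.start_offset :]]
        self.committed[group] = self.start_offset + len(self.records)
        return batch


DAY = 86_400
log = RetainedLog(retention_seconds=7 * DAY)
log.append(0, "R1")
log.append(1 * DAY, "R2")
print(log.poll("matching"))  # ['R1', 'R2']
print(log.poll("pricing"))   # ['R1', 'R2']  (another group reads the same data)

log.append(8 * DAY, "R3")
log.enforce_retention(now=8 * DAY)  # R1 is 8 days old and expires; R2 (7 days) stays
print(log.poll("matching"))  # ['R3']
print(log.poll("billing"))   # ['R2', 'R3']
```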

11. What Kafka Handles

Message storage (disk-based)
Partitioning
Replication
Load balancing
Consumer assignment

12. What You Handle

Defining topics and partition count
Writing the producer (what to send)
Writing the consumer (what to do with the data)
All business logic

13. What Kafka Does Not Do

Know about the end user
Send data to the user directly
Apply business logic
Filter messages per user

14. How the User Gets a Response

The consumer reads from Kafka, processes the event, and delivers the response to the user through WebSocket, push notification, or API callback.

```mermaid
flowchart LR
    Kafka --> B[Consumer B]
    B --> WS[WebSocket]
    B --> Push[Push Notification]
    B --> API[API Callback]
    WS --> A[User A]
    Push --> A
    API --> A
```
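A sketch of consumer B's side, assuming JSON-encoded events like the one in section 4. `Notifier`, `RecordingNotifier`, and `handle_ride_requested` are hypothetical names; a real service would put a WebSocket, push, or callback client behind the same `send` method, and the driver ID is a hard-coded placeholder for real matching logic. Kafka never appears in the reply path: the consumer talks to the user directly.

```python
import json
from typing import Protocol


class Notifier(Protocol):
    """Hypothetical delivery channel: WebSocket, push notification, or API callback."""

    def send(self, user_id: str, message: str) -> None: ...


class RecordingNotifier:
    """Stand-in channel that records messages instead of delivering them."""

    def __init__(self) -> None:
        self.sent: list[tuple[str, str]] = []

    def send(self, user_id: str, message: str) -> None:
        self.sent.append((user_id, message))


def handle_ride_requested(raw_value: bytes, notifier: Notifier) -> None:
    """Consumer B: decode the event, apply business logic, reply to the user."""
    event = json.loads(raw_value)
    driver_id = "D42"  # hypothetical placeholder for the real matching logic
    notifier.send(event["userId"], f"Driver {driver_id} is on the way to {event['pickup']}")


channel = RecordingNotifier()
handle_ride_requested(b'{"rideId": "R1", "userId": "A", "pickup": "LocationX"}', channel)
print(channel.sent)  # [('A', 'Driver D42 is on the way to LocationX')]
```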

15. Mental Model

| Component | Role |
|-----------|------|
| Kafka | Data pipeline |
| Consumer (B) | Brain (applies logic) |
| User (A) | Client |

16. Scale Reference (Uber-like System)

Around 100K events per second
300 to 1000 partitions
50 to 500 consumers per service
2,000 to 10,000 consumers total
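A quick back-of-envelope check on these numbers. For the arithmetic only, it assumes all ~100K events/s go to one topic read by one consumer group (one service) and that load spreads evenly across partitions; `per_group_sizing` is a hypothetical helper.

```python
def per_group_sizing(events_per_sec: int, partitions: int, consumers: int) -> dict[str, float]:
    """Back-of-envelope load for one consumer group reading one topic (even spread assumed)."""
    active = min(consumers, partitions)  # each partition has at most one consumer per group
    return {
        "events_per_partition_per_sec": round(events_per_sec / partitions, 1),
        "active_consumers": active,
        "idle_consumers": consumers - active,
        "events_per_active_consumer_per_sec": round(events_per_sec / active, 1),
    }


small = per_group_sizing(100_000, partitions=300, consumers=50)
large = per_group_sizing(100_000, partitions=300, consumers=500)
print(small)
# {'events_per_partition_per_sec': 333.3, 'active_consumers': 50, 'idle_consumers': 0, 'events_per_active_consumer_per_sec': 2000.0}
print(large)
# {'events_per_partition_per_sec': 333.3, 'active_consumers': 300, 'idle_consumers': 200, 'events_per_active_consumer_per_sec': 333.3}
```

The second result ties back to the rule in section 7: 500 consumers on 300 partitions leaves 200 of them idle, so the upper end of the consumer range only pays off with a matching partition count.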

Summary

Kafka is a distributed log system that stores and streams events. Consumers process events and decide how to act. Users never interact with Kafka directly.

Building Tech Startups. Experience in Full Stack Web Development & Data Engineering.
