Kafka Simply Explained
Real Life Use Case
1. What is Kafka?
Apache Kafka is a distributed system used to send, store, and process events in real time.
Core idea: Producer sends → Kafka stores → Consumer reads
2. Core Roles
- User (A) — uses the app, never touches Kafka directly
- Producer (API) — sends events to Kafka
- Kafka — stores and distributes events
- Consumer (B) — processes events and responds to the user
3. End-to-End Flow
Key points:
- A does not talk to Kafka directly
- B reads from Kafka and sends the result back to A
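The flow above can be sketched as a toy in-memory log (no real Kafka involved, names are illustrative): the producer appends the user's request to a topic, and the consumer reads it and builds the response that goes back to the user.

```python
topic = []  # stand-in for a Kafka topic: an append-only log

def produce(event):
    """Producer (API): append the user's request to the topic."""
    topic.append(event)

def consume(offset):
    """Consumer (B): read every event from `offset` onward."""
    return topic[offset:]

# User A requests a ride; the API produces an event on A's behalf.
produce({"rideId": "R1", "userId": "A", "pickup": "LocationX"})

# Consumer B reads the event and builds the response sent back to A.
responses = [f"Driver assigned for {e['rideId']}" for e in consume(0)]
print(responses[0])  # Driver assigned for R1
```

Note that A never calls `produce` or `consume` itself: the API and consumer sit on either side of the log.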
4. Event Structure
Kafka events can carry full data (preferred) or just a reference ID.
Full data approach:

```json
{
  "rideId": "R1",
  "userId": "A",
  "pickup": "LocationX"
}
```

ID-only approach:

```json
{
  "rideId": "R1"
}
```
Real event-driven systems generally prefer the full-data approach, because consumers can act on an event without calling back to another service for the missing details.
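A small sketch of the trade-off (the `ride_db` dictionary is a hypothetical stand-in for a database or service): the full-data event is self-sufficient, while the ID-only event forces the consumer into an extra lookup.

```python
import json

# Full-data event: the consumer has everything it needs.
full_event = json.dumps({"rideId": "R1", "userId": "A", "pickup": "LocationX"})

# ID-only event: the consumer must fetch the rest by ID.
id_event = json.dumps({"rideId": "R1"})

def handle(raw, lookup=None):
    event = json.loads(raw)
    if "pickup" not in event:
        # Extra round trip: resolve the reference ID to full data.
        event.update(lookup(event["rideId"]))
    return f"Pickup at {event['pickup']}"

ride_db = {"R1": {"userId": "A", "pickup": "LocationX"}}  # stand-in store

print(handle(full_event))                    # no lookup needed
print(handle(id_event, lookup=ride_db.get))  # one extra fetch
```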
5. Topics and Partitions
- Topic — a named category (e.g., ride_requests)
- Partition — a unit of parallelism within a topic
The partition count sets the upper bound on parallel consumption, so it determines how far a topic can scale.
6. Partition Control
You control the number of partitions and the key used for routing.
key = "A" // userId
Kafka applies hash(key) to determine which partition receives the message. If no key is provided, Kafka uses round-robin distribution with no ordering guarantee.
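The routing rule can be sketched in a few lines. Kafka's Java client actually uses murmur2 for this; MD5 below is just a deterministic stand-in to show the `hash(key) % num_partitions` idea.

```python
import hashlib

NUM_PARTITIONS = 3

def partition_for(key: str) -> int:
    # Deterministic hash of the key, reduced to a partition number.
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS

# The same key always lands on the same partition, which is what
# preserves per-key (here: per-user) ordering.
assert partition_for("A") == partition_for("A")
print(partition_for("A"), partition_for("B"))
```

Sending without a key skips this step, which is why keyless messages get round-robin distribution and no ordering guarantee.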
7. Consumer Groups
Each partition is assigned to exactly one consumer within a group.
Rule: keep the number of consumers in a group less than or equal to the number of partitions; any extra consumers sit idle.
8. Partition Assignment
Kafka's Group Coordinator handles this automatically. It assigns partitions to consumers and rebalances when the group changes.
9. Rebalancing
If a consumer goes down, Kafka redistributes its partitions to the remaining consumers automatically.
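Assignment and rebalancing together can be sketched as one function (a simplified round-robin version of what the Group Coordinator does; real Kafka supports several assignment strategies):

```python
def assign(partitions, consumers):
    """Spread partitions round-robin across the consumers in a group."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

partitions = [0, 1, 2, 3]

# Two consumers: each owns two partitions.
print(assign(partitions, ["c1", "c2"]))  # {'c1': [0, 2], 'c2': [1, 3]}

# c2 goes down: a rebalance runs the same logic over the survivors,
# so c1 now owns all four partitions.
print(assign(partitions, ["c1"]))        # {'c1': [0, 1, 2, 3]}
```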
10. Message Retention
Kafka does not delete messages after consumption. Messages are retained based on time (e.g., 7 days) or size limits. Consumers track their own read position using offsets.
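A minimal sketch of this separation (timestamps and record shapes are illustrative): reading only advances the consumer's offset, while a separate retention pass removes records by age.

```python
import time

RETENTION_SECONDS = 7 * 24 * 3600  # e.g. a 7-day retention limit

log = [
    {"offset": 0, "ts": time.time() - 8 * 24 * 3600, "value": "old"},
    {"offset": 1, "ts": time.time(), "value": "new"},
]

def poll(log, offset):
    """Reading does NOT delete: it only moves the consumer's offset."""
    records = [r for r in log if r["offset"] >= offset]
    next_offset = records[-1]["offset"] + 1 if records else offset
    return records, next_offset

def enforce_retention(log, now):
    """Retention, not consumption, is what removes records."""
    return [r for r in log if now - r["ts"] <= RETENTION_SECONDS]

consumer_offset = 0
records, consumer_offset = poll(log, consumer_offset)
print(len(log), consumer_offset)  # 2 2 — log untouched, offset advanced

log = enforce_retention(log, time.time())
print([r["value"] for r in log])  # ['new'] — only the old record expired
```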
11. What Kafka Handles
- Message storage (disk-based)
- Partitioning
- Replication
- Load balancing
- Consumer assignment
12. What You Handle
- Defining topics and partition count
- Writing the producer (what to send)
- Writing the consumer (what to do with the data)
- All business logic
13. What Kafka Does Not Do
- Know about the end user
- Send data to the user directly
- Apply business logic
- Filter messages per user
14. How the User Gets a Response
The consumer reads from Kafka, processes the event, and delivers the response to the user through WebSocket, push notification, or API callback.
15. Mental Model
| Component | Role |
|-----------|------|
| Kafka | Data pipeline |
| Consumer (B) | Brain — applies logic |
| User (A) | Client |
16. Scale Reference (Uber-like System)
- Around 100K events per second
- 300 to 1000 partitions
- 50 to 500 consumers per service
- 2,000 to 10,000 consumers total
Summary
Kafka is a distributed log system that stores and streams events. Consumers process events and decide how to act. Users never interact with Kafka directly.