Hands On Kafka

Uber-Lite: Architecting High-Scale Geo-Spatial Matchmaking Systems

Lesson 33: The Boundary Problem

When Your Partition Strategy Becomes a Geographic Blindspot

May 28, 2026

The Naive Approach: How Your System Is Already Failing

You shipped Module 3’s compound key partitioner. h3CellRes5 + "|" + entityId%8. Partition skew CV: 0.09. Co-partitioning contract: enforced. It looked clean.
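A minimal sketch of the key format and the skew metric quoted above, assuming the CV is the population standard deviation of per-partition record counts divided by their mean. The class name, method names, and sample values are hypothetical. How the key maps to a partition is deliberately left out, since the Module 3 partitioner itself is not reproduced here.

```java
import java.util.Arrays;

final class CompoundKeySketch {
    static final long SALT_BUCKETS = 8; // the "% 8" from the key format above

    // Builds h3CellRes5 + "|" + entityId % 8.
    // floorMod keeps the salt in 0..7 even for negative ids
    // (an assumption; the lesson does not say whether ids can be negative).
    static String compoundKey(String h3CellRes5, long entityId) {
        return h3CellRes5 + "|" + Math.floorMod(entityId, SALT_BUCKETS);
    }

    // Coefficient of variation: population standard deviation / mean.
    // 0.0 means perfectly even partitions; larger values mean more skew.
    static double coefficientOfVariation(long[] countsPerPartition) {
        double mean = Arrays.stream(countsPerPartition).average().orElse(0.0);
        if (mean == 0.0) {
            return 0.0; // no records at all: treat as "no skew"
        }
        double variance = Arrays.stream(countsPerPartition)
                .mapToDouble(c -> (c - mean) * (c - mean))
                .average()
                .orElse(0.0);
        return Math.sqrt(variance) / mean;
    }

    public static void main(String[] args) {
        // Sample Res 5 cell index and entity id, for illustration only.
        System.out.println(compoundKey("85283473fffffff", 42L));

        // Hypothetical per-partition record counts for an 8-partition topic.
        long[] counts = {980, 1010, 1100, 950, 1005, 890, 1060, 1005};
        System.out.println(coefficientOfVariation(counts));
    }
}
```

A low CV only says the load is balanced. It says nothing about whether the entities that need to meet each other ended up on the same partition.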

Then your on-call page fires at 2 AM: riders in one section of downtown are waiting 8 minutes. Match rate in that zone: 11%. Everywhere else: 94%.

The failure is geometric. A Res 5 hexagon covers ~252 km². In any dense urban grid, roughly 18% of active users sit within one Res 9 cell’s width of a Res 5 boundary, which means their nearest drivers may be in the neighboring Res 5 cell and therefore assigned to a different partition. Your MatchingProcessor on partition 3 has a complete KTable of every driver in partition 3’s Res 5 cells. It has zero knowledge of three available drivers 80 meters away on partition 5. The stream-table join simply never fires for those riders.
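The blind spot can be sketched without Kafka at all. The model below is a simplification under stated assumptions: each partition's local KTable state is a plain in-memory map, the cell names `cell-A` and `cell-B` are hypothetical neighbors that share a Res 5 boundary, and the partition numbers 3 and 5 are taken from the example above. It is not the lesson's MatchingProcessor, only an illustration of why a partition-local lookup cannot see across the boundary.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class BoundaryBlindSpot {

    // partition -> (Res 5 cell -> available driver ids).
    // Stands in for the local state store each stream task owns.
    static Map<Integer, Map<String, List<String>>> sampleStores() {
        Map<Integer, Map<String, List<String>>> stores = new HashMap<>();
        // Partition 3 owns cell-A, which currently has no free drivers.
        stores.put(3, Map.of("cell-A", List.of()));
        // Partition 5 owns the neighboring cell-B, with three free drivers
        // parked just across the shared boundary.
        stores.put(5, Map.of("cell-B", List.of("driver-1", "driver-2", "driver-3")));
        return stores;
    }

    // A partition-local lookup: the task only consults its own store,
    // keyed by the rider's own Res 5 cell.
    static List<String> localCandidates(Map<Integer, Map<String, List<String>>> stores,
                                        int riderPartition, String riderCell) {
        return stores.getOrDefault(riderPartition, Map.of())
                     .getOrDefault(riderCell, List.of());
    }

    public static void main(String[] args) {
        Map<Integer, Map<String, List<String>>> stores = sampleStores();

        // Rider stands in cell-A, right at the edge next to cell-B.
        List<String> seen = localCandidates(stores, 3, "cell-A");
        System.out.println("Candidates visible to partition 3: " + seen.size());

        // The drivers exist, but only partition 5 can see them.
        List<String> acrossBoundary = localCandidates(stores, 5, "cell-B");
        System.out.println("Drivers across the boundary: " + acrossBoundary.size());
    }
}
```

From partition 3's point of view, the rider is in an empty city. Nothing errors and nothing is logged; the join just has no matching row.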

The naive fix is catastrophic. A cross-partition scan converts the O(1) KTable lookup into a broadcast lookup across all 8 partitions. At 3,000 riders/sec, that is 3,000 × 8 = 24,000 RocksDB range scans per second across the cluster. Block cache hit rate drops from 98% to 14% under constant thrash. The process-rate metric in kafka.streams:type=stream-processor-node-metrics collapses to near zero within 3 minutes. This isn’t a “degraded” system; it’s a scheduled crash.
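The fan-out arithmetic behind that number is simple enough to check directly. The sketch below assumes one scan per partition per rider event, which is what the 24,000 figure implies; the class and method names are hypothetical.

```java
final class BroadcastFanOut {

    // Partition-local join: one lookup per rider event.
    static long localLookupsPerSecond(long ridersPerSecond) {
        return ridersPerSecond;
    }

    // Broadcast lookup: every rider event scans every partition's store once.
    static long broadcastScansPerSecond(long ridersPerSecond, int partitions) {
        return ridersPerSecond * partitions;
    }

    public static void main(String[] args) {
        long riders = 3_000; // riders/sec from the text
        int partitions = 8;  // partition count from the text

        System.out.println("local lookups/sec:   " + localLookupsPerSecond(riders));
        System.out.println("broadcast scans/sec: " + broadcastScansPerSecond(riders, partitions));
    }
}
```

The cost scales linearly with partition count, so adding partitions to relieve load makes the broadcast approach proportionally worse.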

The boundary problem isn’t an edge case. It’s a systematic ~18% blind spot baked into your partition topology.
