Lesson 40: Locality Verification Lab — Proving the Partition Invariant Before You Build the Engine
The Naive Failure Mode
A developer does the obvious thing: produces driver events keyed
"drv-42"and rider events keyed"rdr-99", letsDefaultPartitionerassign partitions, ships to staging. At 100 events/sec, metrics look fine. At 3,000 events/sec, the matchmaking engine produces zero matches — not because it crashed, but because it never sees a driver and rider together.
DefaultPartitionerapplies Murmur2 to full key bytes."drv-42"→ partition 7."rdr-99"→ partition 14. Each partition is owned by a separate Kafka StreamsStreamThread. Without co-partitioned topics sharing the same key scheme, the join topology never executes. The system emits silence. Your on-call gets paged at 2am.
A subtler variant: the developer does implement a custom partitioner, but hashes the full compound key
"852a1073fffffff|3"— including the entity bucket suffix. Driver in bucket 2, rider in bucket 7, same H3 cell → different partition. Still broken. Same silence.
Silent correctness failures are categorically worse than crashes. This lesson installs a gate that makes the silence loud before it reaches production.
Why Locality Is a Hard Physical Requirement
Kafka Streams assigns StreamTasks to StreamThreads based on partition ownership. When driver-locations and rider-requests are co-partitioned — identical partition count (24), identical partition key scheme — a single StreamThread owns partition 7 of both topics simultaneously. The RocksDB state store join executes in-process, within the same JVM thread, touching the same block cache. No network hop. No remote state fetch. Pure mechanical sympathy: the data lives where the compute is.
Break co-partitioning and you pay all the overhead of a distributed system while getting none of the co-locality dividend. The StreamThread for driver-locations/p7 and the StreamThread for rider-requests/p14 will never share state — they’re isolated by design.
Compound Key Anatomy: h3CellRes5|entityId%8
Every event in the Uber-Lite system carries a compound partition key:
h3CellRes5|entityId%8
852a1073fffffff|3
The pipe delimiter is intentional. The two halves serve entirely different purposes:
h3CellRes5 — The primary partition selector. The H3 cell string at resolution 5 covers approximately 252 km². Every entity (driver or rider) within that hexagon resolves to the same cell string, regardless of their precise lat/lng or their entity ID. This is the only input to the hash function.
entityId%8 — The within-partition sort key. Buckets entities into one of 8 deterministic groups. This controls record ordering within the partition and enables O(log N) prefix scans against the RocksDB state store in Module 4. It plays zero role in partition selection.
GeoPartitionKey encapsulates this construction as a Java record:
public record GeoPartitionKey(String h3Cell, int bucket) {
public static GeoPartitionKey of(double lat, double lng, long entityId) {
String cell = H3.latLngToCellAddress(lat, lng, 5); // res-5 hexagon
int bucket = (int) (Math.abs(entityId) % 8);
return new GeoPartitionKey(cell, bucket);
}
public String toKeyString() { return h3Cell + "|" + bucket; }
public byte[] toKeyBytes() { return toKeyString().getBytes(StandardCharsets.UTF_8); }
}
The Partitioner: Hashing Only What Matters
UberLitePartitioner enforces the cell/bucket separation at the hash boundary:
int partitionFor(byte[] keyBytes, int numPartitions) {
String keyStr = new String(keyBytes, StandardCharsets.UTF_8);
int pipeIdx = keyStr.indexOf('|');
String cellStr = (pipeIdx >= 0) ? keyStr.substring(0, pipeIdx) : keyStr;
byte[] cellBytes = cellStr.getBytes(StandardCharsets.UTF_8);
return Utils.toPositive(Utils.murmur2(cellBytes)) % numPartitions;
}
Three implementation decisions that matter at scale:
Utils.murmur2 — Kafka’s own Murmur2 implementation from kafka-clients, the same function DefaultPartitioner uses for raw byte-array keys. Using any other hash (e.g., String.hashCode(), which is platform-dependent in older JVMs) silently breaks partition stability across producer restarts.
Utils.toPositive — Equivalent to hash & 0x7fffffff. Murmur2 returns a signed 32-bit integer; the modulo operation requires a non-negative dividend. Without this, keys mapping to the negative integer range would produce a negative partition index, causing IllegalArgumentException in the broker.
indexOf('|') with fallback — A key with no pipe delimiter is treated as a bare H3 cell string. This allows the partitioner to be used for rider-request topics whose keys may omit the bucket suffix in earlier pipeline stages without throwing.
The 6-Test Verification Suite
GitHub Link
https://github.com/sysdr/uber-lite-p/tree/main/lesson40/lesson-40-locality-verification
Each test targets a specific failure mode. They run in order — earlier tests are preconditions for later ones.
Test 1: H3 Cell Resolution Asserts that Times Square (40.7580, -73.9855) and Bryant Park (40.7536, -73.9832) — ~800m apart — resolve to the same H3 res-5 cell. At resolution 5, average hexagon edge length is 8.54 km. 800m is well within a single hexagon. If this test fails, the two coordinates straddle a cell boundary and the test data must be adjusted — not the partitioner.
Test 2: Core Locality Invariant The contract test. Driver key and rider key, with different entity IDs, must produce the same integer from partitionFor(). Uses Assumptions.assumeTrue(sameCell) to skip cleanly if Test 1’s precondition fails, avoiding a misleading failure cascade.
Test 3: Bucket Invariance Calls partitionFor() for all 8 buckets (entityId % 8 = 0…7) at the same coordinate. All 8 must map to the same partition. This is the primary regression guard against future developers who might “optimize” the partitioner by accidentally including the bucket suffix in the hash.
Test 4: Determinism Under Repetition Calls partitionFor() 1,000 times on the same key and asserts identical results each time. Non-determinism here indicates mutable static state or thread-unsafe initialization in the H3 JNI layer — a real failure mode when multiple tests initialize H3Core concurrently without synchronization.
Test 5: Cross-City H3 Cell Isolation Asserts that NYC (Times Square) and SF (37.7749, -122.4194) produce distinct H3 res-5 cells. At res-5 (~252 km² per hexagon), this trivially holds — but automating it catches misconfigured resolution constants (e.g., someone setting RESOLUTION = 1 to “reduce cardinality”).
Test 6: End-to-End Producer Validation Spins up a real Kafka broker via Testcontainers (confluentinc/cp-kafka:7.5.0), creates driver-locations and rider-requests with 24 partitions each, and produces one event to each topic using ProducerConfig.PARTITIONER_CLASS_CONFIG = UberLitePartitioner.class.getName(). Asserts RecordMetadata.partition() is equal for both events. This exercises the full RecordAccumulator → Sender → broker → metadata path. Tests 1–5 are pure in-process; Test 6 is the end-to-end gate.
Production Metrics: What Fails Without This Test
The production symptom: coordinator-rebalance-latency-max is nominal, process-latency-max is nominal, consumer lag is zero — but rider-match-rate reads 0.0 in Grafana. No exception is thrown. No queue backs up. The StreamThread is faithfully processing records; it just never sees co-local pairs.
Instrument UberLitePartitioner with a JMX counter on null-key events (which bypass the partitioner and round-robin, destroying locality). Watch kafka.producer:type=producer-metrics, attribute=record-send-total broken down by partition in Grafana. The distribution should mirror your H3 cell density map (dense urban cells get proportionally more traffic), not a flat uniform spread. A uniform spread means the partitioner is not executing — PARTITIONER_CLASS_CONFIG is missing or the class is not on the producer’s classpath.
Running the Lab
Prerequisites: Docker 24+, Java 21, Maven 3.9+.
bash project_setup.sh
cd lesson-40-locality-verification
mvn test
Testcontainers pulls confluentinc/cp-kafka:7.5.0 on first run (~650 MB). Subsequent runs use the cached image. Expected total test time including container startup: 20–30 seconds.
Interpreting failures:
Youtube Demo Link:
Lesson 41 Preview
The partition invariant is now machine-verified. Lesson 41 builds the matchmaking engine: a Kafka Streams
Processor APItopology that joinsdriver-locationsandrider-requestson the H3 cell key, backed by a RocksDB state store with a K-ring expansion (res-5 → k=1 neighbors viaH3.gridDisk()) to capture drivers near cell boundaries. TheUberLitePartitioneryou verified today is the production partitioner for both input topics. Module 3 ends here. Module 4 begins with state.




