Ask HN: How to partition tenant data in queue for predictable performance?

3 points by acidity 12 hours ago

Hello HN,

We use PubSubLite, but the question applies to any queue system. The events are processed by a service running in K8s.

I am looking for strategies/patterns for how folks partition their data so that each worker unit gets consistent performance with optimal CPU/memory usage.

Our data is global and temporal, i.e. different regions have peak/off-peak data volumes based on their time zones.

We have done performance tests, so we know how many events/second/partition we can process per worker unit.

The issue we are currently facing: partitions are auto-assigned, and by random chance a single worker can end up with multiple heavy-traffic partitions, causing resource contention and slowdown at that worker.

Some options we have considered:

* running one worker unit per partition - this wastes resources, since we pay for constant capacity even at off-peak times. It also requires a coordinator to make sure every worker subscribes to the correct partitions.

* grouping tenants by their traffic size and setting up workers based on that. The side effect is extra maintenance on our side and a bit of coordination with tenant provisioning/deprovisioning.

* routing traffic to certain partitions based on tenant_id - but then some partitions become too big and others don't.
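For concreteness, the direction we've been sketching for options 2/3 is a greedy bin-packing assignment: use the measured events/sec per partition and assign each partition to the currently least-loaded worker, heaviest first. This is just a toy sketch (all names and numbers are hypothetical, not our real code):

```python
import heapq

def assign_partitions(partition_rates, num_workers):
    """Greedy bin-packing: give each partition to the currently
    least-loaded worker, processing heaviest partitions first."""
    # min-heap of (total assigned rate, worker index)
    heap = [(0.0, w) for w in range(num_workers)]
    heapq.heapify(heap)
    assignment = {w: [] for w in range(num_workers)}
    for part, rate in sorted(partition_rates.items(), key=lambda kv: -kv[1]):
        load, w = heapq.heappop(heap)
        assignment[w].append(part)
        heapq.heappush(heap, (load + rate, w))
    return assignment

# events/sec per partition, e.g. from perf tests
rates = {"p0": 900, "p1": 850, "p2": 120, "p3": 100, "p4": 90, "p5": 60}
print(assign_partitions(rates, 3))
```

With the example rates above, the two heavy partitions (p0, p1) land on different workers instead of possibly colliding under random assignment. The catch is still the coordination problem: something has to keep the rate estimates fresh and re-balance as traffic shifts.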

What other strategies do you use?