Auto scaling
Scaling out adds capacity, which helps mitigate overload.
Load shedding
Indicators to consider:
- CPU utilization
- per-tenant quota and overall rate limits
- average latency
- concurrent connections
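As a minimal sketch of shedding on one of these indicators (concurrent connections), the snippet below rejects requests once too many are in flight. `LoadShedder`, `handle`, and the limit are hypothetical names chosen for illustration, not part of any AWS API.

```python
import threading
from typing import Callable, Tuple


class LoadShedder:
    """Rejects work once too many requests are in flight (hypothetical helper)."""

    def __init__(self, max_in_flight: int) -> None:
        self._max = max_in_flight
        self._in_flight = 0
        self._lock = threading.Lock()

    def try_acquire(self) -> bool:
        # Admit the request only while we are under the concurrency limit.
        with self._lock:
            if self._in_flight >= self._max:
                return False
            self._in_flight += 1
            return True

    def release(self) -> None:
        with self._lock:
            self._in_flight -= 1


def handle(shedder: LoadShedder, work: Callable[[], str]) -> Tuple[int, str]:
    # Shed early and cheaply: answer 503 before doing any real work.
    if not shedder.try_acquire():
        return 503, "overloaded"
    try:
        return 200, work()
    finally:
        shedder.release()
```

The same shape works for the other indicators: swap the in-flight counter for a CPU reading, a latency estimate, or a per-tenant quota check.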
Don’t waste work
Propagate request cancellations or enforce timeouts, so that servers stop working on requests whose callers have already given up. With fixed timeouts, plan for bounded work: use pagination and hard limits to cap the per-request workload.
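A rough sketch of bounded work, assuming an in-memory list stands in for the data store: the page size is clamped to a hard limit, and the loop stops as soon as the caller's deadline has passed. `list_page` and `MAX_PAGE_SIZE` are hypothetical names, and the limit of 100 is an assumed value.

```python
import time
from typing import Callable, List, Optional, Tuple

MAX_PAGE_SIZE = 100  # hard per-request limit (assumed value)


def list_page(
    items: List[str],
    start: int,
    page_size: int,
    deadline: float,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[List[str], Optional[int]]:
    """Return up to page_size items and the next start index (None when done)."""
    page_size = min(page_size, MAX_PAGE_SIZE)  # clamp the caller-supplied size
    page: List[str] = []
    i = start
    while i < len(items) and len(page) < page_size:
        if clock() >= deadline:
            break  # the caller's budget is spent: stop instead of wasting work
        page.append(items[i])
        i += 1
    next_start = i if i < len(items) else None
    return page, next_start
```

The caller passes the returned index back as `start` to fetch the next page, so no single request can grow beyond the hard limit.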
Dependency isolation
The key is to compartmentalize dependencies and isolate concurrency. In AWS Lambda, for example, warm invoke and cold invoke resources are allocated separately in the internal “worker manager” microservice.
The purpose is to isolate unrelated APIs and to protect against modal behavior (e.g. a cache miss on the hot path).
Workload isolation also helps. Lambda’s execution environments allocate fixed resources for each request, so overload becomes a problem of either the Lambda quota (how many new execution environments can be created in a limited time frame) or the function’s dependencies. This is a less effective but simpler way to implement dependency isolation.
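One common way to isolate concurrency per dependency is a bulkhead: each dependency gets its own slot budget, so a slow dependency can exhaust only its own slots. The sketch below is an assumption-laden illustration; `Bulkheads` is a hypothetical helper and does not describe how Lambda's worker manager is implemented.

```python
import threading
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


class Bulkheads:
    """One concurrency budget per dependency (hypothetical helper)."""

    def __init__(self, limits: Dict[str, int]) -> None:
        self._slots = {
            name: threading.BoundedSemaphore(limit) for name, limit in limits.items()
        }

    def call(self, dependency: str, fn: Callable[[], T]) -> T:
        slot = self._slots[dependency]
        # Fail fast instead of queueing behind a slow dependency.
        if not slot.acquire(blocking=False):
            raise RuntimeError(f"{dependency} bulkhead is full")
        try:
            return fn()
        finally:
            slot.release()
```

With limits such as `{"db": 1, "cache": 1}`, a stuck database call makes further `db` calls fail fast while `cache` calls still go through.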
Constant work
It’s possible to apply the constant work principle to avoid overloading certain components in a distributed system, as shown in Hyperplane’s design.
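To illustrate the principle only (this is not Hyperplane's actual implementation), the sketch below always emits a full, fixed-size configuration snapshot, so the work done per cycle is the same whether one entry is set or all of them. `build_snapshot` and `TABLE_SLOTS` are hypothetical, and the capacity of 8 is an assumed value.

```python
from typing import Dict, List, Optional, Tuple

TABLE_SLOTS = 8  # fixed capacity (assumed value for illustration)


def build_snapshot(config: Dict[str, str]) -> List[Optional[Tuple[str, str]]]:
    """Always emit exactly TABLE_SLOTS entries, padding unused slots with None."""
    if len(config) > TABLE_SLOTS:
        raise ValueError("config exceeds the fixed capacity")
    entries: List[Optional[Tuple[str, str]]] = list(sorted(config.items()))
    # Padding keeps the payload size independent of how much is configured.
    return entries + [None] * (TABLE_SLOTS - len(entries))
```

Because the snapshot has the same size during a quiet period and during a burst of changes, the consumer's load does not spike when the system is already under stress.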
Queue backlogs
- Apply a Time to Live (TTL) when data loses value quickly as time passes, e.g. stale information.
- For systems where processing latency matters, use LIFO queues, or emulate them with FIFO-based priority queues.
- Apply backpressure or fixed-rate throttling based on capacity.
- Use surge queues as an alternative to backpressure or throttling.
Note: If moving a message between queues is costlier than handling it, emulating a LIFO queue is not worth it and you may consider using a surge queue instead.
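A minimal in-memory sketch combining the TTL and LIFO ideas, assuming messages are enqueued with non-decreasing timestamps. `FreshestFirstQueue` is a hypothetical helper; a real system would typically build this on top of a managed queue.

```python
import collections
from typing import Deque, Optional, Tuple


class FreshestFirstQueue:
    """Bounded LIFO queue that drops expired messages (hypothetical helper)."""

    def __init__(self, ttl_seconds: float, max_len: int) -> None:
        self._ttl = ttl_seconds
        # With maxlen set, appending to a full deque discards the oldest entry.
        self._items: Deque[Tuple[float, str]] = collections.deque(maxlen=max_len)

    def put(self, message: str, now: float) -> None:
        self._items.append((now, message))

    def get(self, now: float) -> Optional[str]:
        while self._items:
            enqueued_at, message = self._items.pop()  # newest first
            if now - enqueued_at <= self._ttl:
                return message
            # The newest remaining message has expired, so every older one
            # has too (timestamps are assumed to be non-decreasing).
            self._items.clear()
        return None
```

During a backlog, consumers see the freshest messages first, and anything older than the TTL is discarded rather than processed late.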
Fairness in multi-tenant systems
Use quotas and shuffle sharding, as in Hyperplane. You could also build feedback loops to scale or isolate the shards for the offending tenant.
For queue-based systems specifically, messages could be sent to the least crowded shard to further reduce the impact of a noisy neighbor.
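A simplified sketch of both ideas: each tenant is deterministically mapped to a small subset of shards, and each message goes to the least crowded shard in that subset. `shards_for_tenant` and `pick_queue` are hypothetical names, and seeding a PRNG from a hash of the tenant ID is an assumption made for illustration, not necessarily how Hyperplane assigns shards.

```python
import hashlib
import random
from typing import Dict, List


def shards_for_tenant(tenant_id: str, num_shards: int, shard_size: int) -> List[int]:
    """Deterministically pick shard_size distinct shards for a tenant."""
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    seed = int.from_bytes(digest[:8], "big")
    # A per-tenant seeded PRNG gives the same shard subset on every call.
    return sorted(random.Random(seed).sample(range(num_shards), shard_size))


def pick_queue(tenant_shards: List[int], depth: Dict[int, int]) -> int:
    """Send the message to the least crowded shard in the tenant's subset."""
    return min(tenant_shards, key=lambda shard: depth.get(shard, 0))
```

Because two tenants rarely share the exact same subset, a noisy tenant can saturate only its own shards, and other tenants usually still have at least one uncrowded shard to use.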
References
- https://www.youtube.com/watch?v=Fup5vHEvU50
- https://aws.amazon.com/builders-library/using-load-shedding-to-avoid-overload/
- https://aws.amazon.com/builders-library/avoiding-overload-in-distributed-systems-by-putting-the-smaller-service-in-control/
- https://aws.amazon.com/builders-library/dependency-isolation/
- https://web.archive.org/web/20250710022514/https://aws.amazon.com/builders-library/workload-isolation-using-shuffle-sharding/