Architecture Overview

One AWS region has many Hyperplane clusters handling slices of workload. See Design Principles for details.

Nodes

Hyperplane nodes are just regular EC2 instances.

Flow Tracker

Flow Tracker is a horizontally-scalable central database that tracks all flows going through the cluster.

The data store maintains 1:1 redundancy of the flow state. When a nodes goes down, state is replicated to other nodes.

Packet Processor

Packet processor is a horizontally-scalable layer that handles packet processing and forwarding.

Control Plane

Control plane distributes configuration updates to flow trackers and packet processors, and manages Hyperplane capacity by provisioning and de-provisioning nodes.

Test and Service Monitoring

Tests are run in each AZ continuously to monitor latency, availability and functional correctness.

Metrics are exported to CloudWatch via Kinesis.

Design Principles

Design for Failures

Fail Fast

No buffering or retries. Rely on TCP retransmission from the client.

S3 Configuration Loop

AWS Hyperplane nodes fetch, process and load configuration files from Amazon S3 every few seconds, even if nothing has changed.

The configuration file is sized to its maximum size right from the beginning to ensure the system is always processing and loading the maximum number of configuration changes.

Such that,

  1. The control plane scales independently of the data plane fleet, and a storm of configuration changes does not overload the data plane.
  2. if AWS Hyperplane nodes are lost, the amount of work in the system goes down, not up.

Shuffle Sharding

With 100 nodes, there is only 1.8% chance for 5-node random selections to overlap.

AWS also employs algorithmic measures to avoid overlap of more than half of the nodes between two tenants.

Cell-based

Hyperplane is a cellular and zonal service. This limits blast radius and prevents single point of failure.

Symmetric Flows

Flows have symmetric network path (i.e. no DSR) to enable collecting metrics from traffic in both directions on Hyperplane nodes.

Going through a Blackfoot or Hyperplane node have almost the same latency, so there is little negative impact.

References