Traditional approaches
Without SO_REUSEPORT, an application is limited to one accept queue for each TCP listening port. That single queue becomes a bottleneck, and the thundering herd problem can cause lock contention. The [[epoll#epollexclusive-flag|EPOLLEXCLUSIVE Flag]] solved the thundering herd problem, but connections are still not evenly distributed across workers. Nginx has to re-add the socket periodically to work around this, as a comment in its source explains:
Linux with EPOLLEXCLUSIVE usually notifies only the process which
was first to add the listening socket to the epoll instance. As
a result most of the connections are handled by the first worker
process. To fix this, we re-add the socket periodically, so other
workers will get a chance to accept connections.
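The registration and the periodic re-add described above can be sketched in C roughly as follows. This is a minimal illustration and not nginx's actual code: `make_listener`, `add_exclusive` and `readd_exclusive` are hypothetical helper names, and the sketch assumes Linux 4.5 or later, where EPOLLEXCLUSIVE is available.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: TCP listener on 127.0.0.1:port (port 0 picks a free one). */
static int make_listener(uint16_t port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, SOMAXCONN) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Register the shared listener in this worker's epoll instance with
 * EPOLLEXCLUSIVE, so an incoming connection does not wake every worker. */
static int add_exclusive(int epfd, int listen_fd)
{
    struct epoll_event ev = {0};
    ev.events = EPOLLIN | EPOLLEXCLUSIVE;
    ev.data.fd = listen_fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev);
}

/* EPOLLEXCLUSIVE is only accepted by EPOLL_CTL_ADD, so "re-adding" the
 * socket means deleting the registration and adding it again. */
static int readd_exclusive(int epfd, int listen_fd)
{
    if (epoll_ctl(epfd, EPOLL_CTL_DEL, listen_fd, NULL) < 0)
        return -1;
    return add_exclusive(epfd, listen_fd);
}
```

In a multi-worker server, each worker would call `add_exclusive` on its own epoll instance after fork, and call `readd_exclusive` from a timer or after a number of accepted connections. How often to re-add is a tuning decision that this sketch does not address.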
SO_REUSEPORT
SO_REUSEPORT allows multiple sockets to listen on the same port, each with its own accept queue. An implementation problem was acknowledged when the feature was merged: closing a listening socket could reset connections that were still in their 3-way handshake, so a hot reload as implemented in Nginx would lose some connections in the process, even with connection draining.
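In Linux 5.14, socket migration was added to address this problem: when a listener is closed, its pending connections can be migrated to another listener in the same SO_REUSEPORT group.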
SO_REUSEPORT locality
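By default, new connections flowing into the network stack are distributed across the SO_REUSEPORT sockets using the usual 5-tuple hash. Packets from any of the RX queues, hitting any CPU, might flow into any of the accept queues, so the worker that accepts a connection is not necessarily running on the CPU that processed its packets.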
But as Cloudflare says:
We weren’t able to prove definitely if improving packet locality actually improves performance for a high-level TCP application like an HTTP server. In hindsight it makes sense - the added benefit is minuscule compared to the overhead of running an HTTP server, especially with logic in a high level language like Lua.
We got reminded of the obvious - out of the box Linux is remarkably well tuned.
References
- https://lpc.events/event/11/contributions/946/attachments/783/1472/Socket_migration_for_SO_REUSEPORT.pdf
- https://www.youtube.com/watch?v=7mTH9AHVFvw
- https://blog.cloudflare.com/perfect-locality-and-three-epic-systemtap-scripts
- https://github.com/nginx/nginx/blob/145b228530c364452c14d3184f1eee5e09b324aa/src/event/ngx_event_accept.c#L321-L323
- https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c617f398edd4db2b8567a28e899a88f8f574798d