Load Balancing
Fly Proxy routes requests to individual Machines in your apps using a combination of concurrency settings specified on your app, current load, and closeness. This page describes the details of load balancing traffic to your apps on Fly.io.
Load balancing strategy
The basic load balancing strategy is:
- Send traffic to the least loaded, closest Machine
- If multiple Machines have identical load and closeness, randomly choose one
Load
Fly Proxy determines load using the concurrency settings configured for an app and the current traffic relative to those settings.
The table below describes how traffic may or may not be routed to a Machine based on configured soft_limit
and hard_limit
values.
Machine load | What happens |
---|---|
Above hard_limit |
No new traffic will be sent to the Machine |
At or above soft_limit , below hard_limit |
Traffic will only be sent to this Machine if all other Machines are also above their soft_limit |
Below soft_limit |
Traffic will be sent to the Machine when it is the closest Machine that is under soft_limit |
Closeness
Closeness is determined by RTT (round-trip time) between the Fly.io edge server receiving a connection or request, and the worker server where your Machine runs. Even within the same region, we use different datacenters with different RTTs. These RTTs are measured constantly between all servers.
You can observe live RTT values between Fly.io regions using our RTT app.
Example of load balancing for a web service
We have a hypothetical web service that we know can handle 25 concurrent requests with the configured CPU and memory settings. We set the following values in our fly.toml:
[services.concurrency]
type = "requests"
hard_limit = 25
soft_limit = 20
We set type = "requests"
so Fly.io will use concurrent HTTP requests to determine when to adjust load. We prefer this to type = "connections"
, because our web service does work for each request and our users may make multiple requests over a single connection (e.g., with HTTP/2). Fly Proxy will also pool connections to a Machine for a short time (about 4 seconds) when using type = "requests"
to avoid frequent opening and closing of connections to your app.
We set the soft_limit
to 20, so we have a little room for Fly Proxy to shift load to other Machines before a single Machine becomes overwhelmed.
We deploy 10 Machines in four regions: ams
(Amsterdam), bom
(Mumbai), sea
(Seattle), and sin
(Singapore), with three of those in ams
.
In this contrived example, all of the users are currently in Amsterdam, so the traffic is arriving at one of the Fly.io edges in Amsterdam. Here’s what happens as the number of concurrent HTTP requests from users in Amsterdam increases:
- 60 concurrent requests: Requests are divided evenly between the 3 Machines in the
ams
region. Closeness of the worker and edge will determine which of the 3 Machines each request goes to. - 61+ concurrent requests: 60 of those requests will be sent to the 3 Machines in the
ams
region and the rest will be sent to the closest Machines in other regions. - 200+ concurrent requests: All Machines are at their
soft_limit
, so Fly Proxy will start routing requests to theams
Machines again. For example, the 201st concurrent request will go to a Machine in theams
region that is currently at itssoft_limit
. - 250 concurrent requests: All Machines are at their
hard_limit
. The 251st concurrent request will get queued by Fly Proxy until a Machine is below itshard_limit
.
If traffic is far above the hard_limit
for a long period of time, Fly Proxy might start returning 503 Service Unavailable responses for requests that are not able to be routed to a Machine.