Autoscale based on metrics
The metrics-based autoscaler scales an app’s Machines based on any metric, such as pending work items or queue depth. Scaling on metrics other than requests or connections is useful for apps, like background workers, that aren’t running web services. Apps with services that need to scale based on HTTP requests can use the built-in Fly Proxy autostop/autostart feature for Machines.
The autoscaler works by collecting metrics from different sources, such as Prometheus or Temporal, and computing the number of required Machines based on those metrics. This reconciliation process happens on a loop every 15 seconds by default. The autoscaler uses the Expr language for defining target Machine counts, which gives a rich set of built-in arithmetic functions.
You run the autoscaler as an app based on the fly-autoscaler
image. The app runs within your organization so you have full control over it. You can customize the autoscaler to work with your specific scaling needs.
Quickstart
To get up and running, you’ll set up the app, configure secrets, set the configuration, and deploy the autoscaler.
As a prerequisite, you need an existing target application that you want to scale and a user-defined metric to scale on. In this example, you’ll scale on a Prometheus metric called queue_depth
but you can replace that with your own. You can also scale based on Temporal workflows.
Create the autoscaler application
First, create a new Fly.io app that will run the autoscaler. Replace the my-autoscaler
name with a unique name for your autoscaler application:
fly apps create my-autoscaler
Create a deploy token
The first auth token you’ll need is one that has permissions to deploy your target app:
fly tokens create deploy -a my-target-app
Copy the resulting token and set it as a secret on your autoscaler app:
fly secrets set -a my-autoscaler --stage FAS_API_TOKEN="FlyV1 ..."
Create a token to read from Prometheus
The next auth token you’ll need is one that has permissions to read from your organization’s Prometheus data on Fly:
fly tokens create readonly -o my-org
Copy the token and use it as a secret on your autoscaler app:
fly secrets set -a my-autoscaler --stage FAS_PROMETHEUS_TOKEN="FlyV1 ..."
Configure your autoscaler fly.toml
Next, set up a fly.toml
configuration file for your autoscaler to set environment variables. Replace the my-autoscaler
, my-target-app
, and my-org
with values for your situation.
app = "my-autoscaler"
[build]
image = "flyio/fly-autoscaler:0.3"
[env]
FAS_PROMETHEUS_ADDRESS = "https://api.fly.io/prometheus/my-org"
FAS_PROMETHEUS_METRIC_NAME = "qdepth"
FAS_PROMETHEUS_QUERY = "sum(queue_depth{app='$APP_NAME'})"
FAS_APP_NAME = "my-target-app"
FAS_CREATED_MACHINE_COUNT = "min(50, qdepth / 2)"
[metrics]
port = 9090
path = "/metrics"
Environment variables are the primary way to define your configuration.
Metrics collection settings:
FAS_PROMETHEUS_ADDRESS
defines the Prometheus URL endpoint to query.FAS_PROMETHEUS_METRIC_NAME
defines the local variable name that the metric result will be stored as. This example stores the query result value asqdepth
.FAS_PROMETHEUS_QUERY
defines the Prometheus query to run. This example computes the sum of a user-definedqueue_depth
metric.
Autoscaling settings:
FAS_APP_NAME
is the name of the target application to scale.FAS_CREATED_MACHINE_COUNT
defines an Expr expression to calculate the required number of Machines. The autoscaler creates or destroys Machines to reach the required number.
This example expression assumes that each Machine could handle two items in the queue and uses the min()
function to prevent the autoscaler from scaling more than 50
Machines.
Deploy the autoscaler
The autoscaler only works on a single Machine, so you’ll use the --ha
option to turn off the high availability feature that creates two Machines:
fly deploy --ha=false
After the autoscaler deploys, you should see the number of Machines in your target application increase as your user-defined queue_depth
gauge increases.
Note: The autoscaler creates new Machines in an application by cloning existing Machines. It will not scale to zero and will always keep at least one Machine running.
You can find a full working example of the autoscaler at our fly-autoscaler-example repo.
More use cases
Start and stop instead of create and destroy Machines
If you already have a pool of created Machines that you want to autoscale, you can use the FAS_STARTED_MACHINE_COUNT
expression to stop and start Machines instead of creating and destroying them with FAS_CREATED_MACHINE_COUNT
.
When you use FAS_STARTED_MACHINE_COUNT
, the autoscaler sends a termination signal to the Machines instead of destroying them when scaling. It will also automatically cap the number of Machines that can be started to the number of pre-created Machines.
When you scale by starting and stopping existing Machines, your Machines will start up quickly. You’ll pay for the Machine’s CPU and RAM when they’re running and for the rootfs when they’re stopped.
When you scale by creating and destroying Machines, your Machines will be slightly slower to reach a started
state since it takes longer to create a Machine than to start one. You won’t need to create a “pool” of Machines. You’ll only pay for the Machine’s CPU and RAM when they’re running and won’t need to pay for rootfs since the Machines are destroyed when not needed.
Scale multiple applications
You can scale multiple independent applications with the same autoscaler by using a wildcard expression for your application name. Your applications must all share a common prefix and they must all be in the same organization.
To enable multi-app scaling, you will need to use an organization-wide auth token rather than an app-specific deploy token:
fly tokens create org -o my-org
and then set the resulting token on your autoscaler application:
fly secrets set -a my-autoscaler --stage FAS_API_TOKEN="FlyV1 ..."
Next, set the organization name and application wildcard in your fly.toml
config:
[env]
FAS_ORG="my-org"
FAS_APP_NAME="my-app-*"
Then use $APP_NAME
or ${APP_NAME}
in your Prometheus query to identify the current application being scaled:
[env]
FAS_PROMETHEUS_QUERY = "sum(queue_depth{app='$APP_NAME'})"
You can find a working example of multi-application scaling in the fly-autoscaler-multiapp-example repository.
Scale based on pending Temporal work
The Temporal metrics collector periodically checks for the total number of workflows in a “running” state. By default, it will check every 15 seconds.
You can connect to your Temporal namespace using the FAS_TEMPORAL_
environment variables. For example:
[env]
FAS_TEMPORAL_ADDRESS = 'mynamespace.lyeth.tmprl.cloud:7233'
FAS_TEMPORAL_NAMESPACE = 'mynamespace.lyeth'
FAS_TEMPORAL_METRIC_NAME = 'queue_depth'
FAS_APP_NAME = "my-target-app"
FAS_CREATED_MACHINE_COUNT="workflow_count / 10"
In the example above the autoscaler is set up to create or destroy Machines for an app that can handle up to 10 workflows at a time (based on the current workflow count) with: FAS_CREATED_MACHINE_COUNT="workflow_count / 10
. If you want to ensure you don’t exceed a specific number of Machines, then you can use a min()
expression to cap it: FAS_CREATED_MACHINE_COUNT="min(50, workflow_count / 10)
. This ensures that no more than 50 Machines get created, regardless of how many workflows are executing.
You’ll also need to load the certificate and key data as secrets from your ca.pem
and ca.key
files:
fly secrets set --stage FAS_TEMPORAL_CERT_DATA="$(<ca.pem)"
fly secrets set --stage FAS_TEMPORAL_KEY_DATA="$(<ca.key)"
You can find a full working example of Temporal autoscaling in our fly-autoscaler-temporal-example repository.
Configuration reference
The quickstart describes how to configure the autoscaler with environment variables. You can also configure the autoscaler with a YAML config file if you don’t want to use environment variables or if you want to configure more than one metric collector.
See the reference fly-autoscaler.yml file for an example and more details.
Autoscaler config
FAS_APP_NAME
: The name of the target app to scale.FAS_CREATED_MACHINE_COUNT
: An Expr expression to calculate the required number of Machines. The autoscaler creates or destroys Machines to reach the required number.FAS_STARTED_MACHINE_COUNT
: An Expr expression to calculate the required number of Machines. The autoscaler starts or stops Machines to reach the required number.
Prometheus collector
FAS_PROMETHEUS_ADDRESS
: The URL of the Prometheus endpoint to query.FAS_PROMETHEUS_METRIC_NAME
: The local variable name that the metric result will be stored as.FAS_PROMETHEUS_QUERY
: The Prometheus query to run.
Temporal collector
FAS_TEMPORAL_ADDRESS
: The URL of the Temporal endpoint to query.FAS_TEMPORAL_METRIC_NAME
: The local variable name that the metric result will be stored as.FAS_TEMPORAL_NAMESPACE
: The Temporal namespace name.FAS_TEMPORAL_CERT_DATA
: The namespace CA certificate data (ca.pem
).FAS_TEMPORAL_KEY_DATA
: The namespace CA key data (ca.key
).