Autoscale based on metrics

The metrics-based autoscaler scales an app’s Machines based on any metric, such as pending work items or queue depth. Scaling on metrics other than requests or connections is useful for apps, like background workers, that aren’t running web services. Apps with services that need to scale based on HTTP requests can use the built-in Fly Proxy autostop/autostart feature for Machines.

The autoscaler works by collecting metrics from different sources, such as Prometheus or Temporal, and computing the required number of Machines from those metrics. This reconciliation process runs in a loop, every 15 seconds by default. The autoscaler uses the Expr language to define target Machine counts, which provides a rich set of built-in arithmetic functions.
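The reconciliation loop can be pictured as a small "compare and correct" function. The Python sketch below is a simplified model, not the autoscaler's implementation. The callables `collect`, `target_expr`, `get_count`, and `apply_delta` are hypothetical stand-ins for a metric collector, a compiled Expr expression, and the Machines API. Truncating the expression result to a whole Machine count is also an assumption.

```python
import time
from typing import Callable, Dict, Optional

Metrics = Dict[str, float]


def reconcile_once(
    collect: Callable[[], Metrics],
    target_expr: Callable[[Metrics], float],
    current_count: int,
) -> int:
    """One pass: return how many Machines to add (>0) or remove (<0)."""
    metrics = collect()  # e.g. {"qdepth": 37.0} from a Prometheus query
    # Assumption: the expression result is truncated to a whole Machine count.
    target = max(0, int(target_expr(metrics)))
    return target - current_count


def run(
    collect: Callable[[], Metrics],
    target_expr: Callable[[Metrics], float],
    get_count: Callable[[], int],
    apply_delta: Callable[[int], None],
    interval_s: float = 15.0,  # mirrors the documented 15-second default
    iterations: Optional[int] = None,  # None means loop forever
) -> None:
    """Repeat reconcile_once every interval_s seconds."""
    done = 0
    while iterations is None or done < iterations:
        delta = reconcile_once(collect, target_expr, get_count())
        if delta != 0:
            apply_delta(delta)  # hypothetical: create/destroy or start/stop
        done += 1
        if iterations is None or done < iterations:
            time.sleep(interval_s)
```

For example, with a queue depth of 10, the expression `min(50, qdepth / 2)` gives 5. If 3 Machines are running, `reconcile_once` returns 2, meaning two more Machines are needed.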

You run the autoscaler as an app based on the fly-autoscaler image. The app runs within your organization so you have full control over it. You can customize the autoscaler to work with your specific scaling needs.

Quickstart

To get up and running, you’ll set up the app, configure secrets, set the configuration, and deploy the autoscaler.

As a prerequisite, you need an existing target application that you want to scale and a user-defined metric to scale on. In this example, you’ll scale on a Prometheus metric called queue_depth, but you can replace it with your own. You can also scale based on Temporal workflows.

Create the autoscaler application

First, create a new Fly.io app that will run the autoscaler. Replace the my-autoscaler name with a unique name for your autoscaler application:

fly apps create my-autoscaler

Create a deploy token

The first auth token you’ll need is one that has permissions to deploy your target app:

fly tokens create deploy -a my-target-app

Copy the resulting token and set it as a secret on your autoscaler app:

fly secrets set -a my-autoscaler --stage FAS_API_TOKEN="FlyV1 ..."

Create a token to read from Prometheus

The next auth token you’ll need is one that has permissions to read from your organization’s Prometheus data on Fly:

fly tokens create readonly -o my-org

Copy the token and use it as a secret on your autoscaler app:

fly secrets set -a my-autoscaler --stage FAS_PROMETHEUS_TOKEN="FlyV1 ..."

Configure your autoscaler fly.toml

Next, set up a fly.toml configuration file that defines the autoscaler’s environment variables. Replace my-autoscaler, my-target-app, and my-org with the values for your setup.

app = "my-autoscaler"

[build]
image = "flyio/fly-autoscaler:0.3"

[env]
FAS_PROMETHEUS_ADDRESS = "https://api.fly.io/prometheus/my-org"
FAS_PROMETHEUS_METRIC_NAME = "qdepth"
FAS_PROMETHEUS_QUERY = "sum(queue_depth{app='$APP_NAME'})"

FAS_APP_NAME = "my-target-app"
FAS_CREATED_MACHINE_COUNT = "min(50, qdepth / 2)"

[metrics]
port = 9090
path = "/metrics"

Environment variables are the primary way to define your configuration.

Metrics collection settings:

  • FAS_PROMETHEUS_ADDRESS defines the Prometheus URL endpoint to query.
  • FAS_PROMETHEUS_METRIC_NAME defines the variable name under which the metric result is stored, so your expressions can refer to it. This example stores the query result value as qdepth.
  • FAS_PROMETHEUS_QUERY defines the Prometheus query to run. This example computes the sum of a user-defined queue_depth metric.

Autoscaling settings:

  • FAS_APP_NAME is the name of the target application to scale.
  • FAS_CREATED_MACHINE_COUNT defines an Expr expression to calculate the required number of Machines. The autoscaler creates or destroys Machines to reach the required number.

This example expression assumes that each Machine can handle two items in the queue, and it uses the min() function to prevent the autoscaler from scaling beyond 50 Machines.
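To see what `min(50, qdepth / 2)` produces, here is the same arithmetic in Python for a few sample queue depths. This only models the expression, and the sample values are illustrative. This guide does not say how the autoscaler turns a fractional result, such as 49.5, into a whole number of Machines.

```python
def created_machine_count(qdepth: float) -> float:
    """Python equivalent of the Expr expression "min(50, qdepth / 2)"."""
    # Two queue items per Machine, capped at 50 Machines.
    return min(50, qdepth / 2)


# Illustrative queue depths only.
samples = {qdepth: created_machine_count(qdepth) for qdepth in (8, 40, 99, 100, 500)}
```

A depth of 40 yields 20 Machines. Any depth of 100 or more hits the cap of 50, so a sudden spike cannot create an unbounded number of Machines.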

Deploy the autoscaler

The autoscaler runs on a single Machine only, so pass the --ha=false option to turn off the high-availability feature that would otherwise create two Machines:

fly deploy --ha=false

After the autoscaler deploys, you should see the number of Machines in your target application increase as your user-defined queue_depth gauge increases.

Note: The autoscaler creates new Machines in an application by cloning existing Machines. It will not scale to zero and will always keep at least one Machine running.

You can find a full working example of the autoscaler at our fly-autoscaler-example repo.

More use cases

Start and stop instead of create and destroy Machines

If you already have a pool of created Machines that you want to autoscale, you can use the FAS_STARTED_MACHINE_COUNT expression to stop and start Machines instead of creating and destroying them with FAS_CREATED_MACHINE_COUNT.

When you use FAS_STARTED_MACHINE_COUNT, the autoscaler stops Machines (by sending them a termination signal) instead of destroying them when it scales down. It also automatically caps the number of started Machines at the number of pre-created Machines.
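The capping rule can be modelled in a few lines. This Python sketch is illustrative only. Two parts of it are assumptions, not how the autoscaler picks Machines: representing the pool as a list of "started"/"stopped" strings, and choosing Machines by list position.

```python
from typing import List, Tuple


def plan_start_stop(states: List[str], target: int) -> Tuple[List[int], List[int]]:
    """Return (indexes to start, indexes to stop) for a fixed Machine pool.

    states holds "started" or "stopped" for each pre-created Machine
    (hypothetical representation). The target is capped at the pool size,
    matching the documented FAS_STARTED_MACHINE_COUNT behaviour.
    """
    target = max(0, min(target, len(states)))  # cannot start more than exist
    started = [i for i, s in enumerate(states) if s == "started"]
    stopped = [i for i, s in enumerate(states) if s == "stopped"]
    if len(started) < target:
        # Start just enough stopped Machines to reach the target.
        return stopped[: target - len(started)], []
    # Stop the surplus started Machines (never destroy them).
    return [], started[target:]
```

With a pool of three Machines and a computed target of 10, only the two stopped Machines are started. No new Machines are created.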

When you scale by starting and stopping existing Machines, your Machines start up quickly. You pay for the Machines’ CPU and RAM while they’re running and for their rootfs while they’re stopped.

When you scale by creating and destroying Machines, your Machines are slightly slower to reach a started state, since creating a Machine takes longer than starting one. You don’t need to create a pool of Machines. You pay only for the Machines’ CPU and RAM while they’re running, and you don’t pay for rootfs, since the Machines are destroyed when they’re not needed.
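The trade-off comes down to what you pay for while capacity is idle. The sketch below follows the description above and leaves everything else out. Both hourly rates are placeholder parameters, not Fly.io prices, and the model ignores any other billing items.

```python
def start_stop_cost(machines: int, hours_running: float, hours_total: float,
                    compute_rate: float, rootfs_rate: float) -> float:
    """Pre-created pool: CPU/RAM while running, rootfs while stopped."""
    hours_stopped = hours_total - hours_running
    return machines * (hours_running * compute_rate + hours_stopped * rootfs_rate)


def create_destroy_cost(machines: int, hours_running: float,
                        compute_rate: float) -> float:
    """Created on demand: CPU/RAM only while the Machines run."""
    return machines * hours_running * compute_rate
```

Take placeholder rates of 1.0 for compute and 0.25 for rootfs, with 2 Machines that run 10 hours out of 100. The start/stop pool costs 65.0 units and create/destroy costs 20.0. The difference is the price of keeping a warm pool, which buys you faster start-up.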

Scale multiple applications

You can scale multiple independent applications with the same autoscaler by using a wildcard expression for your application name. Your applications must all share a common prefix and they must all be in the same organization.

To enable multi-app scaling, you will need to use an organization-wide auth token rather than an app-specific deploy token:

fly tokens create org -o my-org

Then set the resulting token on your autoscaler application:

fly secrets set -a my-autoscaler --stage FAS_API_TOKEN="FlyV1 ..."

Next, set the organization name and application wildcard in your fly.toml config:

[env]
FAS_ORG = "my-org"
FAS_APP_NAME = "my-app-*"
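To check which apps a wildcard would select, you can model the pattern with Python's `fnmatch`. This assumes `*` behaves like a shell glob that matches any run of characters. The autoscaler's own matching rules may differ, and the app names below are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def matching_apps(pattern: str, app_names: Iterable[str]) -> List[str]:
    """Return the app names a FAS_APP_NAME-style wildcard would select.

    Assumption: "*" is a shell-style glob, as implemented by fnmatchcase.
    """
    return [name for name in app_names if fnmatchcase(name, pattern)]


# Hypothetical app names in my-org.
apps = ["my-app-eu", "my-app-us", "my-app", "other-app"]
selected = matching_apps("my-app-*", apps)
```

Under this model, `my-app` itself is not selected, because the pattern requires the trailing hyphen.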

Then use $APP_NAME or ${APP_NAME} in your Prometheus query to identify the current application being scaled:

[env]
FAS_PROMETHEUS_QUERY = "sum(queue_depth{app='$APP_NAME'})"
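Python's `string.Template` uses the same `$NAME` and `${NAME}` placeholder syntax, so it is a handy way to preview the query each matched app would receive. This illustrates the substitution and is not the autoscaler's code. The app name is hypothetical.

```python
from string import Template

QUERY = "sum(queue_depth{app='$APP_NAME'})"


def query_for(app_name: str, query: str = QUERY) -> str:
    """Expand $APP_NAME or ${APP_NAME} in a Prometheus query template."""
    # substitute() raises KeyError if the query uses any other placeholder.
    return Template(query).substitute(APP_NAME=app_name)
```

For example, `query_for("my-app-eu")` returns `sum(queue_depth{app='my-app-eu'})`.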

You can find a working example of multi-application scaling in the fly-autoscaler-multiapp-example repository.

Scale based on pending Temporal work

The Temporal metrics collector periodically checks for the total number of workflows in a “running” state. By default, it will check every 15 seconds.

You can connect to your Temporal namespace using the FAS_TEMPORAL_ environment variables. For example:

[env]
FAS_TEMPORAL_ADDRESS = 'mynamespace.lyeth.tmprl.cloud:7233'
FAS_TEMPORAL_NAMESPACE = 'mynamespace.lyeth'
FAS_TEMPORAL_METRIC_NAME = 'workflow_count'

FAS_APP_NAME = "my-target-app"
FAS_CREATED_MACHINE_COUNT = "workflow_count / 10"

In the example above, the autoscaler creates or destroys Machines so that each Machine handles up to 10 running workflows: FAS_CREATED_MACHINE_COUNT = "workflow_count / 10". To make sure you don’t exceed a specific number of Machines, cap the result with min(): FAS_CREATED_MACHINE_COUNT = "min(50, workflow_count / 10)". This ensures that no more than 50 Machines get created, regardless of how many workflows are running.
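The effect of the cap is easiest to see by working out where it starts to bind. This Python sketch only models the two expressions. The helper names and sample workflow counts are illustrative.

```python
from typing import Optional


def machines_for(workflow_count: float, per_machine: float = 10,
                 cap: Optional[float] = None) -> float:
    """Model of "workflow_count / 10", optionally wrapped in min(cap, ...)."""
    needed = workflow_count / per_machine
    return needed if cap is None else min(cap, needed)


def cap_reached_at(cap: float, per_machine: float = 10) -> float:
    """Workflow count beyond which the min() cap stops adding Machines."""
    return cap * per_machine
```

At 300 running workflows, both expressions ask for 30 Machines. At 1,200 workflows, the uncapped expression asks for 120 and the capped one stays at 50. The cap starts to bind at 500 workflows.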

You’ll also need to load the certificate and key data from your ca.pem and ca.key files as secrets on your autoscaler app:

fly secrets set -a my-autoscaler --stage FAS_TEMPORAL_CERT_DATA="$(<ca.pem)"
fly secrets set -a my-autoscaler --stage FAS_TEMPORAL_KEY_DATA="$(<ca.key)"

You can find a full working example of Temporal autoscaling in our fly-autoscaler-temporal-example repository.

Configuration reference

The quickstart describes how to configure the autoscaler with environment variables. You can also configure the autoscaler with a YAML config file if you don’t want to use environment variables or if you want to configure more than one metric collector.

See the reference fly-autoscaler.yml file for an example and more details.

Autoscaler config

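  • FAS_APP_NAME: The name of the target app to scale, or a wildcard pattern such as my-app-* when scaling multiple apps.
  • FAS_ORG: The organization that contains the target apps. Set this when FAS_APP_NAME uses a wildcard.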
  • FAS_CREATED_MACHINE_COUNT: An Expr expression to calculate the required number of Machines. The autoscaler creates or destroys Machines to reach the required number.
  • FAS_STARTED_MACHINE_COUNT: An Expr expression to calculate the required number of Machines. The autoscaler starts or stops Machines to reach the required number.

Prometheus collector

  • FAS_PROMETHEUS_ADDRESS: The URL of the Prometheus endpoint to query.
  • FAS_PROMETHEUS_METRIC_NAME: The local variable name that the metric result will be stored as.
  • FAS_PROMETHEUS_QUERY: The Prometheus query to run.

Temporal collector

  • FAS_TEMPORAL_ADDRESS: The address (host and port) of the Temporal endpoint to query, such as mynamespace.lyeth.tmprl.cloud:7233.
  • FAS_TEMPORAL_METRIC_NAME: The local variable name that the metric result will be stored as.
  • FAS_TEMPORAL_NAMESPACE: The Temporal namespace name.
  • FAS_TEMPORAL_CERT_DATA: The namespace CA certificate data (ca.pem).
  • FAS_TEMPORAL_KEY_DATA: The namespace CA key data (ca.key).