It’s dangerous to go alone. Fly.io runs full-stack apps by transmuting Docker containers into Fly Machines: ultra-lightweight hardware-backed VMs. You can run all your dependencies on Fly.io, but sometimes, you’ll need to work with other clouds, and we’ve made that pretty simple. Try Fly.io out for yourself; your Rails or Node app can be up and running in just minutes.
Let’s hypothesize you an app serving generative AI cat images based on the weather forecast, running on a g4dn.xlarge ECS task in AWS us-east-1. It’s going great; people didn’t realize how dependent their cat pic prefs are on barometric pressure, and you’re all anyone can talk about.
Word reaches Australia and Europe, but you’re not catching on, because the… latency is too high? Just roll with us here. Anyways: fixing this is going to require replicating ECS tasks and ECR images into ap-southeast-2 and eu-central-1 while also setting up load balancing. Nah.
This is the O.G. Fly.io deployment story; one deployed app, one versioned container, one command to get it running anywhere in the world.
But you have a problem: your app relies on training data, it’s huge, your giant employer manages it, and it’s in S3. Getting this to work will require AWS credentials.
You could ask your security team to create a user, give it permissions, and hand over the AWS keypair. Then you could wash your neck and wait for the blade. Passing around AWS keypairs is the beginning of every horror story told about cloud security, and the security team ain’t having it.
There’s a better way. It’s drastically more secure, so your security people will at least hear you out. It’s also so much easier on Fly.io that you might never bother creating an IAM service account again.
Let’s Get It out of the Way
We’re going to use OIDC to set up strictly limited trust between AWS and Fly.io.
1. In AWS: we’ll add Fly.io as an Identity Provider in AWS IAM, giving us an ID we can plug into any IAM Role.
2. Also in AWS: we’ll create a Role, give it access to the S3 bucket with our tokenized cat data, and then attach the Identity Provider to it.
3. In Fly.io, we’ll take the Role ARN we got from step 2 and set it as an environment variable in our app.
Our machines will now magically have access to the S3 bucket.
What the What
A reasonable question to ask here is, “where’s the credential?” Ordinarily, to give a Fly Machine access to an AWS resource, you’d use fly secrets set to add an AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY to the environment in the Machine. Here, we’re not setting any secrets at all; we’re just adding an ARN (which is not a credential) to the Machine.
Here’s what’s happening.
Fly.io operates an OIDC IdP at oidc.fly.io. It issues OIDC tokens, exclusively to Fly Machines. AWS can be configured to trust these tokens, on a role-by-role basis. That’s the “secret credential”: the pre-configured trust relationship in IAM, and the public keypairs it manages. You, the user, never need to deal with these keys directly; it all happens behind the scenes, between AWS and Fly.io.
The key actor in this picture is STS, the AWS Security Token Service. STS’s main job is to vend short-lived AWS credentials, usually through some variant of an API called AssumeRole. Specifically, in our case: AssumeRoleWithWebIdentity tells STS to cough up an AWS keypair given an OIDC token (that matches a pre-configured trust relationship).
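For a sense of what that call looks like on the wire, here’s a minimal Python sketch, assuming the STS query API and its global endpoint. AssumeRoleWithWebIdentity is not signed with AWS keys; the OIDC token is the only proof presented. The helper name build_assume_role_request, the role ARN, and the token string are placeholders of ours, not anything Fly.io or AWS ships.

```python
from urllib.parse import urlencode

STS_ENDPOINT = "https://sts.amazonaws.com/"  # global STS endpoint


def build_assume_role_request(role_arn: str, session_name: str, oidc_token: str) -> str:
    """Return the URL for an AssumeRoleWithWebIdentity query request.

    Hypothetical helper for illustration. There is no AWS signature here:
    the OIDC token itself is what STS evaluates against the trust relationship.
    """
    params = {
        "Action": "AssumeRoleWithWebIdentity",
        "Version": "2011-06-15",
        "RoleArn": role_arn,
        "RoleSessionName": session_name,
        "WebIdentityToken": oidc_token,
    }
    return STS_ENDPOINT + "?" + urlencode(params)


url = build_assume_role_request(
    "arn:aws:iam::123456123456:role/cat-bucket",  # placeholder Role ARN
    "weather-cat",                                # placeholder session name
    "header.payload.signature",                   # placeholder OIDC token
)
print(url)
```

In practice you would not build this request by hand; the AWS SDK does it for you. The point is only that nothing in the request is a long-lived secret.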
That still leaves the question: how does your code, which is reaching out to the AWS APIs to get cat weights, drive any of this?
The Init Thickens
Every Fly Machine boots up into an init we wrote in Rust. It has slowly been gathering features.
One of those features, which has been around for a while, is a server for a Unix socket at /.fly/api, which exports a subset of the Fly Machines API to privileged processes in the Machine. Think of it as our answer to the EC2 Instance Metadata Service. How it works is, every time we boot a Fly Machine, we pass it a Macaroon token locked to that particular Machine; init’s server for /.fly/api is a proxy that attaches that token to requests.
What’s neat about this is that, in addition to the API proxy being tricky to SSRF to, the credential that drives /.fly/api is doubly protected:
- The Fly.io platform won’t honor it unless it comes from that specific Fly Machine (flyd, our orchestrator, knows who it’s talking to), and
- Ordinary code running in a Fly Machine never gets a copy of the token to begin with.
You could rig up a local privilege escalation vulnerability and work out how to steal the Macaroon, but you can’t exfiltrate it productively.
So now you have half the puzzle worked out: OIDC is just part of the Fly Machines API (specifically: /v1/tokens/oidc). A Fly Machine can hit a Unix socket and get an OIDC token tailored to that machine:
{
  "app_id": "3671581",
  "app_name": "weather-cat",
  "aud": "sts.amazonaws.com",
  "image": "image:latest",
  "image_digest": "sha256:dff79c6da8dd4e282ecc6c57052f7cfbd684039b652f481ca2e3324a413ee43f",
  "iss": "https://oidc.fly.io/example",
  "machine_id": "3d8d377ce9e398",
  "machine_name": "ancient-snow-4824",
  "machine_version": "01HZJXGTQ084DX0G0V92QH3XW4",
  "org_id": "29873298",
  "org_name": "example",
  "region": "yyz",
  "sub": "example:weather-cat:ancient-snow-4824"
} // some OIDC stuff trimmed
Look upon this holy blob, sealed with a published key managed by Fly.io’s OIDC vault, and see that there lies within it enough information for AWS STS to decide to issue a session credential.
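If you want to poke at this from inside a Machine, here’s a rough Python sketch using only the standard library. It talks HTTP over the /.fly/api Unix socket and decodes the claims for inspection. Two assumptions to flag: that /v1/tokens/oidc accepts a POST with a JSON body naming the audience and returns the raw JWT as the response body, and the helper names (UnixHTTPConnection, fetch_oidc_token, decode_claims), which are ours. The decode step does not verify the signature; that’s STS’s job.

```python
import base64
import http.client
import json
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that talks to a Unix domain socket instead of TCP."""

    def __init__(self, socket_path: str):
        super().__init__("localhost")  # host is only used for the Host header
        self.socket_path = socket_path

    def connect(self):
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.connect(self.socket_path)


def fetch_oidc_token(aud: str = "sts.amazonaws.com", socket_path: str = "/.fly/api") -> str:
    """Ask init's API proxy for an OIDC token (only works inside a Fly Machine).

    Assumption: the endpoint takes a POST with a JSON body naming the audience
    and answers with the raw JWT.
    """
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request(
            "POST",
            "/v1/tokens/oidc",
            body=json.dumps({"aud": aud}),
            headers={"Content-Type": "application/json"},
        )
        resp = conn.getresponse()
        if resp.status != 200:
            raise RuntimeError(f"token request failed: HTTP {resp.status}")
        return resp.read().decode().strip()
    finally:
        conn.close()


def decode_claims(jwt: str) -> dict:
    """Decode a JWT's payload WITHOUT verifying the signature (inspection only)."""
    payload_b64 = jwt.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload_b64))
```

Inside a Machine, decode_claims(fetch_oidc_token()) should give you a dictionary shaped like the blob above. The aud parameter is there because the audience is just a string you ask for.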
We have still not completed the puzzle, because while you can probably now see how you’d drive this process with a bunch of new code that you’d tediously write, you are acutely aware that you have not yet endured that tedium — e pur si muove!
One init feature remains to be disclosed, and it’s cute.
If, when init starts in a Fly Machine, it sees an AWS_ROLE_ARN environment variable set, it initiates a little dance; it:
- goes off and generates an OIDC token, the way we just described,
- saves that OIDC token in a file, and
- sets the AWS_WEB_IDENTITY_TOKEN_FILE and AWS_ROLE_SESSION_NAME environment variables for every process it launches.
The AWS SDK, linked to your application, does all the rest.
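Roughly, the part of “the rest” that notices init’s handiwork looks like this. It’s a simplified Python sketch of the web identity step in the SDK’s credential provider chain, not the SDK’s actual code; load_web_identity_config, WebIdentityConfig, and the fallback session name are hypothetical.

```python
from typing import Mapping, NamedTuple, Optional


class WebIdentityConfig(NamedTuple):
    role_arn: str
    session_name: str
    token: str


def load_web_identity_config(env: Mapping[str, str]) -> Optional[WebIdentityConfig]:
    """Return what's needed for AssumeRoleWithWebIdentity, or None if not configured."""
    role_arn = env.get("AWS_ROLE_ARN")
    token_file = env.get("AWS_WEB_IDENTITY_TOKEN_FILE")
    if not role_arn or not token_file:
        # A real SDK would move on to the next provider in its chain here.
        return None

    # The token is read from disk rather than from the environment itself.
    with open(token_file, encoding="utf-8") as f:
        token = f.read().strip()

    # Assumption: use a fixed placeholder name if none was provided.
    session_name = env.get("AWS_ROLE_SESSION_NAME", "example-session")
    return WebIdentityConfig(role_arn, session_name, token)
```

Calling load_web_identity_config(os.environ) in a Machine with AWS_ROLE_ARN set should return all three pieces; anywhere else it returns None and your code falls back to whatever credentials it normally uses.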
Let’s review: you add an AWS_ROLE_ARN variable to your Fly App, launch a Machine, and have it go fetch a file from S3. What happens next is:
1. init detects AWS_ROLE_ARN is set as an environment variable.
2. init sends a request to /v1/tokens/oidc via /.fly/api.
3. init writes the response to /.fly/oidc_token.
4. init sets AWS_WEB_IDENTITY_TOKEN_FILE and AWS_ROLE_SESSION_NAME.
5. The entrypoint boots, and (say) runs aws s3api get-object.
6. The AWS SDK runs through the credential provider chain.
7. The SDK sees that AWS_WEB_IDENTITY_TOKEN_FILE is set and calls AssumeRoleWithWebIdentity with the file contents.
8. AWS verifies the token against https://oidc.fly.io/example/.well-known/openid-configuration, which references a key Fly.io manages on isolated hardware.
9. AWS vends STS credentials for the assumed Role.
10. The SDK uses the STS credentials to access the S3 bucket.
11. AWS checks the Role’s IAM policy to see if it has access to the S3 bucket.
12. AWS returns the contents of the bucket object.
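To make the “AWS vends STS credentials” step less abstract, here’s a small Python sketch that pulls the temporary credentials out of an AssumeRoleWithWebIdentity response. It assumes the XML shape of the STS query API and an Expiration timestamp with whole seconds; the sample body is trimmed and every value in it is made up. SessionCredentials and parse_sts_response are our names.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from datetime import datetime, timezone

# XML namespace used by STS query-API responses.
STS_NS = {"sts": "https://sts.amazonaws.com/doc/2011-06-15/"}

# Trimmed, made-up response body; every value here is a placeholder.
SAMPLE_RESPONSE = """\
<AssumeRoleWithWebIdentityResponse xmlns="https://sts.amazonaws.com/doc/2011-06-15/">
  <AssumeRoleWithWebIdentityResult>
    <Credentials>
      <AccessKeyId>ASIAEXAMPLEKEYID</AccessKeyId>
      <SecretAccessKey>example-secret</SecretAccessKey>
      <SessionToken>example-session-token</SessionToken>
      <Expiration>2030-01-01T00:15:00Z</Expiration>
    </Credentials>
  </AssumeRoleWithWebIdentityResult>
</AssumeRoleWithWebIdentityResponse>
"""


@dataclass
class SessionCredentials:
    access_key_id: str
    secret_access_key: str
    session_token: str
    expiration: datetime


def parse_sts_response(xml_body: str) -> SessionCredentials:
    """Extract the short-lived credentials from an STS XML response."""
    root = ET.fromstring(xml_body)
    node = root.find(".//sts:Credentials", STS_NS)
    if node is None:
        raise ValueError("no Credentials element in STS response")

    def text(tag: str) -> str:
        return node.findtext(f"sts:{tag}", default="", namespaces=STS_NS)

    # Assumption: the timestamp is UTC with whole seconds and a trailing "Z".
    expiration = datetime.strptime(text("Expiration"), "%Y-%m-%dT%H:%M:%SZ")
    return SessionCredentials(
        access_key_id=text("AccessKeyId"),
        secret_access_key=text("SecretAccessKey"),
        session_token=text("SessionToken"),
        expiration=expiration.replace(tzinfo=timezone.utc),
    )


session = parse_sts_response(SAMPLE_RESPONSE)
print(session.access_key_id, "expires", session.expiration.isoformat())
```

Note the three-part credential: unlike a long-lived keypair, STS credentials come with a session token and an expiration, and the SDK has to present all of them together.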
How Much Better Is This?
It is a lot better.
STS credentials asymptotically approach the security properties of Macaroon tokens.
Most importantly: AWS STS credentials are short-lived. Because they’re generated dynamically, rather than stored in a configuration file or environment variable, they’re already a little bit annoying for an attacker to recover. But they’re also dead in minutes. They have a sharply limited blast radius. They rotate themselves, and fail closed.
They’re also easier to manage. This is a rare instance where you can reasonably drive the entire AWS side of the process from within the web console. Your cloud team adds Roles all the time; this is just a Role with an extra snippet of JSON. The resulting ARN isn’t even a secret; your cloud team could just email or Slack message it back to you.
Finally, they offer finer-grained control.
To understand the last part, let’s look at that extra snippet of JSON (the “Trust Policy”) your cloud team is sticking on the new cat-bucket Role:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::123456123456:oidc-provider/oidc.fly.io/example"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.fly.io/example:aud": "sts.amazonaws.com"
        },
        "StringLike": {
          "oidc.fly.io/example:sub": "example:weather-cat:*"
        }
      }
    }
  ]
}
The aud check guarantees STS will only honor tokens that Fly.io deliberately vended for STS.
Recall the OIDC token we dumped earlier; much of what’s in it, we can match in the Trust Policy. Every OIDC token Fly.io generates is going to have a sub field formatted org:app:machine, so we can lock IAM Roles down to organizations, or to specific Fly Apps, or even specific Fly Machine instances.
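Here’s a toy Python model of that scoping, to show how the sub pattern widens or narrows access. It builds a Trust Policy shaped like the one above and checks token claims against its two conditions. This is a deliberately simplified stand-in for IAM’s real evaluation (it only handles * and ? wildcards and these two operators), and build_trust_policy, string_like, conditions_match, and the dog-weather app name are all hypothetical.

```python
import re


def build_trust_policy(account_id: str, org: str, sub_pattern: str) -> dict:
    """Build a Trust Policy shaped like the one above (hypothetical helper)."""
    provider = f"oidc.fly.io/{org}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{provider}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {f"{provider}:aud": "sts.amazonaws.com"},
                    "StringLike": {f"{provider}:sub": sub_pattern},
                },
            }
        ],
    }


def string_like(pattern: str, value: str) -> bool:
    """Simplified StringLike: '*' matches any run of characters, '?' exactly one."""
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch) for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def conditions_match(policy: dict, claims: dict) -> bool:
    """Toy evaluation of the policy's two conditions against token claims."""
    condition = policy["Statement"][0]["Condition"]
    for key, expected in condition["StringEquals"].items():
        claim = key.rsplit(":", 1)[1]  # "oidc.fly.io/example:aud" -> "aud"
        if claims.get(claim) != expected:
            return False
    for key, pattern in condition["StringLike"].items():
        claim = key.rsplit(":", 1)[1]
        if not string_like(pattern, str(claims.get(claim, ""))):
            return False
    return True


claims = {"aud": "sts.amazonaws.com", "sub": "example:weather-cat:ancient-snow-4824"}
patterns = (
    "example:*",                              # any app in the org
    "example:weather-cat:*",                  # any Machine in one app
    "example:weather-cat:ancient-snow-4824",  # one specific Machine
    "example:dog-weather:*",                  # a different (made-up) app
)
for pattern in patterns:
    policy = build_trust_policy("123456123456", "example", pattern)
    print(pattern, "->", conditions_match(policy, claims))
```

With the token from earlier, the first three patterns match and the last one does not. In real life AWS does this evaluation for you; the only thing you choose is how tight to make the pattern.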
And So
In case it’s not obvious: this pattern works for any AWS API, not just S3.
Our OIDC support on the platform and in Fly Machines will set arbitrary OIDC audience strings, so you can use it to authenticate to any OIDC-compliant cloud provider. It won’t be as slick on Azure or GCP, because we haven’t done the init features to light their APIs up with a single environment variable, but those features are easy, and we’re just waiting for people to tell us what they need.
For us, the gold standard for least-privilege, conditional access tokens remains Macaroons, and it’s unlikely that we’re going to do a bunch of internal stuff using OIDC. We even snuck Macaroons into this feature. But the security you’re getting from this OIDC dance closes a lot of the gap between hardcoded user credentials and Macaroons, and it’s easy to use — easier, in some ways, than it is to manage role-based access inside of a legacy EC2 deployment!