We’re Fly.io. We run apps for our users on hardware we host around the world. Fly.io happens to be a great place to run Phoenix applications. Check out how to get started!
At Fly.io, we make it super easy to spin up new apps and scale them across Machines distributed around the globe. Elixir is uniquely suited to take advantage of this distributed global network. In this post I'm going to show you how to use both to:
- Start a Machine anywhere on the globe.
- Execute whatever code you want on it.
- Communicate with it as if it was running locally.
All of this using only the tools provided by Elixir, the BEAM virtual machine, and one external dependency.
Distribution
Assuming you've already set up clustering for your Fly.io app, you can deploy your application globally using the following command:
fly scale count 5 --region="ord,cdg,nrt,jnb,scl"
This will scale your app to 5 Machines, in Chicago, Paris, Tokyo, Johannesburg, and Santiago, covering North America, Europe, Asia, Africa, and South America.
Once the deployment is complete, you have a fully connected and globally distributed Elixir cluster. Without doing anything else, you can now communicate with any node, and any process on any node, as if it were running locally. If your application calls `PubSub.subscribe(:my_app, "events")` in ORD, and an event is broadcast with `PubSub.broadcast(:my_app, "events", "big event!")` in JNB, it will work transparently.
Further, Fly.io handles network encryption via WireGuard, so your Machines communicate securely by default.
More Machines!
Let’s say we have a very big one-off task we have to complete for our customers in ORD, but we don’t want to slow down our app. Ideally, this task runs on its own machine and reports the results back to your web server.
At Fly.io, you can create, update, and delete Machines via our JSON API, so let's do that! First, let's add my favorite HTTP library, Req:
{:req, "~> 0.4.3"}
Next, let's add a deployment token to our environment secrets. This gives us the access we need to create Machines via the API:
fly secrets set FLY_API_TOKEN=$(flyctl tokens create deploy)
Then, if we want to create a Machine, it's as simple as one HTTP request:
app_name = System.fetch_env!("FLY_APP_NAME")
token = System.fetch_env!("FLY_API_TOKEN")
image = System.fetch_env!("FLY_IMAGE_REF")
region = System.fetch_env!("FLY_REGION")

body = Req.post!("https://api.machines.dev/v1/apps/#{app_name}/machines",
  auth: {:bearer, token},
  json: %{
    name: "#{app_name}-async-#{Enum.random(0..100000)}",
    region: region,
    config: %{
      image: image,
      size: "performance-2x", # Choose your size
      auto_destroy: true,
      env: %{
        PHX_SERVER: "false", # If running Phoenix
        PARENT_NODE: to_string(node())
      }
    }
  }
).body
{node_name, machine_id} =
  case body do
    %{"private_ip" => ip, "id" => id} -> {:"#{app_name}@#{ip}", id}
    error ->
      Logger.error(inspect(error))
      raise "failed to create Machine"
  end
And that's it, we started a Machine! Breaking this code down a little bit:
app_name = System.fetch_env!("FLY_APP_NAME")
image = System.fetch_env!("FLY_IMAGE_REF")
region = System.fetch_env!("FLY_REGION")
First, we pull the environment variables for the app name, image, and region, all of which are available on every Fly Machine.
The next part is doing the HTTP Request to the Machines create API. The key bits are here:
name: "#{app_name}-async-#{Enum.random(0..100000)}",
region: region,
config: %{
image: image,
size: "performance-2x", # Choose your size
auto_destroy: true,
env: %{
PHX_SERVER: "false", # If running phoenix, skip it
PARENT_NODE: to_string(node()) # Helper if you need it
}
}
We choose the basic configuration of the app. The key bit is the `image` setting: we need to make sure the code running on the remote Machine is exactly the same as the code on our current Machine. By explicitly passing the current image, we also save ourselves from trouble if this runs during a deployment, when the Machine image might otherwise change out from under us.
Next up let’s connect and execute some code on it:
# Give the process ~30 seconds to try and connect
Enum.take_while(0..30, fn _ ->
Process.sleep(1000)
not Node.connect(node_name)
end)
true = node_name in Node.list()
big_data = 0..100000000000000000
map_reduced = :erpc.call(node_name, fn ->
result = big_data |> MyApp.BigData.map_reducer()
PubSub.broadcast(:my_app, "events", "big data done!")
result
end, :infinity)
Machines should start quickly, but this is a distributed system, and it might take longer than we'd hope, so let's retry connecting with a short sleep until it succeeds:
# Give the process ~30 seconds to try and connect
Enum.take_while(0..30, fn _ ->
Process.sleep(1000)
not Node.connect(node_name)
end)
# Final check to make sure we didn't fail completely
true = node_name in Node.list()
Finally, we simply call our anonymous function on the remote node:
big_data = 0..100000000000000000
map_reduced = :erpc.call(node_name, fn ->
result = big_data |> MyApp.BigData.map_reducer()
PubSub.broadcast(:my_app, "events", "big data done!")
result
end, :infinity)
You may have noticed we didn't do anything at all to serialize our data for the remote node. The BEAM serializes the closure and its data for us, sends it along, and returns the result transparently. This is the magic of the BEAM, and it isn't limited to simple values either: send files, do IO, whatever you need, the BEAM will figure it out.
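Under the hood, this works because any Elixir term can be encoded to the Erlang external term format, which is what distribution uses on the wire. You can see it for yourself with `:erlang.term_to_binary/1` and `:erlang.binary_to_term/1`:

```elixir
# Any term, including nested maps, lists, and tuples, round-trips
# through the external term format that distribution uses on the wire.
payload = %{id: 42, tags: [:big, :data], range: {0, 100}}

binary = :erlang.term_to_binary(payload)
true = is_binary(binary)

# The receiving side reverses it transparently.
^payload = :erlang.binary_to_term(binary)
```

Distribution does this encode/decode for every message and `:erpc` call, which is why no serialization code appears anywhere in our example.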
Since we're working with our existing code, we can use it like any normal code, with no special syntax or libraries. Whatever code we deploy in our normal app image, we can execute and use normally. For example, the PubSub broadcast we used above just works for any node listening for "events" on :my_app.
When finished, use the machine_id
to destroy the Machine via the API:
Req.delete!("https://api.machines.dev/v1/apps/#{app_name}/machines/#{machine_id}", auth: {:bearer, token})
Alternatively, you can simply tell the Machine to shut down like so:
:erpc.call(node_name, fn ->
System.stop()
end)
Because we set auto_destroy: true
in our creation config, the Machine will destroy itself when the system stops.
Recap
We successfully started a Machine, executed our big data code on it, and stopped it, all from Elixir with only an HTTP client. In the BEAM world we tend to take this for granted, but let's step back and list out the pieces we'd have to build or include in our running system to do this in any other environment!
Direct Port:
- Some sort of HTTP/RPC server.
- Choose how to serialize AND deserialize our data on both ends.
- Tooling to handle errors. In our case, any error from the :erpc call will propagate to the calling Machine automagically.
- Tooling to start/stop Machines:
- Deploy our code to them.
- Start the code.
- Make sure it can be discovered and communicated to securely.
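That error propagation is worth seeing for yourself. `:erpc` also accepts the local node, so you can watch both results and errors flow back to the caller in a plain `iex` session, no cluster required:

```elixir
# :erpc works against the local node too, which makes it easy to see
# how results and errors come back to the caller.
2 = :erpc.call(node(), fn -> 1 + 1 end)

# An exception raised on the "remote" side surfaces in the caller,
# where we can rescue it like any local exception:
:raised =
  try do
    :erpc.call(node(), fn -> raise "boom" end)
    :no_error
  rescue
    _ -> :raised
  end
```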
Lambda/Worker Style:
- Redis/Kafka/RabbitMQ/SQS, i.e., a message broker.
- Setting up, deploying, and maintaining a worker:
  - Lambda, which is not written in Elixir and is billed opaquely.
  - Sidekiq/Oban, which runs continuously and can't be scaled up/down.
- Ability to monitor.
We get all of that for free thanks to the BEAM and Fly.io working together. Now obviously putting this all into production will require making engineering decisions such as:
- Is this work so important it can’t fail?
- Can I handle some downtime if the network/host fails?
- How do I handle errors?
- Does the data need to be durable?
Fortunately, the BEAM has been working on these problems since the 80s and is chock-full of tools and educational content to help you make these decisions.
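For instance, if the work can tolerate a timeout-and-give-up policy, one lightweight approach is to run the `:erpc` call under a supervised `Task` and decide what to do with each outcome. This is only a sketch, not the one right answer: `MyApp.TaskSupervisor` is an assumed `Task.Supervisor` in your supervision tree, and `node_name` is the node from earlier.

```elixir
# Sketch: isolate the remote call in a supervised task so a crash on
# the remote node doesn't take the caller down with it.
# `MyApp.TaskSupervisor` and `node_name` are assumed from your app.
task =
  Task.Supervisor.async_nolink(MyApp.TaskSupervisor, fn ->
    :erpc.call(node_name, fn -> MyApp.BigData.map_reducer(0..1000) end, :infinity)
  end)

case Task.yield(task, :timer.minutes(5)) || Task.shutdown(task) do
  {:ok, result} -> {:ok, result}
  {:exit, reason} -> {:error, reason}
  nil -> {:error, :timeout}
end
```

Because `async_nolink` doesn't link the task to the caller, a remote failure arrives as an `{:exit, reason}` value you can handle, retry, or log instead of a crash.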
Conclusion
We created a one-off Machine, connected to it, executed a function, and got the result. Our Machine runs our code in our deployment environment; no special third-party code is required. It runs only as long as it's needed, keeping our spend in check. And we did it with none of the usual plumbing we'd expect to need in other languages or environments. This is incredible!
What’s next?
Soon we plan to release a library that makes all of this much easier, so that running something on a new Machine is as simple as Task.async. Keep an eye out for it! Until then, you have the building blocks to do it yourself.