Fly.io makes it easy to spin up compute around the world, now including powerful GPUs. Unlock the power of large language models, text transcription, and image generation with our datacenter-grade muscle!
GPUs are now available to everyone!
We know you’ve been excited to use GPUs on Fly.io, and we’re happy to announce that they’re now available to everyone. You can spin up GPU instances with any of the following cards:
- Ampere A100 (40GB): `a100-40gb`
- Ampere A100 (80GB): `a100-80gb`
- Lovelace L40S (48GB): `l40s`
To use a GPU instance today, set the `vm.size` for one of your apps or processes to any of the GPU sizes above. Here’s how you can spin up an Ollama server in seconds:
```toml
app = "your-app-name"
primary_region = "ord"
vm.size = "l40s"

[http_service]
  internal_port = 11434
  force_https = false
  auto_stop_machines = true
  auto_start_machines = true
  min_machines_running = 0
  processes = ["app"]

[build]
  image = "ollama/ollama"

[mounts]
  source = "models"
  destination = "/root/.ollama"
  initial_size = "100gb"
```
Run `fly deploy` and bam, large language model inference from anywhere. If you want a private setup, see the article Scaling Large Language Models to zero with Ollama for more information. You never know when you’ll have a sandwich emergency and need to figure out what you can make with what you have on hand.
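Once the app is up, any HTTP client can talk to Ollama’s REST API. Here’s a minimal sketch in Python using only the standard library; the hostname `your-app-name.fly.dev` and the `llama2` model name are assumptions — swap in your own app name and whichever model you’ve pulled:

```python
import json
import urllib.request

# Hypothetical hostname -- replace "your-app-name" with your Fly.io app name.
OLLAMA_URL = "https://your-app-name.fly.dev/api/generate"


def build_generate_request(model: str, prompt: str) -> dict:
    """Build the JSON body for Ollama's /api/generate endpoint.

    stream=False requests a single JSON response instead of a stream
    of partial results.
    """
    return {"model": model, "prompt": prompt, "stream": False}


def generate(model: str, prompt: str) -> str:
    """POST a prompt to the Ollama server and return the response text."""
    body = json.dumps(build_generate_request(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


if __name__ == "__main__":
    # The model has to be pulled first (e.g. `ollama pull llama2` inside
    # the Machine, or via Ollama's /api/pull endpoint).
    print(generate("llama2", "What can I make with bread, cheese, and a tomato?"))
```

Because `auto_stop_machines` and `min_machines_running = 0` are set, the first request after an idle period will also wake the Machine up.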
We are working on getting some lower-cost A10 GPUs in the next few weeks. We’ll update you when they’re ready.
If you want to explore the possibilities of GPUs on Fly.io, here are a few articles that may give you ideas:
- Deploy Your Own (Not) MidJourney Bot On Fly GPUs
- Scaling Large Language Models to zero with Ollama
- Transcribing on Fly GPU Machines
Depending on factors such as your organization’s age and payment history, you may need to go through additional verification steps.
If you’ve been experimenting with Fly.io GPUs and have made something cool, let us know on the Community Forums or by mentioning us on Mastodon! We’ll boost the cool ones.