Fly GPUs quickstart
You can use any base image for your Dockerfile, but it is convenient to base it on `ubuntu:22.04` and install libraries from NVIDIA's official apt repository: `RUN apt install -y cuda-nvcc-12-2 libcublas-12-2 libcudnn8` is usually enough.

Notes:
- Do not install meta packages like `cuda-runtime-*`. `cuda-libraries-12-2` is good, but a bulky start. Once you know which libraries are needed at build time and at runtime, pick them individually to optimize the final image size.
- Use multi-stage Docker builds as much as possible.
From flyctl, create an app using either `fly launch` or `fly apps create`.

Note: GPUs are not available in all regions. These GPU types are available: Nvidia A10, L40S, A100-PCIe-40GB, and A100-SXM4-80GB.

Currently GPUs are available in the following regions:
- `a10`: ord
- `l40s`: ord
- `a100-40gb`: ord
- `a100-80gb`: iad, sjc, syd, ams
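The availability list above can be kept as a small lookup table, for example to validate a region choice before deploying. This is an illustrative sketch, not part of flyctl, and the mapping will change as Fly adds regions:

```python
# Hypothetical helper mirroring the GPU availability list above.
GPU_REGIONS = {
    "a10": {"ord"},
    "l40s": {"ord"},
    "a100-40gb": {"ord"},
    "a100-80gb": {"iad", "sjc", "syd", "ams"},
}

def gpu_available(gpu_kind: str, region: str) -> bool:
    """Return True if the given GPU kind is currently listed for the region."""
    return region in GPU_REGIONS.get(gpu_kind, set())

print(gpu_available("a100-80gb", "ams"))  # True
print(gpu_available("a10", "ams"))        # False
```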
Create or modify the `fly.toml` config file in the project source directory, replacing values with your own:

app = "my-gpu-app"
primary_region = "ord"
vm.size = "a100-40gb"

# Use a volume to store LLMs or any big file that doesn't fit in a Docker image
[[mounts]]
  source = "data"
  destination = "/data"

[http_service]
  internal_port = 8080
  auto_stop_machines = false

Notes:
- Make sure `vm.size` is set in `fly.toml`; valid values are `a10`, `l40s`, `a100-40gb`, and `a100-80gb`.
- Make sure to include a `[[mounts]]` section in `fly.toml`. The volume gets created automatically by `fly deploy`.
- Use the volume to store the models and large files that can't be shipped in a Docker image.
Deploy your app:
fly deploy
That’s pretty much it to get an app running with a Machine on a GPU.
Volumes and GPU Machines
Important: If you create any additional volumes, they need to be created with the same constraints as your Machine.
Here’s an example of creating a new 100 GB volume for storing ML models in the ord region, for a Machine with an a100-40gb GPU:
fly volumes create models \
--size 100 \
--vm-gpu-kind a100-40gb \
--region ord
Example Dockerfile:
FROM ubuntu:22.04 AS base
RUN apt update -q && apt install -y ca-certificates wget && \
wget -qO /cuda-keyring.deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb && \
dpkg -i /cuda-keyring.deb && apt update -q
FROM base AS builder
RUN apt install -y --no-install-recommends git cuda-nvcc-12-2
RUN git clone --depth=1 https://github.com/nvidia/cuda-samples.git /cuda-samples
RUN cd /cuda-samples/Samples/1_Utilities/deviceQuery && \
make && install -m 755 deviceQuery /usr/local/bin
FROM base AS runtime
#RUN apt install -y --no-install-recommends libcudnn8 libcublas-12-2
COPY --from=builder /usr/local/bin/deviceQuery /usr/local/bin/deviceQuery
CMD ["sleep", "inf"]
Examples using Fly GPUs
- Elixir Llama2-13b on Fly GPUs: https://gist.github.com/chrismccord/59a5e81f144a4dfb4bf0a8c3f2673131
- GitHub fly-apps repos with the `gpu` topic: https://github.com/orgs/fly-apps/repositories?q=topic%3Agpu
- Fly.io CUDA example: https://gist.github.com/dangra/f8123001fe0f2453a8cd638b89738465
- Deploying CLIP on Fly.io: https://gist.github.com/simonw/52c7734e34cac2b26ea1378845674edc