Fly GPUs quickstart
You can use any base image for your Dockerfile, but it is convenient to base it on
ubuntu:22.04
and install libraries from NVIDIA’s official apt repository:RUN apt install -y cuda-nvcc-12-2 libcublas-12-2 libcudnn8
is usually enough.Notes:- Do not install meta packages like:
cuda-runtime-*
cuda-libraries-12-2
is good, but a bulky start. Once you know what libs are needed at build and runtime, please pick accordingly to optimize final image size.- Use multi-stage docker builds as much as possible.
- Do not install meta packages like:
From flyctl, create an app using either
fly launch
orfly apps create
.Note: GPUs are not available in all regions. There are these GPU types available: Nvidia A10, L40S, A100-PCIe-40GB, and A100-SXM4-80GB.
Currently GPUs are available in the following regions:a10
:ord
l40s
:ord
a100-40gb
:ord
a100-80gb
:ams
,iad
,mia
,sjc
,syd
Create or modify the
fly.toml
config file in the project source directory, replacing values with your own:app = "my-gpu-app" primary_region = "ord" vm.size = "a100-40gb" # Use a volume to store LLMs or any big file that doesn't fit in a Docker image [[mounts]] source = "data" destination = "/data" [http_service] internal_port = 8080 auto_stop_machines = false
Notes:- Make sure
vm.size
is set infly.toml
, valid values area10
,l40s
,a100-40gb
anda100-80gb
. - Make sure to include a
[[mounts]]
section infly.toml
. - The volume gets created automatically by
fly deploy
. - Use the volume to store the models and large files that can’t be shipped as a docker image.
- Make sure
Deploy your app:
fly deploy
That’s pretty much it to get an app running with a Machine on a GPU.
Volumes and GPU Machines
Important: If you create any additional volumes, they need to be created with the same constraints as your Machine.
Here’s an example of creating a new one hundred gigabyte volume for storing ML models in the ord
region, on a machine with a GPU:
fly volumes create models \
--size 100 \
--vm-gpu-kind a100-40gb \
--region ord
Example Dockerfile:
FROM ubuntu:22.04 as base
RUN apt update -q && apt install -y ca-certificates wget && \
wget -qO /cuda-keyring.deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb && \
dpkg -i /cuda-keyring.deb && apt update -q
FROM base as builder
RUN apt install -y --no-install-recommends git cuda-nvcc-12-2
RUN git clone --depth=1 https://github.com/nvidia/cuda-samples.git /cuda-samples
RUN cd /cuda-samples/Samples/1_Utilities/deviceQuery && \
make && install -m 755 deviceQuery /usr/local/bin
FROM base as runtime
#RUN apt install -y --no-install-recommends libcudnn8 libcublas-12-2
COPY --from=builder /usr/local/bin/deviceQuery /usr/local/bin/deviceQuery
CMD ["sleep", "inf"]
Examples using Fly GPUs
- Elixir Llama2-13b on Fly GPUs: https://gist.github.com/chrismccord/59a5e81f144a4dfb4bf0a8c3f2673131
- Github fly-apps repos with the
gpu
topic: https://github.com/orgs/fly-apps/repositories?q=topic%3Agpu - Fly.io CUDA example: https://gist.github.com/dangra/f8123001fe0f2453a8cd638b89738465
- Deploying CLIP on Fly.io: https://gist.github.com/simonw/52c7734e34cac2b26ea1378845674edc