Using GPUs with FKS
Fly Kubernetes is in beta and not recommended for critical production usage. To report issues or provide feedback, email us at beta@fly.io.
GPUs are available on Fly Kubernetes. Information about our GPUs can be found in our Fly GPUs documentation.
GPUs are consumed by requesting a GPU resource, similarly to requesting cpu or memory. There is a custom resource, gpu.fly.io/<gpu type>, that is used to request GPUs. Note that:
- The available GPU types are a10, l40s, a100-pcie-40gb, and a100-sxm4-80gb. Check the documentation for which regions they are available in.
- Pods are deployed in the same region as your cluster. You can place your Pod in a particular region by adding a fly.io/region: <region> annotation to your Pod’s metadata.
- GPU resource requests should only specify the limits section.
- You can specify CPU and memory resources alongside GPU resources. The minimum number of cores supported is 2 and the minimum amount of memory is 4096 MiB. The VMs deployed will always be performance Machines.
- The valid numbers of GPUs you can request are 1, 2, 4, and 8.
- You can only request one type of GPU at a time.
Below is an example of a Pod with a GPU:
apiVersion: v1
kind: Pod
metadata:
  name: ollama
  annotations:
    fly.io/region: ams # optional
spec:
  containers:
    - name: ollama
      image: ollama/ollama:latest
      resources:
        limits:
          gpu.fly.io/a100-80gb: 1
This will deploy a GPU Machine with the size a100-80gb with 1 GPU core in the region ams.
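The notes above also allow CPU and memory to be set alongside the GPU resource. The manifest below is a minimal sketch of how that might look, not a confirmed configuration: the Pod name gpu-worker is hypothetical, and placing cpu and memory under limits (rather than requests) is an assumption based on the note that GPU requests should only use the limits section. The region annotation is omitted, so the Pod would be deployed in the same region as the cluster.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-worker                # hypothetical name, for illustration only
spec:
  containers:
    - name: gpu-worker
      image: ollama/ollama:latest
      resources:
        limits:
          gpu.fly.io/l40s: 2      # one GPU type per Pod; count must be 1, 2, 4 or 8
          cpu: 8                  # assumption: set under limits; minimum is 2 cores
          memory: 16Gi            # assumption: set under limits; minimum is 4096 MiB
```

Before applying a manifest like this, check that the chosen GPU type is available in your cluster's region.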