Compute resource requests in Cortex follow the syntax and meaning of compute resources in Kubernetes.
    - kind: model
      ...
      compute:
        cpu: "2"
        mem: "1Gi"
        gpu: 1
CPU and memory requests in Cortex correspond to compute resource requests in Kubernetes. In the example above, the training job will only be scheduled once 2 CPUs and 1Gi of memory are available, and the job is guaranteed access to those resources throughout its execution. In some cases, a Cortex compute resource request can be (or may default to)
One unit of CPU corresponds to one virtual CPU on AWS. Fractional requests are allowed and can be specified as a floating-point number or via the "m" suffix (for example, 0.2 and 200m are equivalent).
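To make the two CPU notations concrete, here is a minimal sketch of how a request string could be converted to a number of CPUs. The helper name `parse_cpu` is hypothetical and is not part of Cortex or Kubernetes; it only illustrates the rule that the "m" suffix means thousandths of a CPU.

```python
def parse_cpu(quantity):
    """Convert a CPU request such as "2", "0.2", or "200m" to a number of CPUs.

    Hypothetical helper for illustration only; not a Cortex API.
    """
    text = str(quantity).strip()
    if text.endswith("m"):
        # "m" means milli-CPUs: 1000m is one full CPU.
        return int(text[:-1]) / 1000
    # No suffix: the value is already a (possibly fractional) CPU count.
    return float(text)


# The fractional and "m" notations describe the same request.
same_request = parse_cpu("0.2") == parse_cpu("200m")
```

Under these assumptions, `parse_cpu("1500m")` and `parse_cpu("1.5")` also yield the same value, so either notation can be used in a `compute` block.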
One unit of memory is one byte. Memory can be expressed as an integer or by using one of these suffixes: K, M, G, T (or their power-of-two counterparts: Ki, Mi, Gi, Ti). For example, the following values represent roughly the same memory: 128974848, 129e6, 129M, 123Mi.
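As a rough illustration of how decimal and power-of-two memory suffixes compare, the sketch below converts a few quantities to bytes. The helper `parse_memory` and its suffix tables are assumptions made for this example, based on Kubernetes quantity notation (Kubernetes itself writes the kilo suffix as lowercase `k`, so both spellings are accepted here); it is not Cortex's actual parser.

```python
from decimal import Decimal

# Assumed suffix tables, modeled on Kubernetes quantity notation.
DECIMAL_SUFFIXES = {"k": 10**3, "K": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}
BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_memory(quantity):
    """Convert a memory request such as "1Gi", "129M", or "129e6" to bytes.

    Hypothetical helper for illustration only; not a Cortex API.
    """
    text = str(quantity).strip()
    # Check the two-character binary suffixes before the decimal ones.
    for table in (BINARY_SUFFIXES, DECIMAL_SUFFIXES):
        for suffix, factor in table.items():
            if text.endswith(suffix):
                return int(Decimal(text[: -len(suffix)]) * factor)
    # No suffix: a plain number of bytes (scientific notation is accepted).
    return int(Decimal(text))


# Byte counts for several spellings of roughly the same amount of memory.
sizes = {q: parse_memory(q) for q in ["128974848", "129e6", "129M", "123Mi"]}
```

With these tables, `123Mi` is exactly 128974848 bytes while `129M` and `129e6` are both 129000000 bytes, which is why such values are only roughly equal.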
One unit of GPU corresponds to one virtual GPU on AWS. Fractional requests are not allowed. Here is some information on adding GPU-enabled nodes on EKS.
We recommend using GPU compute requests on API resources only if your cluster has enough nodes to support the combined GPU requests of model training and APIs (ideally with an autoscaler). Otherwise, because of how zero-downtime rolling updates work, your model training will not have sufficient GPU resources, as there will always be GPUs consumed by APIs from the previous deployment.
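The capacity rule above comes down to simple arithmetic, sketched below. The function names and the example numbers are hypothetical; the only thing taken from the text is that GPU requests are whole numbers and that the GPUs held by APIs must be counted alongside those requested by model training.

```python
def gpus_needed(training_gpu_requests, api_gpu_requests):
    """Total GPUs required so that training is not starved by running APIs.

    Hypothetical helper for illustration only; not a Cortex API.
    """
    requests = list(training_gpu_requests) + list(api_gpu_requests)
    for request in requests:
        # Fractional GPU requests are not allowed.
        if not isinstance(request, int) or request < 0:
            raise ValueError(f"GPU requests must be whole numbers, got {request!r}")
    # APIs from the previous deployment keep their GPUs during a rolling
    # update, so their requests count in full next to the training requests.
    return sum(requests)


def has_enough_gpus(cluster_gpus, training_gpu_requests, api_gpu_requests):
    """Return True if the cluster can hold training and API GPU requests together."""
    return cluster_gpus >= gpus_needed(training_gpu_requests, api_gpu_requests)


# Example with made-up numbers: two training jobs (1 GPU each) and one API (1 GPU).
enough_with_two = has_enough_gpus(2, [1, 1], [1])
enough_with_three = has_enough_gpus(3, [1, 1], [1])
```

In this example a two-GPU cluster falls short, because the API keeps its GPU while both training jobs wait to be scheduled; three GPUs (or an autoscaler that can add a node) would cover the combined requests.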