## `no healthy upstream`

Your API may return a `no healthy upstream` error message (with HTTP status code 503). This means that there are currently no live replicas running for your API. This could happen for a few reasons; to investigate, check your API's status with `cortex get API_NAME`, and inspect the logs in CloudWatch with the help of `cortex logs API_NAME`. In addition, `cortex describe API_NAME` will show the number of replicas that have failed to start on your API, and you can view the logs for all replicas by visiting the CloudWatch Insights URL from `cortex logs API_NAME`.
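When the 503 is transient (e.g. replicas are still starting up), clients can retry with backoff instead of failing immediately. This is a minimal client-side sketch, not part of Cortex; `send_request`, the attempt count, and the delays are all illustrative:

```python
import time

def call_with_retry(send_request, max_attempts=5, base_delay=1.0):
    """Retry a request while the API returns 503 (e.g. "no healthy upstream").

    `send_request` is any zero-argument callable returning (status, body);
    in real code it might wrap an HTTP POST to your API's endpoint.
    """
    for attempt in range(max_attempts):
        status, body = send_request()
        if status != 503:
            return status, body
        if attempt < max_attempts - 1:
            # Exponential backoff while replicas start: 1s, 2s, 4s, ...
            time.sleep(base_delay * (2 ** attempt))
    # All attempts returned 503; surface the last response to the caller.
    return status, body
```

If the 503s persist after several attempts, the API is likely errored rather than starting up, and the `cortex get` / `cortex logs` steps above are the right next move.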
## `{"message":"Service Unavailable"}`

Your API may return a `{"message":"Service Unavailable"}` error message (with HTTP status code 503) after 29 seconds if your request exceeds API Gateway's 29-second timeout. If this is the case, you can either modify your code to take less time, run on faster hardware (e.g. GPUs), or avoid API Gateway (there is no timeout when using the API's endpoint directly).
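Since both failure modes return a 503, the elapsed time of the request is a useful signal for telling them apart. This is a small heuristic sketch (the function and constant names are illustrative, not part of Cortex or API Gateway):

```python
# API Gateway enforces a fixed 29-second timeout on requests.
API_GATEWAY_TIMEOUT_SECONDS = 29.0

def classify_503(status_code, elapsed_seconds):
    """Heuristic: a 503 arriving at roughly the 29-second mark is most
    likely the API Gateway timeout; a 503 arriving quickly usually means
    there are no live replicas ("no healthy upstream")."""
    if status_code != 503:
        return "not_a_503"
    if elapsed_seconds >= API_GATEWAY_TIMEOUT_SECONDS - 0.5:
        return "api_gateway_timeout"
    return "no_healthy_upstream"
```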
## API is stuck updating

If your API is stuck in the updating state (e.g. as reported by `cortex describe API_NAME`), there are a few possible causes. Here are some things to check:

### Check the logs

Run `cortex logs API_NAME` for a URL to view logs for your API in CloudWatch. In addition to output from your containers, you will find logs from other parts of the Cortex infrastructure that may help your troubleshooting.
### Check `max_instances` for your cluster

One possible cause is that your cluster has reached `max_instances` for a node group that you specified via the cluster configuration file (e.g. `cluster.yaml`). If your cluster already has `max_instances` running instances for a given node group, additional instances cannot be created, and APIs may not be able to deploy, scale, or update.
You can check the value of `max_instances` for the selected node group by running `cortex cluster info --config cluster.yaml` (or `cortex cluster info --name <CLUSTER-NAME> --region <CLUSTER-REGION>` if you have the name and region of the cluster).
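For reference, `max_instances` is set per node group in the cluster configuration file. A minimal sketch of the relevant fields (the node group name, instance type, and sizes below are illustrative; check your own `cluster.yaml` for the actual values):

```yaml
# cluster.yaml (fragment)
node_groups:
  - name: ng-gpu          # illustrative node group name
    instance_type: g4dn.xlarge
    min_instances: 0
    max_instances: 5      # the ceiling that can block scaling when reached
```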
If you need additional instances, update the `max_instances` field by following the instructions for updating an existing cluster.

You can also consider adding more instance types to the `instance_distribution`, or changing the cluster's region to one that has higher instance availability.
Here is a common scenario: say you've set `max_instances` to 1, or your AWS account limits you to a single `g4dn.xlarge` instance (i.e. your G instance vCPU limit is 4). You have an API running which requested 1 GPU. When you update your API via `cortex deploy`, Cortex attempts to deploy the updated version, and will only take down the old version once the new one is running. In this case, since there is no GPU available on the single running instance (it is taken by the old version of your API), the new version of your API requests a new instance to run on. Normally this is fine (it might just take a few minutes for a new instance to spin up): the new instance becomes live, the new API replica runs on it, and once it starts up successfully, the old replica is terminated and the old instance eventually spins down. In this case, however, the update gets stuck because the second instance cannot be created, and the first instance cannot be freed up until the new version is running.
One way to avoid this situation is to set `max_surge` to 0 in the `update_strategy` section, e.g.:
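A minimal sketch of the relevant fragment of the API configuration (only the fields named above are from the text; the surrounding structure depends on your Cortex version):

```yaml
# API configuration fragment: with max_surge set to 0, Cortex replaces the
# old replica in place instead of creating a surge replica, so no extra
# instance capacity is needed during the update.
update_strategy:
  max_surge: 0
```

The trade-off is brief downtime during updates, since the old replica is taken down before the new one is live.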