cortex.your-company.com). You will be able to upgrade Cortex versions without downtime, and you will not need to update your client code every time you migrate to a new cluster. You can find instructions for setting up a custom domain with a Route 53 hosted zone here, and instructions for updating/upgrading your cluster here.
cortex CLI or Python client.
priority field in node groups. You can deploy two node groups, one that is spot and another that is on-demand. Set the priority of the spot node group to be higher than the priority of the on-demand node group. This encourages the cluster-autoscaler to try to spin up instances from the spot node group first. If there are no more spot instances available, the on-demand node group will be used instead.
prometheus_instance_type to an instance type with more memory (the default is t3.medium, which has 4 GB).
max_concurrency is set to match the concurrency supported by your container.
max_queue_length to lower values if you would like to redistribute requests more aggressively to newer pods as your API scales up, rather than allowing requests to linger in queues. In that case, the clients consuming your APIs should implement retry logic with a delay (such as exponential backoff).
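
As a rough illustration of client-side retry logic with exponential backoff, here is a minimal Python sketch. It is not part of Cortex: the helper name `call_with_backoff`, the `send` callable, the default delays, and the set of status codes treated as retryable are all assumptions made for this example, so adjust them to whatever your API actually returns when its queue is full.

```python
import random
import time
from typing import Callable, Tuple

# Assumption: these status codes indicate a transient overload worth retrying.
RETRYABLE_STATUSES = frozenset({429, 503})


def call_with_backoff(
    send: Callable[[], Tuple[int, bytes]],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, bytes]:
    """Call send() until it returns a non-retryable status or attempts run out.

    send is a hypothetical zero-argument callable that performs one request
    and returns (status_code, body).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE_STATUSES:
            return status, body
        if attempt < max_attempts - 1:
            # Exponential backoff with full jitter: wait a random time between
            # 0 and min(max_delay, base_delay * 2**attempt) seconds.
            upper = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, upper))

    # Every attempt returned a retryable status; hand back the last response.
    return status, body
```

In practice, `send` would wrap a single HTTP request to your API endpoint using whichever HTTP client you already use. The random jitter spreads retries out so that many clients rejected at the same moment do not all retry at once.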