When deploying a model to a GPU in the local environment, you may encounter the error "NVIDIA container runtime not found". Cortex uses Docker to deploy APIs in the local environment, so your Docker engine must have the NVIDIA container runtime installed. This runtime is responsible for exposing your GPU to the Docker engine.
Please ensure that your local machine has an NVIDIA GPU card installed. If you don't have a local machine with an NVIDIA GPU, you can find instructions for spinning up a single GPU instance to try out model serving on a GPU with Cortex here.
Mac and Windows are currently not supported by the NVIDIA container runtime. You can find the complete list of supported operating systems and architectures here.
Instructions for setting up the NVIDIA container runtime can be found here.
To verify that the NVIDIA container runtime was installed successfully, check that nvidia is listed among the available runtimes in the output of:
docker info | grep -i runtime
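The manual check can also be scripted, for example as a pre-flight step before deploying. The following is a minimal sketch, not part of Cortex: the function name has_nvidia_runtime is hypothetical, and it assumes that docker info prints a "Runtimes:" line listing the registered runtimes, as recent Docker releases do.

```shell
#!/bin/sh
# Hypothetical helper (not part of Cortex or Docker).
# Reads `docker info` output on stdin and succeeds (exit status 0) only if
# "nvidia" appears as a whole word on the "Runtimes:" line.
# Assumption: the line starts with optional spaces followed by "Runtimes:".
# The "Default Runtime:" line is deliberately ignored.
has_nvidia_runtime() {
    grep -i '^ *runtimes:' | grep -qiw 'nvidia'
}

# Example usage (requires a running Docker daemon):
#   docker info 2>/dev/null | has_nvidia_runtime \
#       && echo "nvidia runtime is registered" \
#       || echo "nvidia runtime not found"
```

Matching only the "Runtimes:" line avoids a false positive from unrelated lines that happen to mention nvidia. If your Docker version formats this output differently, adjust the pattern accordingly.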