Once your model is exported, you can implement one of Cortex's Predictor classes to deploy your model. A Predictor is a Python class that describes how to initialize your model and use it to make predictions.
Which Predictor you use depends on how your model is exported:
- TensorFlow Predictor if your model is exported as a TensorFlow SavedModel
- ONNX Predictor if your model is exported in the ONNX format
- Python Predictor for all other cases
Cortex makes all files in the project directory (i.e. the directory which contains `cortex.yaml`) available for use in your Predictor implementation. Python bytecode files (`*.pyc`, `*.pyo`, `*.pyd`), files or folders that start with `.`, and the API configuration file (e.g. `cortex.yaml`) are excluded.
The following files can also be added at the root of the project's directory:
- `.cortexignore` file, which follows the same syntax and behavior as a `.gitignore` file.
- `.env` file, which exports environment variables that can be used in the predictor. Each line of this file must follow the `VARIABLE=value` format (see the sketch below).
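For example, a hypothetical `.env` file might look like this (both variable names are purely illustrative):

```text
MODEL_BUCKET=my-bucket
LOG_LEVEL=info
```

Within your Predictor, these can then be read from the environment, e.g. with Python's `os.environ["LOG_LEVEL"]`.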
For example, if your directory looks like this:
```text
./my-classifier/
├── cortex.yaml
├── values.json
├── predictor.py
├── ...
└── requirements.txt
```
You can access `values.json` in your Predictor like this:
```python
import json

class PythonPredictor:
    def __init__(self, config):
        with open('values.json', 'r') as values_file:
            values = json.load(values_file)
        self.values = values
```
```python
# initialization code and variables can be declared here in global scope

class PythonPredictor:
    def __init__(self, config, job_spec):
        """(Required) Called once during each worker initialization. Performs
        setup such as downloading/initializing the model or downloading a
        vocabulary.

        Args:
            config (required): Dictionary passed from API configuration (if
                specified) merged with configuration passed in with Job
                Submission API. If there are conflicting keys, values in
                the configuration specified in the job submission take
                precedence.
            job_spec (optional): Dictionary containing the following fields:
                "job_id": A unique ID for this job
                "api_name": The name of this batch API
                "config": The config that was provided in the job submission
                "workers": The number of workers for this job
                "total_batch_count": The total number of batches in this job
                "start_time": The time that this job started
        """
        pass

    def predict(self, payload, batch_id):
        """(Required) Called once per batch. Preprocesses the batch payload (if
        necessary), runs inference, postprocesses the inference output (if
        necessary), and writes the predictions to storage (i.e. S3 or a
        database, if desired).

        Args:
            payload (required): a batch (i.e. a list of one or more samples).
            batch_id (optional): uuid assigned to this batch.

        Returns:
            Nothing
        """
        pass

    def on_job_complete(self):
        """(Optional) Called once after all batches in the job have been
        processed. Performs post job completion tasks such as aggregating
        results, executing web hooks, or triggering other jobs.
        """
        pass
```
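To make the lifecycle concrete, here is a minimal sketch of an implementation; the dummy scoring logic and the in-memory result collection (which is per worker, since each worker gets its own Predictor instance) are assumptions for illustration:

```python
class PythonPredictor:
    def __init__(self, config, job_spec):
        # results collected by this worker across the batches it processes
        self.results = []

    def predict(self, payload, batch_id):
        # dummy "inference": score each sample in the batch
        for sample in payload:
            self.results.append({"batch_id": batch_id, "score": len(str(sample))})

    def on_job_complete(self):
        # post-job work, e.g. writing an aggregate summary to storage
        print(f"worker processed {len(self.results)} samples")
```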
For proper separation of concerns, it is recommended to use the constructor's `config` parameter for information such as from where to download the model and initialization files, or any configurable model parameters. You define `config` in your API configuration, and it is passed through to your Predictor's constructor. The `config` parameters in the API configuration can be overridden by providing `config` in the job submission requests.
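As a sketch of how the constructor might use `config`, the following downloads a pickled model from S3 before loading it; the `bucket` and `model_key` fields are hypothetical and would be defined in your API configuration (or overridden in the job submission):

```python
import pickle

import boto3  # pre-installed in Python Predictors


class PythonPredictor:
    def __init__(self, config, job_spec):
        # hypothetical config fields -- set them in your API configuration
        bucket = config["bucket"]
        key = config["model_key"]

        # download the exported model from S3 and load it into memory
        boto3.client("s3").download_file(bucket, key, "/tmp/model.pkl")
        with open("/tmp/model.pkl", "rb") as f:
            self.model = pickle.load(f)
```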
You can find an example of a BatchAPI using a PythonPredictor in `examples/batch/image-classifier`.
The following Python packages are pre-installed in Python Predictors and can be used in your implementations:
```text
boto3==1.14.53
cloudpickle==1.6.0
Cython==0.29.21
dill==0.3.2
fastapi==0.61.1
joblib==0.16.0
Keras==2.4.3
msgpack==1.0.0
nltk==3.5
np-utils==0.5.12.1
numpy==1.19.1
opencv-python==4.4.0.42
pandas==1.1.1
Pillow==7.2.0
pyyaml==5.3.1
requests==2.24.0
scikit-image==0.17.2
scikit-learn==0.23.2
scipy==1.5.2
six==1.15.0
statsmodels==0.12.0
sympy==1.6.2
tensorflow-hub==0.9.0
tensorflow==2.3.0
torch==1.6.0
torchvision==0.7.0
xgboost==1.2.0
```
The list is slightly different for Inferentia-equipped APIs:
```text
boto3==1.13.7
cloudpickle==1.6.0
Cython==0.29.21
dill==0.3.1.1
fastapi==0.54.1
joblib==0.16.0
msgpack==1.0.0
neuron-cc==1.0.20600.0+0.b426b885f
nltk==3.5
np-utils==0.5.12.1
numpy==1.18.2
opencv-python==4.4.0.42
pandas==1.1.1
Pillow==7.2.0
pyyaml==5.3.1
requests==2.23.0
scikit-image==0.17.2
scikit-learn==0.23.2
scipy==1.3.2
six==1.15.0
statsmodels==0.12.0
sympy==1.6.2
tensorflow==1.15.4
tensorflow-neuron==1.15.3.1.0.2043.0
torch==1.5.1
torch-neuron==1.5.1.1.0.1721.0
torchvision==0.6.1
```
The pre-installed system packages are listed in `images/python-predictor-cpu/Dockerfile` (for CPU), `images/python-predictor-gpu/Dockerfile` (for GPU), or `images/python-predictor-inf/Dockerfile` (for Inferentia).
If your application requires additional dependencies, you can install additional Python packages and system packages.
```python
class TensorFlowPredictor:
    def __init__(self, tensorflow_client, config, job_spec):
        """(Required) Called once during each worker initialization. Performs
        setup such as downloading/initializing the model or downloading a
        vocabulary.

        Args:
            tensorflow_client (required): TensorFlow client which is used to
                make predictions. This should be saved for use in predict().
            config (required): Dictionary passed from API configuration (if
                specified) merged with configuration passed in with Job
                Submission API. If there are conflicting keys, values in
                the configuration specified in the job submission take
                precedence.
            job_spec (optional): Dictionary containing the following fields:
                "job_id": A unique ID for this job
                "api_name": The name of this batch API
                "config": The config that was provided in the job submission
                "workers": The number of workers for this job
                "total_batch_count": The total number of batches in this job
                "start_time": The time that this job started
        """
        self.client = tensorflow_client
        # Additional initialization may be done here

    def predict(self, payload, batch_id):
        """(Required) Called once per batch. Preprocesses the batch payload (if
        necessary), runs inference (e.g. by calling
        self.client.predict(model_input)), postprocesses the inference output
        (if necessary), and writes the predictions to storage (i.e. S3 or a
        database, if desired).

        Args:
            payload (required): a batch (i.e. a list of one or more samples).
            batch_id (optional): uuid assigned to this batch.

        Returns:
            Nothing
        """
        pass

    def on_job_complete(self):
        """(Optional) Called once after all batches in the job have been
        processed. Performs post job completion tasks such as aggregating
        results, executing web hooks, or triggering other jobs.
        """
        pass
```
Cortex provides a `tensorflow_client` to your Predictor's constructor. `tensorflow_client` is an instance of TensorFlowClient that manages a connection to a TensorFlow Serving container to make predictions using your model. It should be saved as an instance variable in your Predictor, and your `predict()` function should call `tensorflow_client.predict()` to make an inference with your exported TensorFlow model. Preprocessing of the JSON payload and postprocessing of predictions can be implemented in your `predict()` function as well.
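For instance, a minimal `predict()` might look like the following; the assumption that the model accepts one sample at a time, the results bucket name, and the JSON serialization are all illustrative:

```python
import json

import boto3  # pre-installed in TensorFlow Predictors


class TensorFlowPredictor:
    def __init__(self, tensorflow_client, config, job_spec):
        self.client = tensorflow_client
        self.job_id = job_spec["job_id"]
        self.s3 = boto3.client("s3")

    def predict(self, payload, batch_id):
        # run each sample in the batch through TensorFlow Serving
        predictions = [self.client.predict(sample) for sample in payload]

        # write this batch's predictions to a hypothetical results bucket
        self.s3.put_object(
            Bucket="my-results-bucket",
            Key=f"{self.job_id}/{batch_id}.json",
            Body=json.dumps(predictions, default=str),
        )
```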
When multiple models are defined using the Predictor's `models` field, the `tensorflow_client.predict()` method expects a second argument, `model_name`, which must hold the name of the model that you want to use for inference (for example: `self.client.predict(payload, "text-generator")`). See the multi-model guide for more information.
For proper separation of concerns, it is recommended to use the constructor's `config` parameter for information such as from where to download the model and initialization files, or any configurable model parameters. You define `config` in your API configuration, and it is passed through to your Predictor's constructor. The `config` parameters in the API configuration can be overridden by providing `config` in the job submission requests.
You can find an example of a BatchAPI using a TensorFlowPredictor in `examples/batch/tensorflow`.
The following Python packages are pre-installed in TensorFlow Predictors and can be used in your implementations:
```text
boto3==1.14.53
dill==0.3.2
fastapi==0.61.1
msgpack==1.0.0
numpy==1.19.1
opencv-python==4.4.0.42
pyyaml==5.3.1
requests==2.24.0
tensorflow-hub==0.9.0
tensorflow-serving-api==2.3.0
tensorflow==2.3.0
```
The pre-installed system packages are listed in `images/tensorflow-predictor/Dockerfile`.
If your application requires additional dependencies, you can install additional Python packages and system packages.
```python
class ONNXPredictor:
    def __init__(self, onnx_client, config, job_spec):
        """(Required) Called once during each worker initialization. Performs
        setup such as downloading/initializing the model or downloading a
        vocabulary.

        Args:
            onnx_client (required): ONNX client which is used to make
                predictions. This should be saved for use in predict().
            config (required): Dictionary passed from API configuration (if
                specified) merged with configuration passed in with Job
                Submission API. If there are conflicting keys, values in
                the configuration specified in the job submission take
                precedence.
            job_spec (optional): Dictionary containing the following fields:
                "job_id": A unique ID for this job
                "api_name": The name of this batch API
                "config": The config that was provided in the job submission
                "workers": The number of workers for this job
                "total_batch_count": The total number of batches in this job
                "start_time": The time that this job started
        """
        self.client = onnx_client
        # Additional initialization may be done here

    def predict(self, payload, batch_id):
        """(Required) Called once per batch. Preprocesses the batch payload (if
        necessary), runs inference (e.g. by calling
        self.client.predict(model_input)), postprocesses the inference output
        (if necessary), and writes the predictions to storage (i.e. S3 or a
        database, if desired).

        Args:
            payload (required): a batch (i.e. a list of one or more samples).
            batch_id (optional): uuid assigned to this batch.

        Returns:
            Nothing
        """
        pass

    def on_job_complete(self):
        """(Optional) Called once after all batches in the job have been
        processed. Performs post job completion tasks such as aggregating
        results, executing web hooks, or triggering other jobs.
        """
        pass
```
Cortex provides an `onnx_client` to your Predictor's constructor. `onnx_client` is an instance of ONNXClient that manages an ONNX Runtime session to make predictions using your model. It should be saved as an instance variable in your Predictor, and your `predict()` function should call `onnx_client.predict()` to make an inference with your exported ONNX model. Preprocessing of the JSON payload and postprocessing of predictions can be implemented in your `predict()` function as well.
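As an illustration, `predict()` could convert each sample into the numeric input your ONNX model expects before calling the client; the `"values"` key and the float32 feature-vector format are assumptions about a hypothetical model:

```python
import numpy as np


class ONNXPredictor:
    def __init__(self, onnx_client, config, job_spec):
        self.client = onnx_client

    def predict(self, payload, batch_id):
        for sample in payload:
            # hypothetical preprocessing: each sample carries a float32
            # feature vector under the "values" key
            model_input = np.asarray(sample["values"], dtype=np.float32)
            prediction = self.client.predict(model_input)
            # postprocess `prediction` and write it to storage here
```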
When multiple models are defined using the Predictor's `models` field, the `onnx_client.predict()` method expects a second argument, `model_name`, which must hold the name of the model that you want to use for inference (for example: `self.client.predict(model_input, "text-generator")`). See the multi-model guide for more information.
For proper separation of concerns, it is recommended to use the constructor's `config` parameter for information such as from where to download the model and initialization files, or any configurable model parameters. You define `config` in your API configuration, and it is passed through to your Predictor's constructor. The `config` parameters in the API configuration can be overridden by providing `config` in the job submission requests.
You can find an example of a BatchAPI using an ONNXPredictor in `examples/batch/onnx`.
The following Python packages are pre-installed in ONNX Predictors and can be used in your implementations:
```text
boto3==1.14.53
dill==0.3.2
fastapi==0.61.1
msgpack==1.0.0
numpy==1.19.1
onnxruntime==1.4.0
pyyaml==5.3.1
requests==2.24.0
```
The pre-installed system packages are listed in `images/onnx-predictor-cpu/Dockerfile` (for CPU) or `images/onnx-predictor-gpu/Dockerfile` (for GPU).
If your application requires additional dependencies, you can install additional Python packages and system packages.