Predictor

Which Predictor you use depends on how your model is exported. The Python, TensorFlow, and ONNX Predictors are each described below.

Project files

Cortex makes all files in the project directory (i.e. the directory that contains cortex.yaml) available for use in your Predictor implementation. Python bytecode files (*.pyc, *.pyo, *.pyd), files or folders whose names start with ., and the API configuration file (e.g. cortex.yaml) are excluded.

The following files can also be added at the root of the project's directory:

  • .cortexignore file, which follows the same syntax and behavior as a .gitignore file.

  • .env file, which exports environment variables that can be used in the predictor. Each line of this file must follow the VARIABLE=value format.
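As a minimal sketch of how a variable exported from .env might be read inside a predictor, the class below looks it up with os.environ. The variable name MODEL_BUCKET and its fallback value are hypothetical and serve only as an illustration.

```python
import os


class PythonPredictor:
    def __init__(self, config):
        # MODEL_BUCKET is a hypothetical variable; it would come from a
        # line such as MODEL_BUCKET=my-bucket in the project's .env file.
        # The fallback value is used when the variable is not set.
        self.bucket = os.environ.get("MODEL_BUCKET", "default-bucket")
```

Using os.environ.get() with a default keeps the predictor usable when the variable is absent, for instance during local testing.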

For example, if your directory looks like this:

./my-classifier/
├── cortex.yaml
├── values.json
├── predictor.py
├── ...
└── requirements.txt

You can access values.json in your Predictor like this:

import json

class PythonPredictor:
    def __init__(self, config):
        with open('values.json', 'r') as values_file:
            values = json.load(values_file)
        self.values = values

Python Predictor

Interface

# initialization code and variables can be declared here in global scope

class PythonPredictor:
    def __init__(self, config, job_spec):
        """(Required) Called once during each worker initialization. Performs
        setup such as downloading/initializing the model or downloading a
        vocabulary.

        Args:
            config (required): Dictionary passed from API configuration (if
                specified) merged with configuration passed in with Job
                Submission API. If there are conflicting keys, values in
                configuration specified in Job submission take precedence.
            job_spec (optional): Dictionary containing the following fields:
                "job_id": A unique ID for this job
                "api_name": The name of this batch API
                "config": The config that was provided in the job submission
                "workers": The number of workers for this job
                "total_batch_count": The total number of batches in this job
                "start_time": The time that this job started
        """
        pass

    def predict(self, payload, batch_id):
        """(Required) Called once per batch. Preprocesses the batch payload (if
        necessary), runs inference, postprocesses the inference output (if
        necessary), and writes the predictions to storage (e.g. S3 or a
        database, if desired).

        Args:
            payload (required): a batch (i.e. a list of one or more samples).
            batch_id (optional): uuid assigned to this batch.

        Returns:
            Nothing
        """
        pass

    def on_job_complete(self):
        """(Optional) Called once after all batches in the job have been
        processed. Performs post job completion tasks such as aggregating
        results, executing web hooks, or triggering other jobs.
        """
        pass
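The interface above could be filled in along the following lines. This is a sketch under stated assumptions, not a reference implementation: the "model" is a stand-in that measures the length of each sample, the dest_dir config key is hypothetical, and a local directory stands in for shared storage such as S3 (with more than one worker, files on local disk would likely not be visible across workers).

```python
import json
import os


class PythonPredictor:
    def __init__(self, config, job_spec):
        # "dest_dir" is a hypothetical config key chosen for this sketch.
        # Results are grouped per job using the job_id from job_spec.
        self.out_dir = os.path.join(config["dest_dir"], job_spec["job_id"])
        os.makedirs(self.out_dir, exist_ok=True)
        # Stand-in for a real model: "predicts" the length of each sample.
        self.model = len

    def predict(self, payload, batch_id):
        # payload is a list of samples; run the model on each one.
        predictions = [self.model(sample) for sample in payload]
        # Write one file per batch, named after the batch's id.
        path = os.path.join(self.out_dir, f"{batch_id}.json")
        with open(path, "w") as out_file:
            json.dump(predictions, out_file)

    def on_job_complete(self):
        # Aggregate the per-batch files into a single summary file.
        total = 0
        for name in sorted(os.listdir(self.out_dir)):
            if name == "summary.json":
                continue
            with open(os.path.join(self.out_dir, name)) as batch_file:
                total += len(json.load(batch_file))
        with open(os.path.join(self.out_dir, "summary.json"), "w") as out_file:
            json.dump({"total_predictions": total}, out_file)
```

Writing one output per batch_id keeps batches independent of each other, and on_job_complete() then only needs to read what predict() has already stored.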

TensorFlow Predictor

Uses TensorFlow version 2.3.0 by default

Interface

class TensorFlowPredictor:
    def __init__(self, tensorflow_client, config, job_spec):
        """(Required) Called once during each worker initialization. Performs
        setup such as downloading/initializing the model or downloading a
        vocabulary.

        Args:
            tensorflow_client (required): TensorFlow client which is used to
                make predictions. This should be saved for use in predict().
            config (required): Dictionary passed from API configuration (if
                specified) merged with configuration passed in with Job
                Submission API. If there are conflicting keys, values in
                configuration specified in Job submission take precedence.
            job_spec (optional): Dictionary containing the following fields:
                "job_id": A unique ID for this job
                "api_name": The name of this batch API
                "config": The config that was provided in the job submission
                "workers": The number of workers for this job
                "total_batch_count": The total number of batches in this job
                "start_time": The time that this job started
        """
        self.client = tensorflow_client
        # Additional initialization may be done here

    def predict(self, payload, batch_id):
        """(Required) Called once per batch. Preprocesses the batch payload (if
        necessary), runs inference (e.g. by calling
        self.client.predict(model_input)), postprocesses the inference output
        (if necessary), and writes the predictions to storage (e.g. S3 or a
        database, if desired).

        Args:
            payload (required): a batch (i.e. a list of one or more samples).
            batch_id (optional): uuid assigned to this batch.

        Returns:
            Nothing
        """
        pass

    def on_job_complete(self):
        """(Optional) Called once after all batches in the job have been
        processed. Performs post job completion tasks such as aggregating
        results, executing web hooks, or triggering other jobs.
        """
        pass

Cortex provides a tensorflow_client to your Predictor's constructor. tensorflow_client is an instance of TensorFlowClient that manages a connection to a TensorFlow Serving container to make predictions using your model. Save it as an instance variable in your Predictor, and have your predict() function call tensorflow_client.predict() to run inference with your exported TensorFlow model. Preprocessing of the JSON payload and postprocessing of predictions can also be implemented in your predict() function.

When multiple models are defined using the Predictor's models field, the tensorflow_client.predict() method expects a second argument, model_name, which is the name of the model to use for inference (for example: self.client.predict(payload, "text-generator")).
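To make the multi-model call concrete, here is a small sketch. The model_name config key and the "sentiment" model name are hypothetical, and FakeTensorFlowClient is a stand-in written for this example so that the snippet runs outside Cortex; it is not part of the Cortex API. Whether predict() accepts a single sample or a whole batch depends on your model's signature; this sketch assumes one sample per call.

```python
class TensorFlowPredictor:
    def __init__(self, tensorflow_client, config, job_spec):
        self.client = tensorflow_client
        # "model_name" is a hypothetical config key that selects which of
        # the models listed in the Predictor's models field to use.
        self.model_name = config.get("model_name", "text-generator")
        self.results = {}

    def predict(self, payload, batch_id):
        # Pass the model name as the second argument to the client.
        predictions = [
            self.client.predict(sample, self.model_name) for sample in payload
        ]
        # A real predictor would write the predictions to storage here;
        # this sketch keeps them in memory, keyed by batch_id.
        self.results[batch_id] = predictions


class FakeTensorFlowClient:
    """Stand-in client for local experimentation (not a Cortex class)."""

    def predict(self, model_input, model_name=None):
        # Echo the arguments so the routing is easy to inspect.
        return {"model": model_name, "input": model_input}


predictor = TensorFlowPredictor(
    FakeTensorFlowClient(), {"model_name": "sentiment"}, {}
)
predictor.predict(["hello", "world"], "batch-1")
```

Reading the model name from config would let a single deployed API serve different models per job, since the job submission config takes precedence over the API configuration.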

ONNX Predictor

Uses ONNX Runtime version 1.4.0 by default

Interface

class ONNXPredictor:
    def __init__(self, onnx_client, config, job_spec):
        """(Required) Called once during each worker initialization. Performs
        setup such as downloading/initializing the model or downloading a
        vocabulary.

        Args:
            onnx_client (required): ONNX client which is used to make
                predictions. This should be saved for use in predict().
            config (required): Dictionary passed from API configuration (if
                specified) merged with configuration passed in with Job
                Submission API. If there are conflicting keys, values in
                configuration specified in Job submission take precedence.
            job_spec (optional): Dictionary containing the following fields:
                "job_id": A unique ID for this job
                "api_name": The name of this batch API
                "config": The config that was provided in the job submission
                "workers": The number of workers for this job
                "total_batch_count": The total number of batches in this job
                "start_time": The time that this job started
        """
        self.client = onnx_client
        # Additional initialization may be done here

    def predict(self, payload, batch_id):
        """(Required) Called once per batch. Preprocesses the batch payload (if
        necessary), runs inference (e.g. by calling
        self.client.predict(model_input)), postprocesses the inference output
        (if necessary), and writes the predictions to storage (e.g. S3 or a
        database, if desired).

        Args:
            payload (required): a batch (i.e. a list of one or more samples).
            batch_id (optional): uuid assigned to this batch.

        Returns:
            Nothing
        """
        pass

    def on_job_complete(self):
        """(Optional) Called once after all batches in the job have been
        processed. Performs post job completion tasks such as aggregating
        results, executing web hooks, or triggering other jobs.
        """
        pass

Cortex provides an onnx_client to your Predictor's constructor. onnx_client is an instance of ONNXClient that manages an ONNX Runtime session to make predictions using your model. Save it as an instance variable in your Predictor, and have your predict() function call onnx_client.predict() to run inference with your exported ONNX model. Preprocessing of the JSON payload and postprocessing of predictions can also be implemented in your predict() function.
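The following sketch shows how preprocessing, inference, and postprocessing might fit together in predict(). The sample layout (a dict with a "features" list), the assumed output shape (a flat list of scores), the labels config key, and the FakeONNXClient stand-in are all assumptions made so that the snippet runs outside Cortex; the actual input and output formats depend on your exported model.

```python
class ONNXPredictor:
    def __init__(self, onnx_client, config, job_spec):
        self.client = onnx_client
        # Hypothetical label list, used to postprocess the model output.
        self.labels = config.get("labels", ["negative", "positive"])
        self.results = {}

    def predict(self, payload, batch_id):
        predictions = []
        for sample in payload:
            # Preprocess: cast the (assumed) "features" field to floats.
            model_input = [float(x) for x in sample["features"]]
            # Inference through the client saved in __init__.
            scores = self.client.predict(model_input)
            # Postprocess: map the highest score to its label.
            best = max(range(len(scores)), key=lambda i: scores[i])
            predictions.append(self.labels[best])
        # A real predictor would write the predictions to storage here;
        # this sketch keeps them in memory, keyed by batch_id.
        self.results[batch_id] = predictions


class FakeONNXClient:
    """Stand-in client for local experimentation (not a Cortex class)."""

    def predict(self, model_input, model_name=None):
        # Pretend model: the two scores are [sum of inputs, 1.0].
        return [sum(model_input), 1.0]
```

Injecting the client through the constructor, as the interface already does, is what makes it possible to swap in a stand-in like this for local tests.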

When multiple models are defined using the Predictor's models field, the onnx_client.predict() method expects a second argument, model_name, which is the name of the model to use for inference (for example: self.client.predict(model_input, "text-generator")).