Multi-model endpoints

It is possible to serve multiple models from the same Cortex API using any type of Cortex Predictor. This guide shows the general outline of a multi-model deployment for each Predictor type; each section is based on a corresponding example that can be found in the examples directory of the Cortex project.

Python Predictor

Specifying models in API config

The following template is based on the live-reloading/python/mpg-estimator example.

cortex.yaml

Even though it looks as if there's only a single model served, there are actually 4 different versions saved in s3://cortex-examples/sklearn/mpg-estimator/linreg/.

- name: mpg-estimator
  kind: RealtimeAPI
  predictor:
    type: python
    path: predictor.py
    model_path: s3://cortex-examples/sklearn/mpg-estimator/linreg/
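
Each version is stored in its own numbered subdirectory under the model_path prefix. A rough sketch of the layout (only the numbered directory names matter; each one holds the saved model artifacts for that version) looks like this:

s3://cortex-examples/sklearn/mpg-estimator/linreg/
├── 1/
├── 2/
├── 3/
└── 4/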

predictor.py

import mlflow.sklearn
import numpy as np


class PythonPredictor:
    def __init__(self, config, python_client):
        self.client = python_client

    def load_model(self, model_path):
        return mlflow.sklearn.load_model(model_path)

    def predict(self, payload, query_params):
        model_version = query_params.get("version")

        # process the input
        # ...

        model = self.client.get_model(model_version=model_version)
        result = model.predict(model_input)

        return {"prediction": result, "model": {"version": model_version}}

Making predictions

For convenience, we'll export our API's endpoint (yours will be different from mine):

$ api_endpoint=http://a36473270de8b46e79a769850dd3372d-c67035afa37ef878.elb.us-west-2.amazonaws.com/mpg-estimator

Next, we'll make a prediction using the first version of the model by specifying the model version as a query parameter:

$ curl "${api_endpoint}?version=1" -X POST -H "Content-Type: application/json" -d @sample.json
{"prediction": 26.929889872154185, "model": {"version": "1"}}

Then we'll make a prediction using the 2nd version of the model (since all versions are duplicates of the same model, the result will be identical):

$ curl "${api_endpoint}?version=2" -X POST -H "Content-Type: application/json" -d @sample.json
{"prediction": 26.929889872154185, "model": {"version": "2"}}

Without specifying models in API config

For the Python Predictor, the API configuration for a multi-model API is similar to that of a single-model API. The Predictor's config field can be used to customize the behavior of the predictor.py implementation.

The following template is based on the pytorch/multi-model-text-analyzer example.

cortex.yaml

- name: multi-model-text-analyzer
  kind: RealtimeAPI
  predictor:
    type: python
    path: predictor.py
    config: {...}
  ...
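
Whatever keys are set under config in cortex.yaml are passed to the predictor's constructor as a plain dictionary. For illustration only (the example's actual config is elided above), a constructor reading two made-up keys might look like this:

class PythonPredictor:
    def __init__(self, config):
        # hypothetical keys; they must match what was set under config in cortex.yaml,
        # e.g. config: {device: "cpu", labels_url: "s3://.../labels.json"}
        self.device = config.get("device", "cpu")
        self.labels_url = config.get("labels_url")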

predictor.py

Models should be loaded in the predictor's constructor. Use a query parameter to select which model to run inference with.

# import modules here
from starlette.responses import JSONResponse


class PythonPredictor:
    def __init__(self, config):
        # prepare the environment, download/load models/labels, etc
        # ...

        # load models
        self.analyzer = initialize_model("sentiment-analysis")
        self.summarizer = initialize_model("summarization")

    def predict(self, query_params, payload):
        # preprocessing
        model_name = query_params.get("model")
        model_input = payload["text"]
        # ...

        # make prediction
        if model_name == "sentiment":
            results = self.analyzer(model_input)
            predicted_label = postprocess(results)
            return {"label": predicted_label}
        elif model_name == "summarizer":
            results = self.summarizer(model_input)
            predicted_label = postprocess(results)
            return {"label": predicted_label}
        else:
            return JSONResponse({"error": f"unknown model: {model_name}"}, status_code=400)
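
initialize_model and postprocess are placeholders above. As a rough sketch of what initialize_model could look like, assuming the two models are Hugging Face transformers pipelines (an assumption for illustration; the example is free to load them differently), it might be as simple as:

# a sketch of initialize_model, assuming Hugging Face transformers pipelines
from transformers import pipeline


def initialize_model(task: str):
    # task is a pipeline identifier, e.g. "sentiment-analysis" or "summarization"
    return pipeline(task)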

Making predictions

For convenience, we'll export our API's endpoint (yours will be different from mine):

$ api_endpoint=http://a36473270de8b46e79a769850dd3372d-c67035afa37ef878.elb.us-west-2.amazonaws.com/multi-model-text-analyzer

Next, we'll make a prediction using the sentiment analyzer model by specifying the model name as a query parameter:

$ curl "${api_endpoint}?model=sentiment" -X POST -H "Content-Type: application/json" -d @sample-sentiment.json
{"label": "POSITIVE", "score": 0.9998506903648376}

Then we'll make a prediction using the text summarizer model:

$ curl "${api_endpoint}?model=summarizer" -X POST -H "Content-Type: application/json" -d @sample-summarizer.json
Machine learning is the study of algorithms and statistical models that computer systems use to perform a specific task. It is seen as a subset of artificial intelligence. Machine learning algorithms are used in a wide variety of applications, such as email filtering and computer vision. In its application across business problems, machine learning is also referred to as predictive analytics.

TensorFlow Predictor

For the TensorFlow Predictor, a multi-model API is configured by placing the list of models in the Predictor's models field, with each model given its own unique name. The predict() method of the tensorflow_client object takes a second argument that specifies the name of the model to use for inference.

The following template is based on the tensorflow/multi-model-classifier example.

cortex.yaml

- name: multi-model-classifier
  kind: RealtimeAPI
  predictor:
    type: tensorflow
    path: predictor.py
    models:
      paths:
        - name: inception
          model_path: s3://cortex-examples/tensorflow/image-classifier/inception/
        - name: iris
          model_path: s3://cortex-examples/tensorflow/iris-classifier/nn/
        - name: resnet50
          model_path: s3://cortex-examples/tensorflow/resnet50/
  ...

predictor.py

# import modules here


class TensorFlowPredictor:
    def __init__(self, tensorflow_client, config):
        # prepare the environment, download/load labels, etc
        # ...
        self.client = tensorflow_client

    def predict(self, payload, query_params):
        # preprocessing
        model_name = query_params["model"]
        model_input = preprocess(payload["url"])

        # make prediction
        results = self.client.predict(model_input, model_name)

        # postprocess
        predicted_label = postprocess(results)

        return {"label": predicted_label}

Making predictions

For convenience, we'll export our API's endpoint (yours will be different from mine):

$ api_endpoint=http://a36473270de8b46e79a769850dd3372d-c67035afa37ef878.elb.us-west-2.amazonaws.com/multi-model-classifier

Next, we'll make a prediction using the iris classifier model by specifying the model name as a query parameter:

$ curl "${ENDPOINT}?model=iris" -X POST -H "Content-Type: application/json" -d @sample-iris.json
{"label": "setosa"}

Then we'll make a prediction using the resnet50 model:

$ curl "${ENDPOINT}?model=resnet50" -X POST -H "Content-Type: application/json" -d @sample-image.json
{"label": "sports_car"}

Finally we'll make a prediction using the inception model:

$ curl "${ENDPOINT}?model=inception" -X POST -H "Content-Type: application/json" -d @sample-image.json
{"label": "sports_car"}

ONNX Predictor

For the ONNX Predictor, a multi-model API is configured by placing the list of models in the Predictor's models field, with each model given its own unique name. The predict() method of the onnx_client object takes a second argument that specifies the name of the model to use for inference.

The following template is based on the onnx/multi-model-classifier example.

cortex.yaml

- name: multi-model-classifier
  kind: RealtimeAPI
  predictor:
    type: onnx
    path: predictor.py
    models:
      paths:
        - name: resnet50
          model_path: s3://cortex-examples/onnx/resnet50/
        - name: mobilenet
          model_path: s3://cortex-examples/onnx/mobilenet/
        - name: shufflenet
          model_path: s3://cortex-examples/onnx/shufflenet/
  ...

predictor.py

# import modules here


class ONNXPredictor:
    def __init__(self, onnx_client, config):
        # prepare the environment, download/load labels, etc
        # ...
        self.client = onnx_client

    def predict(self, payload, query_params):
        # process the input
        model_name = query_params["model"]
        model_input = preprocess(payload["url"])

        # make prediction
        results = self.client.predict(model_input, model_name)

        # postprocess
        predicted_label = postprocess(results)

        return {"label": predicted_label}

Making predictions

For convenience, we'll export our API's endpoint (yours will be different from mine):

$ api_endpoint=http://a36473270de8b46e79a769850dd3372d-c67035afa37ef878.elb.us-west-2.amazonaws.com/multi-model-classifier

Next, we'll make a prediction using the resnet50 model by specifying the model name as a query parameter:

$ curl "${ENDPOINT}?model=resnet50" -X POST -H "Content-Type: application/json" -d @sample.json
{"label": "tabby"}

Then we'll make a prediction using the mobilenet model:

$ curl "${ENDPOINT}?model=mobilenet" -X POST -H "Content-Type: application/json" -d @sample.json
{"label": "tabby"}

Finally we'll make a prediction using the shufflenet model:

$ curl "${ENDPOINT}?model=shufflenet" -X POST -H "Content-Type: application/json" -d @sample.json
{"label": "Egyptian_cat"}