Deploy machine learning models to production

Cortex is an open source platform for deploying, managing, and scaling machine learning in production.

Model serving infrastructure

  • Supports deploying TensorFlow, PyTorch, scikit-learn, and other models as realtime or batch APIs

  • Ensures high availability with availability zones and automated instance restarts

  • Scales to handle production workloads with request-based autoscaling

  • Runs inference on spot instances with on-demand backups

  • Manages traffic splitting for A/B testing
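
The traffic splitting mentioned above is configured declaratively, like everything else in Cortex. A sketch of what an A/B split between two deployed APIs might look like — the API names and exact field names here are illustrative, so check the Cortex documentation for the precise schema:

```yaml
# traffic_splitter.yaml (illustrative schema)
name: text-generator
kind: TrafficSplitter
apis:
  - name: text-generator-a   # hypothetical API receiving 80% of traffic
    weight: 80
  - name: text-generator-b   # hypothetical API receiving 20% of traffic
    weight: 20
```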

Configure your cluster:

# cluster.yaml
region: us-east-1
availability_zones: [us-east-1a, us-east-1b]
api_gateway: public
instance_type: g4dn.xlarge
min_instances: 10
max_instances: 100
spot: true

Spin up your cluster on your AWS account:

$ cortex cluster up --config cluster.yaml
○ configuring autoscaling ✓
○ configuring networking ✓
○ configuring logging ✓
○ configuring metrics dashboard ✓
cortex is ready!

Reproducible model deployments

  • Implement request handling in Python

  • Customize compute, autoscaling, and networking for each API

  • Package dependencies, code, and configuration for reproducible deployments

  • Test locally before deploying to your cluster

Implement a predictor:

# predictor.py
from transformers import pipeline

class PythonPredictor:
    def __init__(self, config):
        self.model = pipeline(task="text-generation")

    def predict(self, payload):
        return self.model(payload["text"])[0]
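
Because the predictor is plain Python, the request-handling interface can be smoke-tested locally before deploying. A minimal sketch — the `fake_pipeline` stub and the injected `pipeline` parameter are illustrative test scaffolding, not part of Cortex or the predictor above; they stand in for `transformers.pipeline` so no model download or GPU is needed:

```python
# A hypothetical stand-in for transformers.pipeline, so the predictor
# interface can be exercised without downloading a model.
def fake_pipeline(task):
    def generate(text):
        # Echo the prompt with a fixed completion appended.
        return [{"generated_text": text + " production"}]
    return generate

class PythonPredictor:
    # The pipeline parameter exists only so the stub can be injected in tests.
    def __init__(self, config, pipeline=fake_pipeline):
        self.model = pipeline(task="text-generation")

    def predict(self, payload):
        return self.model(payload["text"])[0]

predictor = PythonPredictor(config={})
result = predictor.predict({"text": "deploy machine learning models to"})
print(result["generated_text"])  # deploy machine learning models to production
```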

Configure an API:

# cortex.yaml
name: text-generator
kind: RealtimeAPI
predictor:
  path: predictor.py
compute:
  gpu: 1
  mem: 4Gi
autoscaling:
  min_replicas: 1
  max_replicas: 10
networking:
  api_gateway: public

Deploy to production:

$ cortex deploy cortex.yaml
creating https://example.com/text-generator
$ curl https://example.com/text-generator \
    -X POST -H "Content-Type: application/json" \
    -d '{"text": "deploy machine learning models to"}'
"deploy machine learning models to production"
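
The same request can be made from Python using only the standard library. A minimal sketch — `build_request` is a hypothetical helper, and the URL is the placeholder from the example above:

```python
import json
import urllib.request

def build_request(text, url="https://example.com/text-generator"):
    # Build the same JSON POST request that the curl example sends.
    return urllib.request.Request(
        url,
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_request("deploy machine learning models to")
# Send it with urllib.request.urlopen(req) once the API is live.
```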

API management

  • Monitor API performance

  • Aggregate and stream logs

  • Customize prediction tracking

  • Update APIs without downtime

Manage your APIs:

$ cortex get

realtime api      status   replicas   last update   latency   requests
text-generator    live     34         9h            247ms     71828
object-detector   live     13         15h           23ms      828459

batch api          running jobs   last update
image-classifier   5              10h

Get started

$ pip install cortex

See the installation guide for next steps.