Deploy machine learning models to production
Cortex is an open source platform for deploying, managing, and scaling machine learning in production.

Model serving infrastructure

  • Supports deploying TensorFlow, PyTorch, scikit-learn, and other models as realtime or batch APIs
  • Ensures high availability with availability zones and automated instance restarts
  • Scales to handle production workloads with request-based autoscaling
  • Runs inference on spot instances, with on-demand instances as a fallback
  • Manages traffic splitting for A/B testing
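Traffic splitting is configured as its own API kind that routes weighted shares of requests across deployed APIs. A hedged sketch of what such a spec looks like (the field names and `TrafficSplitter` kind here are assumptions based on the feature description; check the Cortex docs for your version):

```yaml
# traffic_splitter.yaml (illustrative)
name: text-generator
kind: TrafficSplitter
apis:
  - name: text-generator-a   # current model
    weight: 80
  - name: text-generator-b   # candidate model for the A/B test
    weight: 20
```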

Configure your cluster:

```yaml
# cluster.yaml

region: us-east-1
availability_zones: [us-east-1a, us-east-1b]
api_gateway: public
instance_type: g4dn.xlarge
min_instances: 10
max_instances: 100
spot: true
```

Spin up your cluster on your AWS account:

```bash
$ cortex cluster up --config cluster.yaml

○ configuring autoscaling ✓
○ configuring networking ✓
○ configuring logging ✓
○ configuring metrics dashboard ✓

cortex is ready!
```

Reproducible model deployments

  • Implement request handling in Python
  • Customize compute, autoscaling, and networking for each API
  • Package dependencies, code, and configuration for reproducible deployments
  • Test locally before deploying to your cluster

Implement a predictor:

```python
# predictor.py

from transformers import pipeline

class PythonPredictor:
    def __init__(self, config):
        self.model = pipeline(task="text-generation")

    def predict(self, payload):
        return self.model(payload["text"])[0]
```
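The predictor can be exercised locally before deploying: Cortex constructs the class once per replica (so `__init__` is the place to load the model) and calls `predict` for each request with the parsed JSON body. A minimal sketch of that contract, using a hypothetical `EchoPredictor` stub in place of the transformers pipeline so nothing is downloaded:

```python
class EchoPredictor:
    """Stub with the same interface as PythonPredictor; no real model."""

    def __init__(self, config):
        # In a real predictor, load the model here (runs once at startup).
        self.prefix = config.get("prefix", "echo: ")

    def predict(self, payload):
        # Cortex passes the parsed JSON request body as `payload`.
        return self.prefix + payload["text"]

# Drive it locally the same way Cortex would:
predictor = EchoPredictor({"prefix": "generated: "})
print(predictor.predict({"text": "hello"}))  # generated: hello
```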

Configure an API:

```yaml
# cortex.yaml

name: text-generator
kind: RealtimeAPI
predictor:
  path: predictor.py
compute:
  gpu: 1
  mem: 4Gi
autoscaling:
  min_replicas: 1
  max_replicas: 10
networking:
  api_gateway: public
```
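The `autoscaling` block bounds each API between `min_replicas` and `max_replicas`, and request-based autoscaling sizes the fleet from how many requests are in flight. A rough illustrative sketch of that idea (not Cortex's exact algorithm; `target_per_replica` is an assumed parameter for this example):

```python
import math

def desired_replicas(in_flight_requests, target_per_replica,
                     min_replicas, max_replicas):
    """Size the fleet so each replica handles roughly
    `target_per_replica` concurrent requests, clamped to the
    configured bounds. Illustrative only."""
    raw = math.ceil(in_flight_requests / target_per_replica)
    return max(min_replicas, min(max_replicas, raw))

print(desired_replicas(42, 8, 1, 10))  # 6: ceil(42/8), within [1, 10]
```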

Deploy to production:

```bash
$ cortex deploy cortex.yaml

creating https://example.com/text-generator

$ curl https://example.com/text-generator \
    -X POST -H "Content-Type: application/json" \
    -d '{"text": "deploy machine learning models to"}'

"deploy machine learning models to production"
```

API management

  • Monitor API performance
  • Aggregate and stream logs
  • Customize prediction tracking
  • Update APIs without downtime

Manage your APIs:

```bash
$ cortex get

realtime api      status   replicas   last update   latency   requests
text-generator    live     34         9h            247ms     71828
object-detector   live     13         15h           23ms      828459

batch api          running jobs   last update
image-classifier   5              10h
```

Get started

```bash
$ pip install cortex
```
See the installation guide for next steps.