Cortex is an open source platform for deploying, managing, and scaling machine learning in production.
- Supports deploying TensorFlow, PyTorch, sklearn, and other models as realtime or batch APIs
- Ensures high availability with availability zones and automated instance restarts
- Scales to handle production workloads with request-based autoscaling
- Runs inference on spot instances with on-demand backups
- Manages traffic splitting for A/B testing
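Traffic splitting between two deployed APIs can be configured declaratively. A minimal sketch, with the caveat that the `TrafficSplitter` kind and `weight` fields shown here are assumptions and may differ across Cortex versions:

```yaml
# traffic_splitter.yaml -- hypothetical sketch; verify field names
# against your Cortex version's documentation

name: text-generator
kind: TrafficSplitter
apis:
  - name: text-generator-a  # e.g. the current model
    weight: 80
  - name: text-generator-b  # e.g. the candidate model
    weight: 20
```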
```yaml
# cluster.yaml

region: us-east-1
availability_zones: [us-east-1a, us-east-1b]
api_gateway: public
instance_type: g4dn.xlarge
min_instances: 10
max_instances: 100
spot: true
```
```bash
$ cortex cluster up --config cluster.yaml

○ configuring autoscaling ✓
○ configuring networking ✓
○ configuring logging ✓
○ configuring metrics dashboard ✓

cortex is ready!
```
- Implement request handling in Python
- Customize compute, autoscaling, and networking for each API
- Package dependencies, code, and configuration for reproducible deployments
- Test locally before deploying to your cluster
```python
# predictor.py

from transformers import pipeline

class PythonPredictor:
    def __init__(self, config):
        self.model = pipeline(task="text-generation")

    def predict(self, payload):
        return self.model(payload["text"])[0]
```
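Because the predictor is plain Python, its request-handling logic can be exercised locally before deploying. A minimal sketch, using a hypothetical stand-in class (`EchoPredictor` is not part of Cortex) that implements the same `__init__(config)` / `predict(payload)` interface so no model download is needed:

```python
# A hypothetical stand-in with the same interface as PythonPredictor,
# useful for testing request handling without loading a real model.
class EchoPredictor:
    def __init__(self, config):
        # config mirrors the dict Cortex passes to the predictor
        self.prefix = config.get("prefix", "")

    def predict(self, payload):
        # payload mirrors the parsed JSON request body
        return self.prefix + payload["text"]

predictor = EchoPredictor({"prefix": "generated: "})
print(predictor.predict({"text": "hello"}))  # generated: hello
```

Swapping the stub for the real `PythonPredictor` exercises the same code path Cortex invokes per request.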
```yaml
# cortex.yaml

name: text-generator
kind: RealtimeAPI
predictor:
  path: predictor.py
compute:
  gpu: 1
  mem: 4Gi
autoscaling:
  min_replicas: 1
  max_replicas: 10
networking:
  api_gateway: public
```
```bash
$ cortex deploy cortex.yaml

creating https://example.com/text-generator

$ curl https://example.com/text-generator \
    -X POST -H "Content-Type: application/json" \
    -d '{"text": "deploy machine learning models to"}'

"deploy machine learning models to production"
```
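The same request can be made from Python with only the standard library. A minimal sketch, assuming the placeholder URL from the example above; `build_request` and `query_text_generator` are illustrative names, not part of Cortex:

```python
import json
import urllib.request

def build_request(url, text):
    # Construct the same JSON POST the curl example sends
    return urllib.request.Request(
        url,
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

def query_text_generator(url, text):
    # Send the request and decode the JSON response
    with urllib.request.urlopen(build_request(url, text)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `query_text_generator("https://example.com/text-generator", "deploy machine learning models to")` would return the generated text as a Python object.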
- Monitor API performance
- Aggregate and stream logs
- Customize prediction tracking
- Update APIs without downtime
```bash
$ cortex get

realtime api      status   replicas   last update   latency   requests
text-generator    live     34         9h            247ms     71828
object-detector   live     13         15h           23ms      828459

batch api          running jobs   last update
image-classifier   5              10h
```
```bash
$ pip install cortex
```
See the installation guide for next steps.