Deploy realtime APIs that can respond to prediction requests on demand.
Request-based autoscaling
Multi-model endpoints
Server-side batching
Metrics and log aggregation
Rolling updates
$ pip install cortex
# cluster.yaml

region: us-east-1
instance_type: g4dn.xlarge
min_instances: 1
max_instances: 3
spot: true
$ cortex cluster up --config cluster.yaml
$ mkdir text-generator && cd text-generator
$ touch predictor.py requirements.txt text_generator.yaml
# predictor.py

from transformers import pipeline

class PythonPredictor:
    def __init__(self, config):
        self.model = pipeline(task="text-generation")

    def predict(self, payload):
        return self.model(payload["text"])[0]
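Before deploying, it can help to exercise the predictor interface locally. The following is a minimal sketch under stated assumptions, not Cortex's own test harness: `fake_pipeline` and `LocalPredictor` are hypothetical names introduced for illustration, and the fake model only mimics the list-of-dicts shape that a `transformers` text-generation pipeline returns, so no model is downloaded.

```python
# Hypothetical local check of the predictor interface; no model download needed.

def fake_pipeline(task):
    """Stand-in for transformers.pipeline (assumption: same return shape)."""
    def generate(text):
        # A real text-generation pipeline returns a list of dicts
        # keyed by "generated_text".
        return [{"generated_text": text + " ..."}]
    return generate


class LocalPredictor:
    """Mirrors PythonPredictor from predictor.py, with the model swapped for the fake."""

    def __init__(self, config):
        # config is accepted but unused, as in predictor.py.
        self.model = fake_pipeline(task="text-generation")

    def predict(self, payload):
        # Assumes payload is a dict with a "text" key, like the JSON sent by curl.
        return self.model(payload["text"])[0]


predictor = LocalPredictor(config={})
result = predictor.predict({"text": "hello world"})
print(result)
```

Swapping `fake_pipeline` back for `transformers.pipeline` should give the same call pattern with real generated text, at the cost of downloading the model.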
# requirements.txt

transformers
torch
# text_generator.yaml

- name: text-generator
  kind: RealtimeAPI
  predictor:
    type: python
    path: predictor.py
  compute:
    gpu: 1
$ cortex deploy text_generator.yaml
$ cortex get text-generator --watch
$ cortex logs text-generator
$ curl http://***.elb.us-west-2.amazonaws.com/text-generator -X POST -H "Content-Type: application/json" -d '{"text": "hello world"}'
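The same request can be sent from Python. The sketch below uses only the standard library; `build_request` and `generate` are hypothetical helper names, the endpoint is the elided placeholder from the curl command (substitute your API's actual endpoint), and the response is assumed to be JSON.

```python
import json
import urllib.request

# Placeholder copied from the curl command; substitute your API's endpoint.
ENDPOINT = "http://***.elb.us-west-2.amazonaws.com/text-generator"


def build_request(endpoint, text):
    """Build a POST request with a JSON body matching the curl example."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(endpoint, text, timeout=30):
    """Send the request and decode the response (assumed JSON; needs a live API)."""
    request = build_request(endpoint, text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request does not touch the network, so this runs offline.
request = build_request(ENDPOINT, "hello world")
print(request.get_method(), request.full_url)
```

Calling `generate(ENDPOINT, "hello world")` against a deployed API would perform the actual prediction request.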
$ cortex delete text-generator