The cortex get and cortex get API_NAME commands display the request time (averaged over the past 2 weeks) and response code counts (summed over the past 2 weeks) for your APIs:
$ cortex get

env   api                         status   up-to-date   requested   last update   avg request   2XX
aws   iris-classifier             live     1            1           17m           24ms          1223
aws   text-generator              live     1            1           8m            180ms         433
aws   image-classifier-resnet50   live     2            2           1h            32ms          1121126
The cortex get API_NAME command also provides a link to a CloudWatch Metrics dashboard containing this information:
responses per minute
Shows the number of 2XX, 4XX, and 5XX responses per minute.
median response time
Shows the median response time for requests, over 1-minute periods (measured in milliseconds).
p99 response time
Shows the p99 response time for requests, over 1-minute periods (measured in milliseconds).
total in-flight requests
Shows the total number of in-flight requests.
See metric intervals.
avg in-flight requests per replica
Shows the average number of in-flight requests per replica.
See metric intervals.
active replicas
Shows the number of active replicas.
See metric intervals.
Metric intervals: each widget that refers to this note is aggregated over 10-second intervals because each replica reports its in-flight requests once every 10 seconds. The plot is only available for the last 3 hours (because second-granular data is aggregated to minute-granular data after 3 hours). To plot data older than 3 hours, change the period to 1 minute and divide the y-axis values by 6 (since the metrics are reported every 10 seconds).
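The divide-by-6 adjustment can be illustrated with a short Python sketch. It assumes the widget sums the reported values within each period, so a 1-minute period adds up six 10-second reports. The function name and the sample readings are hypothetical and are not part of Cortex or CloudWatch.

```python
# Each replica reports in-flight requests once every 10 seconds (from the note above).
REPORT_INTERVAL_SECONDS = 10


def rescale_to_report_interval(summed_values, period_seconds=60):
    """Convert values summed over `period_seconds` into per-report averages.

    Assumption: the widget adds up every report that falls inside the period,
    so a 60-second period contains 60 // 10 = 6 reports.
    """
    reports_per_period = period_seconds // REPORT_INTERVAL_SECONDS
    return [value / reports_per_period for value in summed_values]


# Hypothetical y-axis readings taken with the period set to 1 minute.
one_minute_values = [12, 30, 6]
print(rescale_to_report_interval(one_minute_values))  # [2.0, 5.0, 1.0]
```

With a 1-minute period, a reading of 12 therefore corresponds to an average of 2 in-flight requests per 10-second report.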