Create

Prerequisites

1. Install and run Docker on your machine.
2. Subscribe to the AMI with GPU support (for GPU clusters).
3. Create an IAM user with AdministratorAccess and programmatic access.
4. You may need to request limit increases for your desired instance types.
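
The first prerequisite can be checked before you start. The following is a minimal sketch, assuming the Docker CLI is named `docker` and that `docker info` exits with a non-zero status when the daemon is unreachable; the helper name `docker_status` is hypothetical and not part of Cortex.

```python
import shutil
import subprocess


def docker_status(executable="docker"):
    """Return 'missing', 'not-running', or 'ok' for the given Docker CLI."""
    # The CLI must be on PATH before we can ask it anything.
    if shutil.which(executable) is None:
        return "missing"
    try:
        # `docker info` talks to the daemon, so it fails if Docker is not running.
        result = subprocess.run(
            [executable, "info"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return "not-running"
    return "ok" if result.returncode == 0 else "not-running"


if __name__ == "__main__":
    print(f"docker: {docker_status()}")
```

This only covers Docker; the AMI subscription, IAM user, and instance limits still have to be checked in your AWS account.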

Create a cluster on your AWS account

# install the cortex CLI
bash -c "$(curl -sS https://raw.githubusercontent.com/cortexlabs/cortex/v0.40.0/get-cli.sh)"

# create a cluster
cortex cluster up cluster.yaml

cluster.yaml

# cluster name
cluster_name: cortex

# AWS region
region: us-east-1

# list of availability zones for your region
availability_zones: # default: 3 random availability zones in your region, e.g. [us-east-1a, us-east-1b, us-east-1c]

# list of cluster node groups
node_groups:
  - name: ng-cpu # name of the node group
    instance_type: m5.large # instance type
    min_instances: 1 # minimum number of instances
    max_instances: 5 # maximum number of instances
    priority: 1 # priority of the node group; the higher the value, the higher the priority [1-100]
    instance_volume_size: 50 # disk storage size per instance (GB)
    instance_volume_type: gp3 # instance volume type [gp2 | gp3 | io1 | st1 | sc1]
    # instance_volume_iops: 3000 # instance volume iops (only applicable to io1/gp3)
    # instance_volume_throughput: 125 # instance volume throughput (only applicable to gp3)
    spot: false # whether to use spot instances

  - name: ng-gpu
    instance_type: g4dn.xlarge
    min_instances: 1
    max_instances: 5
    instance_volume_size: 50
    instance_volume_type: gp3
    spot: false
    # ...

# subnet visibility for instances [public (instances will have public IPs) | private (instances will not have public IPs)]
subnet_visibility: public

# NAT gateway (required when using private subnets) [none | single | highly_available (a NAT gateway per availability zone)]
nat_gateway: none

# API load balancer scheme [internet-facing | internal]
api_load_balancer_scheme: internet-facing

# operator load balancer scheme [internet-facing | internal]
# note: if using "internal", you must configure VPC Peering to connect your CLI to your cluster operator
operator_load_balancer_scheme: internet-facing

# to install Cortex in an existing VPC, you can provide a list of subnets for your cluster to use
# subnet_visibility (specified above in this file) must match your subnets' visibility
# this is an advanced feature (not recommended for first-time users) and requires your VPC to be configured correctly; see https://eksctl.io/usage/vpc-networking/#use-existing-vpc-other-custom-configuration
# here is an example:
# subnets:
#   - availability_zone: us-west-2a
#     subnet_id: subnet-060f3961c876872ae
#   - availability_zone: us-west-2b
#     subnet_id: subnet-0faed05adf6042ab7

# restrict access to APIs by CIDR blocks/IP address ranges
api_load_balancer_cidr_white_list: [0.0.0.0/0]

# restrict access to the Operator by CIDR blocks/IP address ranges
operator_load_balancer_cidr_white_list: [0.0.0.0/0]

# additional tags to assign to AWS resources (all resources will automatically be tagged with cortex.dev/cluster-name: <cluster_name>)
tags: # <string>: <string> map of key/value pairs

# SSL certificate ARN (only necessary when using a custom domain)
ssl_certificate_arn:

# list of IAM policies to attach to your Cortex APIs
iam_policy_arns: ["arn:aws:iam::aws:policy/AmazonS3FullAccess"]

# primary CIDR block for the cluster's VPC
vpc_cidr: 192.168.0.0/16

# instance type for prometheus (use an instance with more memory for clusters exceeding 300 nodes or 300 pods)
prometheus_instance_type: "t3.medium"
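
The comments in `cluster.yaml` describe several constraints (allowed values, the 1-100 priority range, which volume types accept IOPS and throughput settings). The sketch below shows one way to check a configuration against them before running `cortex cluster up`. It is an illustration only: `check_cluster_config` is a hypothetical helper, not part of the Cortex CLI, the configuration is passed as an already-parsed dictionary, and the `min_instances <= max_instances` rule is an assumption rather than something stated in the file.

```python
# Allowed values, taken from the comments in cluster.yaml.
VOLUME_TYPES = {"gp2", "gp3", "io1", "st1", "sc1"}
SUBNET_VISIBILITIES = {"public", "private"}
NAT_GATEWAYS = {"none", "single", "highly_available"}
LB_SCHEMES = {"internet-facing", "internal"}


def check_cluster_config(config):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []

    visibility = config.get("subnet_visibility")
    if visibility is not None and visibility not in SUBNET_VISIBILITIES:
        problems.append(f"subnet_visibility: unknown value {visibility!r}")

    nat = config.get("nat_gateway")
    if nat is not None and nat not in NAT_GATEWAYS:
        problems.append(f"nat_gateway: unknown value {nat!r}")
    # cluster.yaml notes that a NAT gateway is required with private subnets.
    if visibility == "private" and nat == "none":
        problems.append("nat_gateway: must not be 'none' with private subnets")

    for key in ("api_load_balancer_scheme", "operator_load_balancer_scheme"):
        scheme = config.get(key)
        if scheme is not None and scheme not in LB_SCHEMES:
            problems.append(f"{key}: unknown value {scheme!r}")

    for group in config.get("node_groups") or []:
        name = group.get("name", "<unnamed>")
        low, high = group.get("min_instances"), group.get("max_instances")
        # Assumption: the minimum should never exceed the maximum.
        if low is not None and high is not None and low > high:
            problems.append(f"{name}: min_instances exceeds max_instances")
        priority = group.get("priority")
        if priority is not None and not 1 <= priority <= 100:
            problems.append(f"{name}: priority must be in [1, 100]")
        volume_type = group.get("instance_volume_type")
        if volume_type is not None:
            if volume_type not in VOLUME_TYPES:
                problems.append(f"{name}: unknown volume type {volume_type!r}")
            if "instance_volume_iops" in group and volume_type not in ("io1", "gp3"):
                problems.append(f"{name}: iops only applies to io1/gp3")
            if "instance_volume_throughput" in group and volume_type != "gp3":
                problems.append(f"{name}: throughput only applies to gp3")

    return problems


if __name__ == "__main__":
    example = {
        "subnet_visibility": "private",
        "nat_gateway": "none",
        "node_groups": [{"name": "ng-cpu", "min_instances": 1, "max_instances": 5}],
    }
    for problem in check_cluster_config(example):
        print(problem)
```

Keys that are omitted are skipped rather than flagged, since the defaults Cortex applies are not listed in full here.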
You can also override the Docker images used by the cluster by adding any of these keys to your cluster configuration file (default values are shown):
image_manager: quay.io/cortexlabs/manager:0.40.0
image_operator: quay.io/cortexlabs/operator:0.40.0
image_controller_manager: quay.io/cortexlabs/controller-manager:0.40.0
image_autoscaler: quay.io/cortexlabs/autoscaler:0.40.0
image_proxy: quay.io/cortexlabs/proxy:0.40.0
image_async_gateway: quay.io/cortexlabs/async-gateway:0.40.0
image_activator: quay.io/cortexlabs/activator:0.40.0
image_enqueuer: quay.io/cortexlabs/enqueuer:0.40.0
image_dequeuer: quay.io/cortexlabs/dequeuer:0.40.0
image_cluster_autoscaler: quay.io/cortexlabs/cluster-autoscaler:0.40.0
image_metrics_server: quay.io/cortexlabs/metrics-server:0.40.0
image_nvidia_device_plugin: quay.io/cortexlabs/nvidia-device-plugin:0.40.0
image_neuron_device_plugin: quay.io/cortexlabs/neuron-device-plugin:0.40.0
image_neuron_scheduler: quay.io/cortexlabs/neuron-scheduler:0.40.0
image_fluent_bit: quay.io/cortexlabs/fluent-bit:0.40.0
image_istio_proxy: quay.io/cortexlabs/istio-proxy:0.40.0
image_istio_pilot: quay.io/cortexlabs/istio-pilot:0.40.0
image_prometheus: quay.io/cortexlabs/prometheus:0.40.0
image_prometheus_config_reloader: quay.io/cortexlabs/prometheus-config-reloader:0.40.0
image_prometheus_operator: quay.io/cortexlabs/prometheus-operator:0.40.0
image_prometheus_statsd_exporter: quay.io/cortexlabs/prometheus-statsd-exporter:0.40.0
image_prometheus_dcgm_exporter: quay.io/cortexlabs/prometheus-dcgm-exporter:0.40.0
image_prometheus_kube_state_metrics: quay.io/cortexlabs/prometheus-kube-state-metrics:0.40.0
image_prometheus_node_exporter: quay.io/cortexlabs/prometheus-node-exporter:0.40.0
image_kube_rbac_proxy: quay.io/cortexlabs/kube-rbac-proxy:0.40.0
image_grafana: quay.io/cortexlabs/grafana:0.40.0
image_event_exporter: quay.io/cortexlabs/event-exporter:0.40.0
image_kubexit: quay.io/cortexlabs/kubexit:0.40.0
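
A common reason to override these keys is to pull from a private mirror. The sketch below generates override lines that point a few of the default images at another registry while keeping the image name and tag. The registry `registry.example.com/cortex` and the helper `mirror_images` are hypothetical, and the sketch assumes the images have already been copied to that registry under the same names and tags.

```python
# A few of the default image keys listed above; extend with the rest as needed.
DEFAULT_IMAGES = {
    "image_manager": "quay.io/cortexlabs/manager:0.40.0",
    "image_operator": "quay.io/cortexlabs/operator:0.40.0",
    "image_proxy": "quay.io/cortexlabs/proxy:0.40.0",
}


def mirror_images(defaults, registry):
    """Map each image key to the same image name and tag under `registry`."""
    prefix = registry.rstrip("/")
    overrides = {}
    for key, image in defaults.items():
        # Keep only the final path component, e.g. "proxy:0.40.0".
        name_and_tag = image.rsplit("/", 1)[1]
        overrides[key] = f"{prefix}/{name_and_tag}"
    return overrides


if __name__ == "__main__":
    # Hypothetical registry; replace with your own.
    for key, image in mirror_images(DEFAULT_IMAGES, "registry.example.com/cortex").items():
        print(f"{key}: {image}")
```

The printed `key: value` lines can be pasted into the cluster configuration file alongside the other settings.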