Create

Prerequisites

  1. Install and run Docker on your machine.
  2. Subscribe to the AMI with GPU support (for GPU clusters).
  3. Create an IAM user with AdministratorAccess and programmatic access.
  4. You may need to request limit increases for your desired instance types.
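Before running `cortex cluster up`, it can be worth sanity-checking steps 1 and 3 from your terminal. The sketch below assumes the `docker` and `aws` CLIs are installed; `check` is a small helper defined here, not part of Cortex:

```shell
# Sketch: sanity-check the prerequisites above before creating a cluster.
# Prints "<tool>: ok" or "<tool>: needs attention" for each command tried.
check() {
  if "$@" >/dev/null 2>&1; then
    echo "$1: ok"
  else
    echo "$1: needs attention"
  fi
}

check docker info                  # is the Docker daemon running? (step 1)
check aws sts get-caller-identity  # are IAM credentials configured? (step 3)
```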

Create a cluster on your AWS account

```shell
# install the cortex CLI
bash -c "$(curl -sS https://raw.githubusercontent.com/cortexlabs/cortex/v0.42.0/get-cli.sh)"

# create a cluster
cortex cluster up cluster.yaml
```

cluster.yaml

```yaml
# cluster name
cluster_name: cortex

# AWS region
region: us-east-1

# list of availability zones for your region
availability_zones: # default: 3 random availability zones in your region, e.g. [us-east-1a, us-east-1b, us-east-1c]

# list of cluster node groups
node_groups:
  - name: ng-cpu # name of the node group
    instance_type: m5.large # instance type
    min_instances: 1 # minimum number of instances
    max_instances: 5 # maximum number of instances
    priority: 1 # priority of the node group; the higher the value, the higher the priority [1-100]
    instance_volume_size: 50 # disk storage size per instance (GB)
    instance_volume_type: gp3 # instance volume type [gp2 | gp3 | io1 | st1 | sc1]
    # instance_volume_iops: 3000 # instance volume iops (only applicable to io1/gp3)
    # instance_volume_throughput: 125 # instance volume throughput (only applicable to gp3)
    spot: false # whether to use spot instances

  - name: ng-gpu
    instance_type: g4dn.xlarge
    min_instances: 1
    max_instances: 5
    instance_volume_size: 50
    instance_volume_type: gp3
    spot: false
  # ...

# subnet visibility for instances [public (instances will have public IPs) | private (instances will not have public IPs)]
subnet_visibility: public

# NAT gateway (required when using private subnets) [none | single | highly_available (a NAT gateway per availability zone)]
nat_gateway: none

# API load balancer type [nlb | elb]
api_load_balancer_type: nlb

# API load balancer scheme [internet-facing | internal]
api_load_balancer_scheme: internet-facing

# operator load balancer scheme [internet-facing | internal]
# note: if using "internal", you must configure VPC Peering to connect your CLI to your cluster operator
operator_load_balancer_scheme: internet-facing

# to install Cortex in an existing VPC, you can provide a list of subnets for your cluster to use
# subnet_visibility (specified above in this file) must match your subnets' visibility
# this is an advanced feature (not recommended for first-time users) and requires your VPC to be configured correctly; see https://eksctl.io/usage/vpc-networking/#use-existing-vpc-other-custom-configuration
# here is an example:
# subnets:
#   - availability_zone: us-west-2a
#     subnet_id: subnet-060f3961c876872ae
#   - availability_zone: us-west-2b
#     subnet_id: subnet-0faed05adf6042ab7

# restrict access to APIs by CIDR blocks / IP address ranges
api_load_balancer_cidr_white_list: [0.0.0.0/0]

# restrict access to the operator by CIDR blocks / IP address ranges
operator_load_balancer_cidr_white_list: [0.0.0.0/0]

# additional tags to assign to AWS resources (all resources will automatically be tagged with cortex.dev/cluster-name: <cluster_name>)
tags: # <string>: <string> map of key/value pairs

# SSL certificate ARN (only necessary when using a custom domain)
ssl_certificate_arn:

# list of IAM policies to attach to your Cortex APIs
iam_policy_arns: ["arn:aws:iam::aws:policy/AmazonS3FullAccess"]

# primary CIDR block for the cluster's VPC
vpc_cidr: 192.168.0.0/16

# instance type for prometheus (use an instance with more memory for clusters exceeding 300 nodes or 300 pods)
prometheus_instance_type: "t3.medium"
```
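For example, per the comments above, a cluster on private subnets requires a NAT gateway so instances can reach the internet. A minimal sketch of the keys you would change from the defaults (pairing the two `internal` schemes with private subnets is illustrative, not required):

```yaml
# sketch: keys to change from the file above for a private cluster;
# a NAT gateway is required when subnet_visibility is private
subnet_visibility: private               # instances will not have public IPs
nat_gateway: single                      # or highly_available (one NAT gateway per AZ)
api_load_balancer_scheme: internal
operator_load_balancer_scheme: internal  # requires VPC Peering to reach the operator from your CLI
```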
The Docker images used by the cluster can also be overridden by adding any of these keys to your cluster configuration file (default values are shown):
```yaml
image_manager: quay.io/cortexlabs/manager:0.42.0
image_operator: quay.io/cortexlabs/operator:0.42.0
image_controller_manager: quay.io/cortexlabs/controller-manager:0.42.0
image_autoscaler: quay.io/cortexlabs/autoscaler:0.42.0
image_proxy: quay.io/cortexlabs/proxy:0.42.0
image_async_gateway: quay.io/cortexlabs/async-gateway:0.42.0
image_activator: quay.io/cortexlabs/activator:0.42.0
image_enqueuer: quay.io/cortexlabs/enqueuer:0.42.0
image_dequeuer: quay.io/cortexlabs/dequeuer:0.42.0
image_cluster_autoscaler: quay.io/cortexlabs/cluster-autoscaler:0.42.0
image_metrics_server: quay.io/cortexlabs/metrics-server:0.42.0
image_nvidia_device_plugin: quay.io/cortexlabs/nvidia-device-plugin:0.42.0
image_neuron_device_plugin: quay.io/cortexlabs/neuron-device-plugin:0.42.0
image_neuron_scheduler: quay.io/cortexlabs/neuron-scheduler:0.42.0
image_fluent_bit: quay.io/cortexlabs/fluent-bit:0.42.0
image_istio_proxy: quay.io/cortexlabs/istio-proxy:0.42.0
image_istio_pilot: quay.io/cortexlabs/istio-pilot:0.42.0
image_prometheus: quay.io/cortexlabs/prometheus:0.42.0
image_prometheus_config_reloader: quay.io/cortexlabs/prometheus-config-reloader:0.42.0
image_prometheus_operator: quay.io/cortexlabs/prometheus-operator:0.42.0
image_prometheus_statsd_exporter: quay.io/cortexlabs/prometheus-statsd-exporter:0.42.0
image_prometheus_dcgm_exporter: quay.io/cortexlabs/prometheus-dcgm-exporter:0.42.0
image_prometheus_kube_state_metrics: quay.io/cortexlabs/prometheus-kube-state-metrics:0.42.0
image_prometheus_node_exporter: quay.io/cortexlabs/prometheus-node-exporter:0.42.0
image_kube_rbac_proxy: quay.io/cortexlabs/kube-rbac-proxy:0.42.0
image_grafana: quay.io/cortexlabs/grafana:0.42.0
image_event_exporter: quay.io/cortexlabs/event-exporter:0.42.0
image_kubexit: quay.io/cortexlabs/kubexit:0.42.0
```
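For instance, to pull a couple of images from your own registry mirror instead of quay.io, you would add the corresponding keys to `cluster.yaml` (the registry path below is a placeholder, not a real Cortex registry):

```yaml
# sketch: override two of the default images in cluster.yaml;
# "registry.example.com/cortex" stands in for your own mirror
image_operator: registry.example.com/cortex/operator:0.42.0
image_manager: registry.example.com/cortex/manager:0.42.0
```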