# Deploy Pachyderm on AWS

After you deploy a Kubernetes cluster by using `kops` or `eksctl`,
you can deploy Pachyderm on top of that cluster.

You need to complete the following steps to deploy Pachyderm:

1. Install `pachctl` as described in [Install pachctl](../../../../getting_started/local_installation#install-pachctl).
1. Add stateful storage for Pachyderm as described in [Add Stateful Storage](#add-stateful-storage).
1. Deploy Pachyderm by using an [IAM role](#deploy-pachyderm-with-an-iam-role)
   (recommended) or [an access key](#deploy-pachyderm-with-an-access-key).

## Add Stateful Storage

Pachyderm requires the following types of persistent storage:

* An S3 object store bucket for data. S3 bucket names must be globally
  unique across all AWS accounts and Regions, not only within your region.
  Therefore, add a descriptive prefix to the S3 bucket name, such as your
  username.

* An Elastic Block Store (EBS) persistent volume (PV) for Pachyderm
  metadata. Pachyderm recommends that you assign at least 10 GB for this
  persistent EBS volume. If you expect your cluster to be long-running
  and to scale to thousands of jobs per commit, you might need more
  storage. You can increase the size of the persistent volume later.

To add stateful storage, complete the following steps:

1. Set up the following environment variables:

    * `BUCKET_NAME`: A globally unique S3 bucket name.
    * `STORAGE_SIZE`: The size of the persistent volume in GB. For example, `10`.
    * `AWS_REGION`: The AWS region of your Kubernetes cluster. For example,
      `us-west-2`, not an Availability Zone such as `us-west-2a`.
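
    Before you create the bucket, you can sanity-check these variables. The
    following POSIX shell sketch is not part of Pachyderm or the AWS CLI, and
    the function names `check_bucket_name` and `check_aws_region` are
    hypothetical. It checks only the basic S3 naming rules (3 to 63 characters;
    lowercase letters, digits, dots, and hyphens; starting and ending with a
    letter or digit), not whether the name is already taken. It also assumes
    that region names end in a number while Availability Zone names end in a
    letter.

    ```shell
    # Hypothetical helpers for sanity-checking the deployment variables.

    # check_bucket_name: succeeds if $1 follows the basic S3 bucket naming
    # rules. It does NOT check global uniqueness or every S3 naming rule.
    check_bucket_name() {
      name="$1"
      len=${#name}
      # Length must be between 3 and 63 characters.
      [ "$len" -ge 3 ] && [ "$len" -le 63 ] || return 1
      case "$name" in
        # Any character other than a lowercase letter, digit, dot, or hyphen.
        *[![:lower:][:digit:].-]*) return 1 ;;
        # Must start and end with a lowercase letter or digit.
        [![:lower:][:digit:]]*|*[![:lower:][:digit:]]) return 1 ;;
      esac
      return 0
    }

    # check_aws_region: succeeds if $1 looks like a region (us-west-2)
    # rather than an Availability Zone (us-west-2a).
    check_aws_region() {
      case "$1" in
        [a-z][a-z]-*-[0-9]|[a-z][a-z]-*-[0-9][0-9]) return 0 ;;
        *) return 1 ;;
      esac
    }
    ```

    For example, `check_aws_region "$AWS_REGION" || echo "AWS_REGION looks like a zone, not a region"`.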

1. Create an S3 bucket:

    * If you are creating an S3 bucket in the `us-east-1` region, run the
      following command:

        ```shell
        $ aws s3api create-bucket --bucket ${BUCKET_NAME} --region ${AWS_REGION}
        ```

    * If you are creating an S3 bucket in any region other than `us-east-1`,
      run the following command:

        ```shell
        $ aws s3api create-bucket --bucket ${BUCKET_NAME} --region ${AWS_REGION} --create-bucket-configuration LocationConstraint=${AWS_REGION}
        ```

1. Verify that the S3 bucket was created:

    ```shell
    $ aws s3api list-buckets --query 'Buckets[].Name'
    ```

## Deploy Pachyderm with an IAM Role

IAM roles provide better user management and security
capabilities than access keys. If a malicious user gains access to
an access key, your data might be compromised. Therefore, enterprises
often choose IAM roles rather than access keys for production
deployments.

You need to configure the following IAM settings:

* The worker nodes on which Pachyderm is deployed must be associated
  with the IAM role that is assigned to the Kubernetes cluster.
  If you created your cluster by using `kops` or `eksctl`,
  the nodes should already have a dedicated IAM role assigned.

* The IAM role must have access to the S3 bucket that you created for
  Pachyderm.

* The IAM role must have correct trust relationships.

You need to set the environment variable `IAM_ROLE` to the name
of the IAM role that you will use to deploy the cluster.
This is the role name itself, not the Role ARN or the Instance
Profile ARN of the role.

To deploy Pachyderm with an IAM role, complete the following steps:

1. Find the IAM role assigned to the cluster:

    1. Go to the AWS Management Console.
    1. Select an EC2 instance in the Kubernetes cluster.
    1. Click **Description**.
    1. Find the **IAM Role** field.
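
    `IAM_ROLE` must hold the role name, not an ARN. If you copied the role's
    ARN instead, a small helper can extract the name. The `role_name_from_arn`
    function below is a hypothetical POSIX shell sketch. It assumes the
    standard role ARN format `arn:aws:iam::<account-id>:role/<role-name>`,
    optionally with a path before the role name. It rejects instance profile
    ARNs because an instance profile's name can differ from the name of the
    role it contains.

    ```shell
    # role_name_from_arn: prints the role name from an IAM role ARN such as
    # arn:aws:iam::123456789012:role/my-node-role or
    # arn:aws:iam::123456789012:role/some/path/my-node-role.
    # Anything that is not a role ARN (for example, an instance profile ARN)
    # is rejected with a message on stderr.
    role_name_from_arn() {
      case "$1" in
        arn:aws*:iam::*:role/*)
          # Strip everything up to the last "/" to drop the prefix and path.
          printf '%s\n' "${1##*/}" ;;
        *)
          echo "not an IAM role ARN: $1" >&2
          return 1 ;;
      esac
    }
    ```

    For example, `export IAM_ROLE=$(role_name_from_arn "arn:aws:iam::123456789012:role/my-node-role")`,
    where the account ID and role name are placeholders.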

1. Enable access to the S3 bucket for the IAM role:

    1. In the **IAM Role** field, click the IAM role.
    1. On the **Permissions** tab, click **Edit policy**.
    1. Select the **JSON** tab.
    1. Add the following statements to the end of the `Statement` array in
       the existing policy JSON, separated from the preceding statement by a
       comma:

        ```json
        {
          "Effect": "Allow",
          "Action": [
            "s3:ListBucket"
          ],
          "Resource": [
            "arn:aws:s3:::<your-bucket>"
          ]
        },
        {
          "Effect": "Allow",
          "Action": [
            "s3:PutObject",
            "s3:GetObject",
            "s3:DeleteObject"
          ],
          "Resource": [
            "arn:aws:s3:::<your-bucket>/*"
          ]
        }
        ```

        Replace `<your-bucket>` with the name of your S3 bucket.

    **Note:** For an EKS cluster, you might need to use the
    **Add inline policy** button and create a name for the new policy.
    In that case, insert the JSON above between the square brackets of the
    `Statement` element.

1. Set up trust relationships for the IAM role:

    1. Click **Trust relationships > Edit trust relationship**.
    1. Ensure that you see a statement with `sts:AssumeRole`. Example:

        ```json
        {
          "Version": "2012-10-17",
          "Statement": [
            {
              "Effect": "Allow",
              "Principal": {
                "Service": "ec2.amazonaws.com"
              },
              "Action": "sts:AssumeRole"
            }
          ]
        }
        ```

1. Set the environment variable `IAM_ROLE` to the IAM role name
   for the Pachyderm deployment.

1. Deploy Pachyderm:

    ```shell
    $ pachctl deploy amazon ${BUCKET_NAME} ${AWS_REGION} ${STORAGE_SIZE} --dynamic-etcd-nodes=1 --iam-role ${IAM_ROLE}
    ```

    The deployment takes some time. You can run `kubectl get pods` periodically
    to check the status of the deployment.
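
    Rather than rerunning `kubectl get pods` by hand, you can poll it in a
    loop. The sketch below is not a Pachyderm or Kubernetes tool. The
    hypothetical `all_pods_ready` function assumes the default column layout
    of `kubectl get pods` shown in this guide (`NAME READY STATUS ...`). It
    succeeds only when at least one pod is listed and every pod is `Running`
    with all of its containers ready, for example `2/2`.

    ```shell
    # all_pods_ready: reads `kubectl get pods` output on stdin. Succeeds
    # (exit 0) only if at least one pod is listed and every pod is Running
    # with all of its containers ready (READY column of the form n/n).
    all_pods_ready() {
      awk 'NR > 1 {
             n++
             split($2, r, "/")          # READY column, e.g. "1/2"
             if (r[1] != r[2] || $3 != "Running") bad = 1
           }
           END { exit ((n > 0 && !bad) ? 0 : 1) }'
    }
    ```

    For example, `until kubectl get pods | all_pods_ready; do sleep 10; done`
    waits until every pod in the current namespace is ready. Add a timeout if
    you use this in automation, because a pod that never becomes ready keeps
    the loop running. `kubectl wait --for=condition=Ready pods --all` is a
    built-in alternative.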

    When Pachyderm is deployed, the command shows all pods as `READY`:

    ```shell
    $ kubectl get pods
    NAME                     READY     STATUS    RESTARTS   AGE
    dash-6c9dc97d9c-89dv9    2/2       Running   0          1m
    etcd-0                   1/1       Running   0          4m
    pachd-65fd68d6d4-8vjq7   1/1       Running   0          4m
    ```

    **Note:** If you see a few restarts on the `pachd` pods, Kubernetes
    tried to start those pods before `etcd` was ready and then restarted
    them. You can safely ignore these restarts.

1. Verify that the Pachyderm cluster is up and running:

    ```shell
    $ pachctl version

    COMPONENT           VERSION
    pachctl             1.9.7
    pachd               1.9.7
    ```

    * If you want to access the Pachyderm UI or use the S3 gateway, you need to
      forward Pachyderm ports. Open a new terminal window and run the
      following command:

        ```shell
        $ pachctl port-forward
        ```

## Deploy Pachyderm with an Access Key

When you installed `kops`, you created a dedicated IAM
user with access credentials, such as an access key and
a secret key. You can deploy Pachyderm by using the credentials
of this IAM user directly. However, deploying Pachyderm with an
access key might not satisfy your enterprise security
requirements. Therefore, deploying with an IAM role
is preferred.

To deploy Pachyderm with an access key, complete the following
steps:

1. Run the following command to deploy your Pachyderm cluster:

    ```shell
    $ pachctl deploy amazon ${BUCKET_NAME} ${AWS_REGION} ${STORAGE_SIZE} --dynamic-etcd-nodes=1 --credentials "${AWS_ACCESS_KEY_ID},${AWS_SECRET_ACCESS_KEY},"
    ```

    The trailing `,` in the `--credentials` value is a placeholder for an
    optional temporary AWS session token. You might use such a token if you
    are just experimenting with Pachyderm. However, do not use a temporary
    token in a production deployment.

    The deployment takes some time.
    You can run `kubectl get pods` periodically
    to check the status of the deployment. When Pachyderm is deployed, the
    command shows all pods as `READY`:

    ```shell
    $ kubectl get pods
    NAME                     READY     STATUS    RESTARTS   AGE
    dash-6c9dc97d9c-89dv9    2/2       Running   0          1m
    etcd-0                   1/1       Running   0          4m
    pachd-65fd68d6d4-8vjq7   1/1       Running   0          4m
    ```

    **Note:** If you see a few restarts on the `pachd` pods, Kubernetes
    tried to start those pods before `etcd` was ready and then restarted
    them. You can safely ignore these restarts.

1. Verify that the Pachyderm cluster is up and running:

    ```shell
    $ pachctl version

    COMPONENT           VERSION
    pachctl             1.9.7
    pachd               1.9.7
    ```

    * If you want to access the Pachyderm UI or use the S3 gateway, you need to
      forward Pachyderm ports. Open a new terminal window and run the
      following command:

        ```shell
        $ pachctl port-forward
        ```
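
In the `pachctl version` output shown in the verification steps, the client
(`pachctl`) and the cluster (`pachd`) report the same version. If you script
the deployment, you can check this automatically. The hypothetical
`versions_match` function below is a POSIX shell sketch, not part of
`pachctl`. It assumes the two-column `COMPONENT VERSION` output shown in this
guide. Keeping `pachctl` at the same version as `pachd` is a sensible default,
so treat a mismatch as something to investigate before you continue.

```shell
# versions_match: reads `pachctl version` output on stdin and succeeds only
# if both the pachctl and pachd rows are present and report the same version.
versions_match() {
  awk '$1 == "pachctl" { c = $2 }
       $1 == "pachd"   { d = $2 }
       END { exit ((c != "" && c == d) ? 0 : 1) }'
}
```

For example, `pachctl version | versions_match || echo "pachctl and pachd versions differ"`.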