# Deploy Pachyderm on AWS

After you deploy a Kubernetes cluster by using `kops` or `eksctl`,
you can deploy Pachyderm on top of that cluster.

You need to complete the following steps to deploy Pachyderm:

1. Install `pachctl` as described in [Install pachctl](../../../../getting_started/local_installation#install-pachctl).
1. Add stateful storage for Pachyderm as described in [Add Stateful Storage](#add-stateful-storage).
1. Deploy Pachyderm by using an [IAM role](#deploy-pachyderm-with-an-iam-role)
(recommended) or [an access key](#deploy-pachyderm-with-an-access-key).

## Add Stateful Storage

Pachyderm requires the following types of persistent storage:

* An S3 object store bucket for data. The S3 bucket name
must be globally unique across all AWS regions, not just within
your own region. Therefore, add a descriptive prefix to the S3 bucket
name, such as your username.

* An Elastic Block Store (EBS) persistent volume (PV) for Pachyderm
metadata. Pachyderm recommends that you assign at least 10 GB for this
persistent EBS volume. If you expect your cluster to be very
long running or to scale to thousands of jobs per commit, you might
need to add more storage. However, you can easily increase the
size of the persistent volume later.

To add stateful storage, complete the following steps:

1. Set up the following environment variables:

   * `BUCKET_NAME`: A globally unique S3 bucket name.
   * `STORAGE_SIZE`: The size of the persistent volume in GB. For example, `10`.
   * `AWS_REGION`: The AWS region of your Kubernetes cluster. Use the region
   name, such as `us-west-2`, and not an availability zone, such as `us-west-2a`.

1. Create an S3 bucket:

   * If you are creating an S3 bucket in the `us-east-1` region, run the following
   command:

     ```shell
     aws s3api create-bucket --bucket ${BUCKET_NAME} --region ${AWS_REGION}
     ```

   * If you are creating an S3 bucket in any region other than `us-east-1`,
   run the following command:

     ```shell
     aws s3api create-bucket --bucket ${BUCKET_NAME} --region ${AWS_REGION} --create-bucket-configuration LocationConstraint=${AWS_REGION}
     ```

1. Verify that the S3 bucket was created:

   ```shell
   aws s3api list-buckets --query 'Buckets[].Name'
   ```

### (Optional) Set up Bucket Encryption

Amazon S3 supports two types of bucket encryption: server-side encryption
(SSE-S3) and AWS Key Management Service (AWS KMS), which stores customer
master keys. Pachyderm supports both of these methods, so you can set up
either of them when you create a bucket for your Pachyderm cluster.
Because Pachyderm requests to buckets do not include encryption
information, the method that you select for the bucket is applied.
Setting up communication between Pachyderm object storage clients and AWS KMS
to append encryption information to Pachyderm requests is not supported and
not recommended.

To set up bucket encryption, see [Amazon S3 Default Encryption for S3 Buckets](https://docs.aws.amazon.com/AmazonS3/latest/dev/bucket-encryption.html).

## Deploy Pachyderm with an IAM Role

IAM roles provide better user management and security
capabilities compared to access keys. If a malicious user gains access to
an access key, your data might become compromised. Therefore, enterprises
often opt to use IAM roles rather than access keys for production
deployments.

You need to configure the following IAM settings:

* The worker nodes on which Pachyderm is deployed must be associated
with the IAM role that is assigned to the Kubernetes cluster.
If you created your cluster by using `kops` or `eksctl`,
the nodes should already have a dedicated IAM role assigned.

* The IAM role must have access to the S3 bucket that you created for
Pachyderm.

* The IAM role must have correct trust relationships.

You need to set the environment variable `IAM_ROLE` to the name
of the IAM role that you will use to deploy the cluster.
This value is not the Role ARN or the Instance
Profile ARN of the role. It is the actual role name.

To deploy Pachyderm with an IAM role, complete the following steps:

1. Find the IAM role assigned to the cluster:

   1. Go to the AWS Management console.
   1. Select an EC2 instance in the Kubernetes cluster.
   1. Click **Description**.
   1. Find the **IAM Role** field.

1. Enable access to the S3 bucket for the IAM role:

   1. In the **IAM Role** field, click on the IAM role.
   1. In the **Permissions** tab, click **Edit policy**.
   1. Select the **JSON** tab.
   1. Append the following statements to the end of the existing `Statement` array:

      ```json
      {
          "Effect": "Allow",
          "Action": [
              "s3:ListBucket"
          ],
          "Resource": [
              "arn:aws:s3:::<your-bucket>"
          ]
      },
      {
          "Effect": "Allow",
          "Action": [
              "s3:PutObject",
              "s3:GetObject",
              "s3:DeleteObject"
          ],
          "Resource": [
              "arn:aws:s3:::<your-bucket>/*"
          ]
      }
      ```

      Replace `<your-bucket>` with the name of your S3 bucket.

      **Note:** For an EKS cluster, you might need to use the
      **Add inline policy** button and create a name for the new policy.
      In that case, insert the JSON above between the square brackets of the `Statement` element.

1. Set up trust relationships for the IAM role:

   1. Click **Trust relationships > Edit trust relationship**.
   1. Ensure that you see a statement with `sts:AssumeRole`. Example:

      ```json
      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Effect": "Allow",
            "Principal": {
              "Service": "ec2.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
          }
        ]
      }
      ```

1. Set the environment variable `IAM_ROLE` to the IAM role name
for the Pachyderm deployment.

1. Deploy Pachyderm:

   ```shell
   pachctl deploy amazon ${BUCKET_NAME} ${AWS_REGION} ${STORAGE_SIZE} --dynamic-etcd-nodes=1 --iam-role ${IAM_ROLE}
   ```

   The deployment takes some time. You can run `kubectl get pods` periodically
   to check the status of the deployment. When Pachyderm is deployed, the command
   shows all pods as `READY`:

   ```shell
   kubectl get pods
   ```

   **System Response:**

   ```shell
   NAME                     READY     STATUS    RESTARTS   AGE
   dash-6c9dc97d9c-89dv9    2/2       Running   0          1m
   etcd-0                   1/1       Running   0          4m
   pachd-65fd68d6d4-8vjq7   1/1       Running   0          4m
   ```

   **Note:** If you see a few restarts on the `pachd` pods, it means that
   Kubernetes tried to bring up those pods before `etcd` was ready and
   therefore restarted them. You can safely ignore these restarts.

1. Verify that the Pachyderm cluster is up and running:

   ```shell
   pachctl version
   ```

   **System Response:**

   ```shell
   COMPONENT           VERSION
   pachctl             1.9.7
   pachd               1.9.7
   ```

   * If you want to access the Pachyderm UI or use the S3 gateway, you need to
   forward Pachyderm ports. Open a new terminal window and run the
   following command:

     ```shell
     pachctl port-forward
     ```

## Deploy Pachyderm with an Access Key

When you installed `kops`, you created a dedicated IAM
user with access credentials such as an access key and
secret key. You can deploy
Pachyderm by using the credentials of this IAM user
directly. However, deploying Pachyderm with an
access key might not satisfy your enterprise security
requirements. Therefore, deploying with an IAM role
is preferred.

To deploy Pachyderm with an access key, complete the following
steps:

1. Run the following command to deploy your Pachyderm cluster:

   ```shell
   pachctl deploy amazon ${BUCKET_NAME} ${AWS_REGION} ${STORAGE_SIZE} --dynamic-etcd-nodes=1 --credentials "${AWS_ACCESS_KEY_ID},${AWS_SECRET_ACCESS_KEY},"
   ```

   The `,` at the end of the `credentials` flag in the deploy
   command is for an optional temporary AWS token. You might use
   such a token if you are just experimenting with
   Pachyderm. However, do not use this token in a
   production deployment.

   The deployment takes some time. You can run `kubectl get pods` periodically
   to check the status of the deployment. When Pachyderm is deployed, the command
   shows all pods as `READY`:

   ```shell
   kubectl get pods
   ```

   **System Response:**

   ```shell
   NAME                     READY     STATUS    RESTARTS   AGE
   dash-6c9dc97d9c-89dv9    2/2       Running   0          1m
   etcd-0                   1/1       Running   0          4m
   pachd-65fd68d6d4-8vjq7   1/1       Running   0          4m
   ```

   **Note:** If you see a few restarts on the `pachd` pods, it means that
   Kubernetes tried to bring up those pods before `etcd` was ready and
   therefore restarted them. You can safely ignore these restarts.

1. Verify that the Pachyderm cluster is up and running:

   ```shell
   pachctl version
   ```

   **System Response:**

   ```shell
   COMPONENT           VERSION
   pachctl             1.9.7
   pachd               1.9.7
   ```

   * If you want to access the Pachyderm UI or use the S3 gateway, you need to
   forward Pachyderm ports. Open a new terminal window and run the
   following command:

     ```shell
     pachctl port-forward
     ```
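
The bucket creation step in the Add Stateful Storage section depends on the region: `us-east-1` takes no `LocationConstraint`, while every other region requires one. The following is a minimal sketch of that choice as a shell helper. The function name `create_bucket_cmd` and the example bucket name are hypothetical and not part of the AWS CLI or Pachyderm. The helper only prints the command so that you can review it before running it.

```shell
#!/bin/sh
# create_bucket_cmd is a hypothetical helper, not an AWS CLI or pachctl command.
# It prints (does not run) the create-bucket command that matches the region.
create_bucket_cmd() {
    bucket="$1"
    region="$2"
    if [ "$region" = "us-east-1" ]; then
        # us-east-1 is created without a LocationConstraint.
        printf 'aws s3api create-bucket --bucket %s --region %s\n' \
            "$bucket" "$region"
    else
        # Any other region needs a matching LocationConstraint.
        printf 'aws s3api create-bucket --bucket %s --region %s --create-bucket-configuration LocationConstraint=%s\n' \
            "$bucket" "$region" "$region"
    fi
}

# Example with placeholder values; review the printed command before running it.
create_bucket_cmd "example-user-pachyderm" "us-west-2"
```

In practice you would likely call the helper with `${BUCKET_NAME}` and `${AWS_REGION}`, the same variables that the deploy commands in this document use.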