# Deploy Pachyderm on AWS

After you deploy a Kubernetes cluster by using `kops` or `eksctl`,
you can deploy Pachyderm on top of that cluster.

You need to complete the following steps to deploy Pachyderm:

1. Install `pachctl` as described in [Install pachctl](../../../../getting_started/local_installation#install-pachctl).
1. Add stateful storage for Pachyderm as described in [Add Stateful Storage](#add-stateful-storage).
1. Deploy Pachyderm by using an [IAM role](#deploy-pachyderm-with-an-iam-role)
(recommended) or [an access key](#deploy-pachyderm-with-an-access-key).

## Add Stateful Storage

Pachyderm requires the following types of persistent storage:

* An S3 object store bucket for data. The S3 bucket name
  must be globally unique across all AWS accounts, not only
  within your region. Therefore, add a descriptive prefix to the
  S3 bucket name, such as your username.

* An Elastic Block Store (EBS) persistent volume (PV) for Pachyderm
  metadata. Pachyderm recommends that you assign at least 10 GB for this
  persistent EBS volume. If you expect your cluster to be very
  long running and to scale to thousands of jobs per commit, you might
  need to add more storage. However, you can increase the
  size of the persistent volume later.

To add stateful storage, complete the following steps:

1. Set up the following environment variables:

   * `BUCKET_NAME`: A globally unique S3 bucket name.
   * `STORAGE_SIZE`: The size of the persistent volume in GB. For example, `10`.
   * `AWS_REGION`: The AWS region of your Kubernetes cluster. Use the region
   name, such as `us-west-2`, and not an availability zone, such as `us-west-2a`.

1. Create an S3 bucket:

   * If you are creating an S3 bucket in the `us-east-1` region, run the following
   command:

     ```shell
     $ aws s3api create-bucket --bucket ${BUCKET_NAME} --region ${AWS_REGION}
     ```

   * If you are creating an S3 bucket in any region other than `us-east-1`,
   run the following command:

     ```shell
     $ aws s3api create-bucket --bucket ${BUCKET_NAME} --region ${AWS_REGION} --create-bucket-configuration LocationConstraint=${AWS_REGION}
     ```

1. Verify that the S3 bucket was created:

   ```shell
   $ aws s3api list-buckets --query 'Buckets[].Name'
   ```
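The region-dependent choice in the steps above can be captured in a small script. The following is a minimal sketch, not part of the Pachyderm tooling: the helper name `create_bucket_cmd` and the default values are assumptions for illustration. It only prints the `aws s3api create-bucket` command so that you can review it before you run it.

```shell
# Example defaults (assumptions); override them with your own values.
export BUCKET_NAME="${BUCKET_NAME:-${USER:-demo}-pachyderm-store}"
export STORAGE_SIZE="${STORAGE_SIZE:-10}"
export AWS_REGION="${AWS_REGION:-us-west-2}"

# create_bucket_cmd is a hypothetical helper.
# $1 is the bucket name, $2 is the AWS region.
# It prints the matching create-bucket command without running it.
create_bucket_cmd() {
  if [ "$2" = "us-east-1" ]; then
    # us-east-1 takes no LocationConstraint.
    echo "aws s3api create-bucket --bucket $1 --region $2"
  else
    echo "aws s3api create-bucket --bucket $1 --region $2 --create-bucket-configuration LocationConstraint=$2"
  fi
}

create_bucket_cmd "${BUCKET_NAME}" "${AWS_REGION}"
```

After you check the printed command, you can paste it into your terminal to create the bucket.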

## Deploy Pachyderm with an IAM Role

IAM roles provide better user management and security
capabilities compared to access keys. If a malicious user gains access to
an access key, your data might become compromised. Therefore, enterprises
often opt to use IAM roles rather than access keys for production
deployments.

You need to configure the following IAM settings:

* The worker nodes on which Pachyderm is deployed must be associated
with the IAM role that is assigned to the Kubernetes cluster.
If you created your cluster by using `kops` or `eksctl`,
the nodes should already have a dedicated IAM role assigned.

* The IAM role must have access to the S3 bucket that you created for
Pachyderm.

* The IAM role must have correct trust relationships.

  You need to set the environment variable `IAM_ROLE` to the name
  of the IAM role that you will use to deploy the cluster.
  This value is the role name itself, not the Role ARN or the
  Instance Profile ARN of the role.
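If you only have the Role ARN at hand, the role name is its last path segment. The following is a minimal sketch that uses shell parameter expansion; the ARN, account ID, and role name are made-up placeholders. It does not apply to an Instance Profile ARN, because the instance profile name can differ from the role name.

```shell
# Hypothetical Role ARN; the account ID and role name are placeholders.
ROLE_ARN="arn:aws:iam::123456789012:role/nodes.example-cluster.k8s.local"

# Remove everything up to and including the last "/" to keep only the role name.
IAM_ROLE="${ROLE_ARN##*/}"
export IAM_ROLE

echo "${IAM_ROLE}"
```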

To deploy Pachyderm with an IAM role, complete the following steps:

1. Find the IAM role assigned to the cluster:

   1. Go to the AWS Management Console.
   1. Select an EC2 instance in the Kubernetes cluster.
   1. Click **Description**.
   1. Find the **IAM Role** field.

1. Enable access to the S3 bucket for the IAM role:

   1. In the **IAM Role** field, click the IAM role.
   1. In the **Permissions** tab, click **Edit policy**.
   1. Select the **JSON** tab.
   1. Append the following statements to the existing JSON, inside the `Statement` list:

      ```json
      {
          "Effect": "Allow",
          "Action": [
              "s3:ListBucket"
          ],
          "Resource": [
              "arn:aws:s3:::<your-bucket>"
          ]
      },
      {
          "Effect": "Allow",
          "Action": [
              "s3:PutObject",
              "s3:GetObject",
              "s3:DeleteObject"
          ],
          "Resource": [
              "arn:aws:s3:::<your-bucket>/*"
          ]
      }
      ```

      Replace `<your-bucket>` with the name of your S3 bucket.

      **Note:** For an EKS cluster, you might need to use the
      **Add inline policy** button and create a name for the new policy.
      Insert the JSON above between the square brackets of the `Statement` element.

1. Set up trust relationships for the IAM role:

   1. Click **Trust relationships > Edit trust relationship**.
   1. Ensure that you see a statement with `sts:AssumeRole`. Example:

      ```json
      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Effect": "Allow",
            "Principal": {
              "Service": "ec2.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
          }
        ]
      }
      ```

1. Set the environment variable `IAM_ROLE` to the IAM role name
   for the Pachyderm deployment.

1. Deploy Pachyderm:

   ```shell
   $ pachctl deploy amazon ${BUCKET_NAME} ${AWS_REGION} ${STORAGE_SIZE} --dynamic-etcd-nodes=1 --iam-role ${IAM_ROLE}
   ```

   The deployment takes some time. You can run `kubectl get pods` periodically
   to check the status of the deployment. When Pachyderm is deployed, the command
   shows all pods as `READY`:

   ```shell
   $ kubectl get pods
   NAME                     READY     STATUS    RESTARTS   AGE
   dash-6c9dc97d9c-89dv9    2/2       Running   0          1m
   etcd-0                   1/1       Running   0          4m
   pachd-65fd68d6d4-8vjq7   1/1       Running   0          4m
   ```

   **Note:** If you see a few restarts on the `pachd` pods, it means that
   Kubernetes tried to bring up those pods before `etcd` was ready. Therefore,
   Kubernetes restarted those pods. You can safely ignore these restarts.

1. Verify that the Pachyderm cluster is up and running:

   ```shell
   $ pachctl version

   COMPONENT           VERSION
   pachctl             1.9.7
   pachd               1.9.7
   ```

   * If you want to access the Pachyderm UI or use the S3 gateway, you need to
   forward Pachyderm ports. Open a new terminal window and run the
   following command:

     ```shell
     $ pachctl port-forward
     ```
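The S3 access step above asks you to replace `<your-bucket>` by hand. As a minimal sketch, you could instead generate a complete policy document with the bucket name already filled in. The helper name `pachyderm_s3_policy` and the fallback bucket name are assumptions for illustration; the statements are the same two shown in that step, wrapped in a standalone `Version` and `Statement` document.

```shell
# Falls back to a placeholder bucket name if BUCKET_NAME is unset.
BUCKET_NAME="${BUCKET_NAME:-example-pachyderm-store}"

# pachyderm_s3_policy is a hypothetical helper.
# $1 is the bucket name; the policy document is printed to stdout.
pachyderm_s3_policy() {
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": ["arn:aws:s3:::$1"]
    },
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
      "Resource": ["arn:aws:s3:::$1/*"]
    }
  ]
}
EOF
}

pachyderm_s3_policy "${BUCKET_NAME}"
```

You can redirect the output to a file and paste it into the **JSON** tab, for example when you use **Add inline policy**. If the role already has a policy, copy only the two statements into its existing `Statement` list.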

## Deploy Pachyderm with an Access Key

When you installed `kops`, you created a dedicated IAM
user with access credentials such as an access key and
secret key. You can deploy
Pachyderm by using the credentials of this IAM user
directly. However, deploying Pachyderm with an
access key might not satisfy your enterprise security
requirements. Therefore, deploying with an IAM role
is preferred.

To deploy Pachyderm with an access key, complete the following
steps:

1. Run the following command to deploy your Pachyderm cluster:

   ```shell
   $ pachctl deploy amazon ${BUCKET_NAME} ${AWS_REGION} ${STORAGE_SIZE} --dynamic-etcd-nodes=1 --credentials "${AWS_ACCESS_KEY_ID},${AWS_SECRET_ACCESS_KEY},"
   ```

   The trailing `,` in the `--credentials` value is the separator for
   an optional temporary AWS token, which is left empty here. You might use
   such a token if you are just experimenting with
   Pachyderm. However, do not use this token in a
   production deployment.

   The deployment takes some time. You can run `kubectl get pods` periodically
   to check the status of the deployment. When Pachyderm is deployed, the command
   shows all pods as `READY`:

   ```shell
   $ kubectl get pods
   NAME                     READY     STATUS    RESTARTS   AGE
   dash-6c9dc97d9c-89dv9    2/2       Running   0          1m
   etcd-0                   1/1       Running   0          4m
   pachd-65fd68d6d4-8vjq7   1/1       Running   0          4m
   ```

   **Note:** If you see a few restarts on the `pachd` pods, it means that
   Kubernetes tried to bring up those pods before `etcd` was ready.
   Therefore, Kubernetes restarted those pods. You can safely ignore these
   restarts.

1. Verify that the Pachyderm cluster is up and running:

   ```shell
   $ pachctl version

   COMPONENT           VERSION
   pachctl             1.9.7
   pachd               1.9.7
   ```

   * If you want to access the Pachyderm UI or use the S3 gateway, you need to
   forward Pachyderm ports. Open a new terminal window and run the
   following command:

     ```shell
     $ pachctl port-forward
     ```
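The `--credentials` value in the deploy step above has three comma-separated fields: access key ID, secret key, and an optional temporary token. The following is a minimal sketch of how you might assemble that value; the helper name `build_credentials` is an assumption, and using `AWS_SESSION_TOKEN` as the source of the token is also an assumption based on the standard AWS environment variable name.

```shell
# build_credentials is a hypothetical helper.
# $1 is the access key ID, $2 is the secret key, $3 is an optional token.
# Without a token, the result ends with a trailing comma.
build_credentials() {
  printf '%s,%s,%s\n' "$1" "$2" "${3:-}"
}

# Assemble the value from the environment without echoing secrets to the terminal.
CREDENTIALS="$(build_credentials "${AWS_ACCESS_KEY_ID:-}" "${AWS_SECRET_ACCESS_KEY:-}" "${AWS_SESSION_TOKEN:-}")"

# The deploy command from the step above would then take this form:
# pachctl deploy amazon ${BUCKET_NAME} ${AWS_REGION} ${STORAGE_SIZE} --dynamic-etcd-nodes=1 --credentials "${CREDENTIALS}"
```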