# Ingress and Egress Data from an External Object Store

Occasionally, you might need to download data from or upload data
to an object store that runs on a different cloud platform. For example,
you might be running a Pachyderm cluster in Microsoft Azure, but
you need to ingress files from an S3 bucket that resides in
Amazon Web Services (AWS).

You can configure Pachyderm to work with an external object
store by using the following methods:

* Ingress data from an external object store by using the
  `pachctl put file` command with a URL to the S3 bucket. Example:

  ```shell
  $ pachctl put file repo@branch -f <s3://my_bucket/file>
  ```

* Egress data to an external object store by configuring the
  `egress` field in the pipeline specification. Example
  fragment from a `pipeline.json` file:

  ```json
  "egress": {
    "URL": "s3://bucket/dir"
  }
  ```

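As a rough sketch of how the two methods fit together, the following
snippet writes a minimal pipeline specification that includes an `egress`
field. The repo name `data`, the pipeline name `export`, the container
image, and the bucket URLs are placeholders chosen for illustration, not
values from a real deployment. The `pachctl` commands are left as comments
because they need a running cluster with object store credentials configured.

```shell
# Hypothetical names: repo "data", pipeline "export", and the bucket URLs.
spec_file="export-pipeline.json"

# Write a minimal pipeline specification that copies its input to its
# output. The "egress" field asks Pachyderm to push that output to the
# external bucket.
cat > "$spec_file" <<'EOF'
{
  "pipeline": { "name": "export" },
  "input": { "pfs": { "repo": "data", "glob": "/*" } },
  "transform": {
    "cmd": ["sh", "-c", "cp -r /pfs/data/* /pfs/out/"],
    "image": "alpine:3.12"
  },
  "egress": { "URL": "s3://bucket/dir" }
}
EOF

echo "wrote $spec_file"

# With pachctl connected to a cluster, the remaining steps would be:
#   pachctl create repo data
#   pachctl put file data@master -f s3://my_bucket/file
#   pachctl create pipeline -f "$spec_file"
```

Keeping the specification in a file makes it easy to review the `egress`
URL before the pipeline is created.
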
## Configure Credentials

You can configure Pachyderm to ingress data from and egress data to any
number of supported cloud object stores, including Amazon S3,
Microsoft Azure Blob Storage, and Google Cloud Storage. To do so, you need
to provide Pachyderm with the credentials to communicate with
the selected cloud provider.

The credentials are stored in a
[Kubernetes secret](https://kubernetes.io/docs/concepts/configuration/secret/)
and have the same security properties as any other Kubernetes secret.

!!! note
    For each cloud provider, parameters and configuration steps
    might vary.

To provide Pachyderm with the object store credentials, complete
the following steps:

1. Deploy object storage:

   ```shell
   $ pachctl deploy storage <storage-provider> ...
   ```

1. In the command above, specify `amazon`, `google`, or `microsoft` as
   the storage provider.

1. Depending on the storage provider, configure the required
   parameters. Run `pachctl deploy storage <storage-provider> --help` for more
   information.

   For example, if you select `amazon`, you need to specify the following
   parameters:

   ```shell
   $ pachctl deploy storage amazon <region> <access-key-id> <secret-access-key> [<session-token>]
   ```

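The steps above can be sketched for the `amazon` provider as a small
helper that assembles the positional arguments from environment variables,
so the secret values do not have to be typed on the command line. The
function name `amazon_storage_args` is hypothetical, and reading the values
from `AWS_REGION`, `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and
`AWS_SESSION_TOKEN` is an assumption about how your shell is set up, not
something Pachyderm requires.

```shell
# Print the positional arguments for `pachctl deploy storage amazon`,
# one per line, in the order shown in the step above.
amazon_storage_args() {
  # Required arguments: fail with a message if any of them is unset or empty.
  printf '%s\n' \
    "${AWS_REGION:?AWS_REGION is not set}" \
    "${AWS_ACCESS_KEY_ID:?AWS_ACCESS_KEY_ID is not set}" \
    "${AWS_SECRET_ACCESS_KEY:?AWS_SECRET_ACCESS_KEY is not set}"
  # The session token is optional, so emit it only when it is present.
  if [ -n "${AWS_SESSION_TOKEN:-}" ]; then
    printf '%s\n' "$AWS_SESSION_TOKEN"
  fi
}

# Usage (bash 4 or later, with pachctl connected to a cluster):
#   mapfile -t args < <(amazon_storage_args)
#   pachctl deploy storage amazon "${args[@]}"
```

Emitting one argument per line keeps the optional session token out of the
command when it is not set, which matches the `[<session-token>]` notation
in the step above.
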
!!! note "See Also:"
    - [Custom Object Store](../deploy-manage/deploy/custom_object_stores.md)
    - [Create a Custom Pachyderm Deployment](../deploy-manage/deploy/deploy_custom/index.md)
    - [Pipeline Specification](../reference/pipeline_spec.md)