# Custom Object Stores

In other sections of this guide we have demonstrated how to deploy Pachyderm in a single cloud using that cloud's object store offering. However, Pachyderm can be backed by any object store, and you are not restricted to the object store service provided by the cloud in which you are deploying.

As long as you are running an object store that has an S3-compatible API, you can easily deploy Pachyderm in a way that will allow you to back Pachyderm by that object store. For example, we have seen Pachyderm backed by [Minio](https://minio.io/), [GlusterFS](https://www.gluster.org/), [Ceph](http://ceph.com/), and more.

To deploy Pachyderm with your choice of object store in Google, Azure, or AWS, see the guides below. To deploy Pachyderm on premises with a custom object store, see the [on premise docs](http://pachyderm.readthedocs.io/en/stable/deployment/on_premises.html).

## Common Prerequisites

1. A working Kubernetes cluster and `kubectl`.
2. An account on or running instance of an object store with an S3-compatible API. You should be able to get an ID, secret, bucket name, and endpoint that point to this object store.

## Google + Custom Object Store

Additional prerequisites:

- [Google Cloud SDK](https://cloud.google.com/sdk/) >= 124.0.0 - If this is the first time you use the SDK, make sure to follow the [quick start guide](https://cloud.google.com/sdk/docs/quickstarts).

First, we need to create a persistent disk for Pachyderm's metadata:

```shell
# Name this whatever you want; we chose pach-disk as a default.
STORAGE_NAME=pach-disk

# For a demo you should only need 10 GB. This stores PFS metadata. For reference,
# 1 GB should work for 1000 commits on 1000 files.
STORAGE_SIZE=[the size of the volume that you are going to create, in GBs, e.g. "10"]

# Create the disk.
gcloud compute disks create --size=${STORAGE_SIZE}GB ${STORAGE_NAME}
```

Then we can deploy Pachyderm:

```shell
pachctl deploy custom --persistent-disk google --object-store s3 ${STORAGE_NAME} ${STORAGE_SIZE} <object store bucket> <object store id> <object store secret> <object store endpoint> --static-etcd-volume=${STORAGE_NAME}
```

## AWS + Custom Object Store

Additional prerequisites:

- [AWS CLI](https://aws.amazon.com/cli/) - have it installed and have your [AWS credentials](http://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-started.html) configured.

First, we need to create a persistent disk for Pachyderm's metadata:

```shell
# We recommend between 1 and 10 GB. This stores PFS metadata. For reference,
# 1 GB should work for 1000 commits on 1000 files.
STORAGE_SIZE=[the size of the EBS volume that you are going to create, in GBs, e.g. "10"]

AWS_REGION=[the AWS region of your Kubernetes cluster, e.g. "us-west-2" (not us-west-2a)]

AWS_AVAILABILITY_ZONE=[the AWS availability zone of your Kubernetes cluster, e.g. "us-west-2a"]

# Create the volume.
aws ec2 create-volume --size ${STORAGE_SIZE} --region ${AWS_REGION} --availability-zone ${AWS_AVAILABILITY_ZONE} --volume-type gp2

# Look up the volume ID in the output, then store it.
aws ec2 describe-volumes
STORAGE_NAME=[volume id]
```

Then we can deploy Pachyderm:

```shell
pachctl deploy custom --persistent-disk aws --object-store s3 ${STORAGE_NAME} ${STORAGE_SIZE} <object store bucket> <object store id> <object store secret> <object store endpoint> --static-etcd-volume=${STORAGE_NAME}
```

## Azure + Custom Object Store

Additional prerequisites:

- Install [Azure CLI](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli) >= 2.0.1
- Install [jq](https://stedolan.github.io/jq/download/)
- Clone github.com/pachyderm/pachyderm and work from the root of that project.
First, we need to create a persistent disk for Pachyderm's metadata. To do this, start by declaring some environment variables:

```shell
# Needs to be globally unique across the entire Azure location.
RESOURCE_GROUP=[the name of the resource group where the Azure resources will be organized]

LOCATION=[the Azure region of your Kubernetes cluster, e.g. "West US2"]

# Needs to be globally unique across the entire Azure location.
STORAGE_ACCOUNT=[the name of the storage account where your data will be stored]

CONTAINER_NAME=[the name of the Azure blob storage container that will hold the disk image]

# Needs to end in a ".vhd" extension.
STORAGE_NAME=pach-disk.vhd

# We recommend between 1 and 10 GB. This stores PFS metadata. For reference,
# 1 GB should work for 1000 commits on 1000 files.
STORAGE_SIZE=[the size of the data disk volume that you are going to create, in GBs, e.g. "10"]
```

And then run:

```shell
# Create a resource group.
az group create --name=${RESOURCE_GROUP} --location=${LOCATION}

# Create an Azure storage account.
az storage account create \
  --resource-group="${RESOURCE_GROUP}" \
  --location="${LOCATION}" \
  --sku=Standard_LRS \
  --name="${STORAGE_ACCOUNT}" \
  --kind=Storage

# Build the Microsoft tool for creating Azure VMs from an image.
STORAGE_KEY="$(az storage account keys list \
  --account-name="${STORAGE_ACCOUNT}" \
  --resource-group="${RESOURCE_GROUP}" \
  --output=json \
  | jq .[0].value -r
)"
make docker-build-microsoft-vhd
VOLUME_URI="$(docker run -it microsoft_vhd \
  "${STORAGE_ACCOUNT}" \
  "${STORAGE_KEY}" \
  "${CONTAINER_NAME}" \
  "${STORAGE_NAME}" \
  "${STORAGE_SIZE}G"
)"
```

To check that everything has been set up correctly, try:

```shell
az storage account list | jq '.[].name'
```

Then we can deploy Pachyderm:

```shell
pachctl deploy custom --persistent-disk azure --object-store s3 ${VOLUME_URI} ${STORAGE_SIZE} <object store bucket> <object store id> <object store secret> <object store endpoint> --static-etcd-volume=${VOLUME_URI}
```
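
Whichever cloud you deployed into, you can confirm that the deployment succeeded once the `pachctl deploy` command above returns. This is a sketch of a post-deploy check, not part of the deploy itself; `pachctl version` only reports a server version once `pachd` is reachable:

```shell
# Watch the Pachyderm pods until pachd and etcd report Running.
kubectl get pods

# Confirm that pachctl can reach the cluster; this prints both the
# client (pachctl) and server (pachd) versions.
pachctl version
```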