# Custom Object Stores

In other sections of this guide we have demonstrated how to deploy Pachyderm in a single cloud using that cloud's object store offering. However, Pachyderm can be backed by any object store, and you are not restricted to the object store service provided by the cloud in which you are deploying.

As long as you are running an object store that has an S3-compatible API, you can easily deploy Pachyderm in a way that will allow you to back Pachyderm by that object store. For example, we have seen Pachyderm backed by [Minio](https://minio.io/), [GlusterFS](https://www.gluster.org/), [Ceph](http://ceph.com/), and more.

To deploy Pachyderm with your choice of object store in Google, Azure, or AWS, see the guides below. To deploy Pachyderm on premises with a custom object store, see the [on premise docs](http://pachyderm.readthedocs.io/en/stable/deployment/on_premises.html).

## Common Prerequisites

1. A working Kubernetes cluster and `kubectl`.
2. An account on or running instance of an object store with an S3-compatible API. You should be able to get an ID, secret, bucket name, and endpoint that point to this object store.

## Google + Custom Object Store

Additional prerequisites:

- [Google Cloud SDK](https://cloud.google.com/sdk/) >= 124.0.0 - If this is the first time you use the SDK, make sure to follow the [quick start guide](https://cloud.google.com/sdk/docs/quickstarts).

First, we need to create a persistent disk for Pachyderm's metadata:

```shell
# Name this whatever you want; we chose pach-disk as a default.
STORAGE_NAME=pach-disk

# For a demo you should only need 10 GB. This stores PFS metadata. For reference,
# 1 GB should work for 1000 commits on 1000 files.
STORAGE_SIZE=[the size of the volume that you are going to create, in GBs, e.g. "10"]

# Create the disk.
gcloud compute disks create --size=${STORAGE_SIZE}GB ${STORAGE_NAME}
```

Then we can deploy Pachyderm:

```shell
pachctl deploy custom --persistent-disk google --object-store s3 ${STORAGE_NAME} ${STORAGE_SIZE} <object store bucket> <object store id> <object store secret> <object store endpoint> --static-etcd-volume=${STORAGE_NAME}
```

## AWS + Custom Object Store

Additional prerequisites:

- [AWS CLI](https://aws.amazon.com/cli/) - have it installed and have your [AWS credentials](http://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-started.html) configured.

First, we need to create a persistent disk for Pachyderm's metadata:

```shell
# We recommend between 1 and 10 GB. This stores PFS metadata. For reference,
# 1 GB should work for 1000 commits on 1000 files.
STORAGE_SIZE=[the size of the EBS volume that you are going to create, in GBs, e.g. "10"]

AWS_REGION=[the AWS region of your Kubernetes cluster, e.g. "us-west-2" (not us-west-2a)]

AWS_AVAILABILITY_ZONE=[the AWS availability zone of your Kubernetes cluster, e.g. "us-west-2a"]

# Create the volume.
aws ec2 create-volume --size ${STORAGE_SIZE} --region ${AWS_REGION} --availability-zone ${AWS_AVAILABILITY_ZONE} --volume-type gp2

# Look up the volume ID in the output, then store it.
aws ec2 describe-volumes
STORAGE_NAME=[volume id]
```

Then we can deploy Pachyderm:

```shell
pachctl deploy custom --persistent-disk aws --object-store s3 ${STORAGE_NAME} ${STORAGE_SIZE} <object store bucket> <object store id> <object store secret> <object store endpoint> --static-etcd-volume=${STORAGE_NAME}
```

## Azure + Custom Object Store

Additional prerequisites:

- Install [Azure CLI](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli) >= 2.0.1
- Install [jq](https://stedolan.github.io/jq/download/)
- Clone github.com/pachyderm/pachyderm and work from the root of that project.
First, we need to create a persistent disk for Pachyderm's metadata. To do this, start by declaring some environment variables:

```shell
# Needs to be globally unique across the entire Azure location.
RESOURCE_GROUP=[the name of the resource group where the Azure resources will be organized]

LOCATION=[the Azure region of your Kubernetes cluster, e.g. "West US2"]

# Needs to be globally unique across the entire Azure location.
STORAGE_ACCOUNT=[the name of the storage account where your data will be stored]

CONTAINER_NAME=[the name of the Azure blob storage container that will hold the disk image]

# Needs to end in a ".vhd" extension.
STORAGE_NAME=pach-disk.vhd

# We recommend between 1 and 10 GB. This stores PFS metadata. For reference,
# 1 GB should work for 1000 commits on 1000 files.
STORAGE_SIZE=[the size of the data disk volume that you are going to create, in GBs, e.g. "10"]
```

And then run:

```shell
# Create a resource group.
az group create --name=${RESOURCE_GROUP} --location=${LOCATION}

# Create an Azure storage account.
az storage account create \
  --resource-group="${RESOURCE_GROUP}" \
  --location="${LOCATION}" \
  --sku=Standard_LRS \
  --name="${STORAGE_ACCOUNT}" \
  --kind=Storage

# Build the Microsoft tool for creating Azure VMs from an image.
STORAGE_KEY="$(az storage account keys list \
  --account-name="${STORAGE_ACCOUNT}" \
  --resource-group="${RESOURCE_GROUP}" \
  --output=json \
  | jq .[0].value -r
)"
make docker-build-microsoft-vhd
VOLUME_URI="$(docker run -it microsoft_vhd \
  "${STORAGE_ACCOUNT}" \
  "${STORAGE_KEY}" \
  "${CONTAINER_NAME}" \
  "${STORAGE_NAME}" \
  "${STORAGE_SIZE}G"
)"
```

To check that everything has been set up correctly, try:

```shell
az storage account list | jq '.[].name'
```

Then we can deploy Pachyderm:

```shell
pachctl deploy custom --persistent-disk azure --object-store s3 ${VOLUME_URI} ${STORAGE_SIZE} <object store bucket> <object store id> <object store secret> <object store endpoint> --static-etcd-volume=${VOLUME_URI}
```
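
Whichever cloud you deployed into, you can confirm that the deployment succeeded once the `pachctl deploy` command above returns. This is a sketch of a post-deploy check, not part of the deploy itself; `pachctl version` only reports a server version once `pachd` is reachable:

```shell
# Watch the Pachyderm pods until pachd and etcd report Running.
kubectl get pods

# Confirm that pachctl can reach the cluster; this prints both the
# client (pachctl) and server (pachd) versions.
pachctl version
```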