# On Premises

This document is broken down into the following sections, available at the links below:

- [Introduction to on-premises deployments](#introduction) takes you through what you need to know about Kubernetes, persistent volumes, object stores and best practices. That's this page.
- [Customizing your Pachyderm deployment for on-premises use](deploy_custom/index.md) details the various options of the `pachctl deploy custom ...` command for an on-premises deployment.
- [Single-node Pachyderm deployment](./single-node.md) is the document you should read when deploying Pachyderm for personal, low-volume usage.
- [Registries](./docker_registries.md) takes you through on-premises, private Docker registry configuration.
- [Ingress](./configuring_k8s_ingress.md) details the Kubernetes ingress configuration you'd need for using `pachctl` and the dashboard outside of the Kubernetes cluster.
- [Non-cloud object stores](./non-cloud-object-stores.md) discusses common configurations for on-premises object stores.

Need information on a particular flavor of Kubernetes or object store? Check out the [see also](#see-also) section.

Troubleshooting a deployment? Check out [Troubleshooting Deployments](../../troubleshooting/deploy_troubleshooting.md).

## Introduction

Deploying Pachyderm successfully on-premises requires a few prerequisites and some planning.
Pachyderm is built on [Kubernetes](https://kubernetes.io/).
Before you can deploy Pachyderm, you or your Kubernetes administrator will need to perform the following actions:

1. [Deploy Kubernetes](#deploying-kubernetes) on-premises.
1. [Deploy a Kubernetes persistent volume](#deploying-a-persistent-volume) that Pachyderm will use to store administrative data.
1.
    [Deploy an on-premises object store](#deploying-an-object-store) using a storage provider like [MinIO](https://min.io), [EMC's ECS](https://www.dellemc.com/storage/ecs/index.htm), or [SwiftStack](https://www.swiftstack.com/) to provide S3-compatible access to your on-premises storage.
1. [Create a Pachyderm manifest](deploy_custom/deploy_custom_pachyderm_deployment_manifest.md) by running the `pachctl deploy custom` command with appropriate arguments and the `--dry-run` flag to create a Kubernetes manifest for the Pachyderm deployment.
1. [Edit the Pachyderm manifest](deploy_custom/deploy_custom_pachyderm_deployment_manifest.md) for your particular Kubernetes deployment.

In this series of documents, we'll take you through the steps unique to Pachyderm.
We assume you have some Kubernetes knowledge.
We will point you to external resources for the general Kubernetes steps to give you background.

## Best practices
### Infrastructure as code

We highly encourage you to apply the best practices used in developing software to managing the deployment process.

1. Create scripts that automate as much of your processes as possible and keep them under version control.
1. Keep copies of all artifacts, such as manifests, produced by those scripts and keep those under version control.
1. Document your practices in the code and outside it.

### Infrastructure in general

Be sure that you design your Kubernetes infrastructure in accordance with recommended guidelines.
Don't mix on-premises Kubernetes and cloud-based storage.
It's important that bandwidth to your storage deployment meet the guidelines of your storage provider.

## Prerequisites

### Software you will need

1. [kubectl](https://kubernetes.io/docs/user-guide/prereqs/)
2.
    [pachctl](../../../../../getting_started/local_installation/#install-pachctl)

## Setting up to deploy on-premises

### Deploying Kubernetes

The Kubernetes docs have instructions for [deploying Kubernetes in a variety of on-premises scenarios](https://kubernetes.io/docs/getting-started-guides/#on-premises-vms).
We recommend following one of these guides to get Kubernetes running on premises.

### Deploying a persistent volume

#### Persistent volumes: how do they work?

A Kubernetes [persistent volume](https://kubernetes.io/docs/concepts/storage/persistent-volumes/) is used by Pachyderm's `etcd` for storage of system metadata.
In Kubernetes, [persistent volumes](https://kubernetes.io/docs/concepts/storage/persistent-volumes/) are a mechanism for providing storage for consumption by the users of the cluster.
They are provisioned by the cluster administrators.
In a typical enterprise Kubernetes deployment, the administrators have configured persistent volumes that your Pachyderm deployment will consume by means of a [persistent volume claim](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#persistentvolumeclaims) in the Pachyderm manifest you generate.

You can deploy PVs to Pachyderm using our command-line arguments in three ways: using a static PV, with StatefulSets, or with StatefulSets using a StorageClass.

If your administrators are using [selectors](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#selector), or you want to use StorageClasses in a different way, you'll need to [edit the Pachyderm manifest](../deploy_custom/deploy_custom_pachyderm_deployment_manifest) appropriately before applying it.

##### Static PV

In this case, `etcd` will be deployed in Pachyderm as a [ReplicationController](https://kubernetes.io/docs/concepts/workloads/controllers/replicationcontroller/) with one (1) pod that uses a static PV. This is a common deployment for testing.

##### StatefulSets

[StatefulSets](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/) are a mechanism provided in Kubernetes 1.9 and newer to manage the deployment and scaling of applications. They use either [Persistent Volume Provisioning](https://github.com/kubernetes/examples/blob/master/staging/persistent-volume-provisioning/README.md) or pre-provisioned PVs.

If you're using StatefulSets in your Kubernetes cluster, you will need to find out the particulars of your cluster's PV configuration and [use appropriate flags to `pachctl deploy custom`](#configuring-with-statefulsets).

##### StorageClasses
If your administrators require specification of [classes](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#class-1) to consume persistent volumes,
you will need to find out the particulars of your cluster's PV configuration and [use appropriate flags to `pachctl deploy custom`](#configuring-with-statefulsets-using-storageclasses).

#### Common tasks to all types of PV deployments
##### Sizing the PV

You'll need to use a PV with enough space for the metadata associated with the data you plan to store in Pachyderm.
We're currently developing good rules of thumb for scaling this storage as your Pachyderm deployment grows,
but it looks like 10 GB of disk space is sufficient for most purposes.

##### Creating the PV

In the case of cloud-based deployments, the `pachctl deploy` command for AWS, GCP and Azure creates persistent volumes for you when you follow the instructions for those infrastructures.

In the case of on-premises deployments, the kind of PV you provision will depend on what kind of storage your Kubernetes administrators have attached to your cluster and configured, and on whether you are expected to consume that storage as a static PV, with Persistent Volume Provisioning, or as a StorageClass.
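
As one illustration of a statically provisioned PV, the following Python sketch builds a minimal NFS-backed `PersistentVolume` manifest and prints it as JSON, which `kubectl apply -f` accepts alongside YAML. It is not a Pachyderm-specific recipe. The volume name `etcd-volume`, the NFS server `nfs.example.internal`, the export path, the `ReadWriteOnce` access mode, the `Retain` reclaim policy, and the use of `Gi` units are all assumptions made for the example, so substitute whatever your Kubernetes administrators have configured.

```python
import json


def nfs_persistent_volume(name: str, size_gb: int, server: str, path: str) -> dict:
    """Build a minimal static PersistentVolume manifest backed by NFS.

    All argument values are placeholders chosen by the caller; nothing
    here is specific to Pachyderm.
    """
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": name},
        "spec": {
            # Assumption: the size is expressed in Gi; adjust to your conventions.
            "capacity": {"storage": f"{size_gb}Gi"},
            # Assumption: a single etcd pod mounts the volume read-write.
            "accessModes": ["ReadWriteOnce"],
            # Assumption: keep the data if the claim is deleted.
            "persistentVolumeReclaimPolicy": "Retain",
            "nfs": {"server": server, "path": path},
        },
    }


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    manifest = nfs_persistent_volume(
        name="etcd-volume",
        size_gb=10,
        server="nfs.example.internal",
        path="/exports/pachyderm-etcd",
    )
    print(json.dumps(manifest, indent=2))
```

Generating the manifest from a script, and committing both the script and its output, follows the infrastructure-as-code practice recommended earlier in this document.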

For example, many on-premises deployments use Network File System (NFS) to access some kind of enterprise storage.
Persistent volumes are provisioned in Kubernetes like all things in Kubernetes: by means of a manifest.
You can learn about creating [volumes](https://kubernetes.io/docs/concepts/storage/volumes/) and [persistent volumes](https://kubernetes.io/docs/concepts/storage/persistent-volumes/) in the Kubernetes documentation.

You or your Kubernetes administrators will be responsible for configuring the PVs you create to be consumable as static PVs, with Persistent Volume Provisioning, or as a StorageClass.

#### What you'll need for Pachyderm configuration of PV storage

Keep the information below at hand for when you [run `pachctl deploy custom` further on](deploy_custom/index.md).

##### Configuring with static volumes

You'll need the name of the PV and the amount of space you can use, in gigabytes.
We'll refer to those, respectively, as `PVC_STORAGE_NAME` and `PVC_STORAGE_SIZE` further on.
With this kind of PV,
you'll use the flag `--static-etcd-volume` with `PVC_STORAGE_NAME` as its argument in your deployment.

Note: this will override any attempt to configure with StorageClasses, below.

##### Configuring with StatefulSets

If you're deploying using [StatefulSets](#statefulsets),
you'll just need the amount of space you can use, in gigabytes,
which we'll refer to as `PVC_STORAGE_SIZE` further on.

Note: The `--etcd-storage-class` flag and argument will be ignored if you use the flag `--static-etcd-volume` along with it.

##### Configuring with StatefulSets using StorageClasses

If you're deploying using [StatefulSets](#statefulsets) with [StorageClasses](#storageclasses),
you'll need the name of the storage class and the amount of space you can use, in gigabytes.
We'll refer to those, respectively, as `PVC_STORAGECLASS` and `PVC_STORAGE_SIZE` further on.
With this kind of PV,
you'll use the flag `--etcd-storage-class` with `PVC_STORAGECLASS` as its argument in your deployment.

Note: The `--etcd-storage-class` flag and argument will be ignored if you use the flag `--static-etcd-volume` along with it.


### Deploying an object store

#### Object store: what's it for?
An object store is used by Pachyderm's `pachd` for storing all your data.
The object store you use must be accessible via a low-latency, high-bandwidth connection like [Gigabit](https://en.wikipedia.org/wiki/Gigabit_Ethernet) or [10G Ethernet](https://en.wikipedia.org/wiki/10_Gigabit_Ethernet).

For an on-premises deployment,
it's not advisable to use a cloud-based storage mechanism.
Don't deploy an on-premises Pachyderm cluster against cloud-based object stores such as S3 from [AWS](amazon_web_services/index.md), GCS from [Google Cloud Platform](google_cloud_platform.md), or Azure Blob Storage from [Azure](azure.md). Note that when the command-line parameter for the object store (`--object-store`) specifies `s3`, it refers to the S3 protocol (which is used by solutions such as MinIO), not to the Amazon product of the same name.

#### Object store prerequisites

Object stores are accessible using the S3 protocol, created by Amazon.
Storage providers like [MinIO](https://min.io), [EMC's ECS](https://www.dellemc.com/storage/ecs/index.htm), or [SwiftStack](https://www.swiftstack.com/) provide S3-compatible access to enterprise storage for on-premises deployment.
You can find links to instructions for providers of particular object stores in the [See also](#see-also) section.

#### Sizing the object store

Size your object store generously.
Once you start using Pachyderm, you'll start versioning all your data.
We're currently developing good rules of thumb for scaling your object store as your Pachyderm deployment grows,
but it's a good idea to start with a large multiple of your current data set size.

#### What you'll need for Pachyderm configuration of the object store
You'll need four items to configure the object store.
We're prefixing each item with how we'll refer to it further on.

1. `OS_ENDPOINT`: The access endpoint.
    For example, MinIO's endpoints are usually something like `minio-server:9000`.
    Don't begin it with the protocol; it's an endpoint, not a URL. Also, check whether your object store (for example, MinIO) is using SSL/TLS.
    If not, disable it using `--disable-ssl`.
1. `OS_BUCKET_NAME`: The bucket name you're dedicating to Pachyderm. Pachyderm will need exclusive access to this bucket.
1. `OS_ACCESS_KEY_ID`: The access key ID for the object store. This is like a user name for logging into the object store.
1. `OS_SECRET_KEY`: The secret key for the object store. This is like the above user's password.

Keep this information handy.

### Next step: creating a custom deploy manifest for Pachyderm
Once you have Kubernetes deployed, your persistent volume created, and your object store configured, it's time to [create the Pachyderm manifest for deploying to Kubernetes](./deploy_custom/index.md).

## See Also
### Kubernetes variants
- [OpenShift](./openshift.md)
### Object storage variants
- [EMC ECS](./non-cloud-object-stores.md#emc-ecs)
- [MinIO](./non-cloud-object-stores.md#minio)
- [SwiftStack](./non-cloud-object-stores.md#swiftstack)