github.com/pachyderm/pachyderm@v1.13.4/doc/docs/1.10.x/deploy-manage/deploy/on_premises.md

# On Premises

This document is broken down into the following sections, available at the links below:

- [Introduction to on-premises deployments](#introduction) takes you through what you need to know about Kubernetes, persistent volumes, object stores and best practices. That's this page.
- [Customizing your Pachyderm deployment for on-premises use](deploy_custom/index.md) details the various options of the `pachctl deploy custom ...` command for an on-premises deployment.
- [Single-node Pachyderm deployment](./single-node.md) is the document you should read when deploying Pachyderm for personal, low-volume usage.
- [Registries](./docker_registries.md) takes you through on-premises, private Docker registry configuration.
- [Ingress](./configuring_k8s_ingress.md) details the Kubernetes ingress configuration you need to use `pachctl` and the dashboard from outside the Kubernetes cluster.
- [Non-cloud object stores](./non-cloud-object-stores.md) discusses common configurations for on-premises object stores.

Need information on a particular flavor of Kubernetes or object store? Check out the [See also](#see-also) section.

Troubleshooting a deployment? Check out [Troubleshooting Deployments](../../troubleshooting/deploy_troubleshooting.md).

## Introduction

Deploying Pachyderm successfully on-premises requires a few prerequisites and some planning.
Pachyderm is built on [Kubernetes](https://kubernetes.io/).
Before you can deploy Pachyderm, you or your Kubernetes administrator will need to perform the following actions:

1. [Deploy Kubernetes](#deploying-kubernetes) on-premises.
1. [Deploy a Kubernetes persistent volume](#deploying-a-persistent-volume) that Pachyderm will use to store administrative data.
1. [Deploy an on-premises object store](#deploying-an-object-store) using a storage provider like [MinIO](https://min.io), [EMC's ECS](https://www.dellemc.com/storage/ecs/index.htm), or [SwiftStack](https://www.swiftstack.com/) to provide S3-compatible access to your on-premises storage.
1. [Create a Pachyderm manifest](deploy_custom/deploy_custom_pachyderm_deployment_manifest.md) by running the `pachctl deploy custom` command with appropriate arguments and the `--dry-run` flag to create a Kubernetes manifest for the Pachyderm deployment.
1. [Edit the Pachyderm manifest](deploy_custom/deploy_custom_pachyderm_deployment_manifest.md) for your particular Kubernetes deployment.

In this series of documents, we'll take you through the steps unique to Pachyderm.
We assume you have some Kubernetes knowledge.
For the general Kubernetes steps, we will point you to external resources that give you the background.

## Best practices
### Infrastructure as code

We highly encourage you to apply the best practices used in developing software to managing the deployment process.

1. Create scripts that automate as much of your processes as possible and keep them under version control.
1. Keep copies of all artifacts, such as manifests, produced by those scripts and keep those under version control.
1. Document your practices in the code and outside it.

### Infrastructure in general

Be sure that you design your Kubernetes infrastructure in accordance with recommended guidelines.
Don't mix on-premises Kubernetes and cloud-based storage.
It's important that bandwidth to your storage deployment meet the guidelines of your storage provider.

## Prerequisites

### Software you will need

1. [kubectl](https://kubernetes.io/docs/user-guide/prereqs/)
2. [pachctl](../../../../../getting_started/local_installation/#install-pachctl)

## Setting up to deploy on-premises

### Deploying Kubernetes

The Kubernetes docs have instructions for [deploying Kubernetes in a variety of on-premises scenarios](https://kubernetes.io/docs/getting-started-guides/#on-premises-vms).
We recommend following one of these guides to get Kubernetes running on premises.

### Deploying a persistent volume

#### Persistent volumes: how do they work?

A Kubernetes [persistent volume](https://kubernetes.io/docs/concepts/storage/persistent-volumes/) is used by Pachyderm's `etcd` for storage of system metadata.
In Kubernetes, [persistent volumes](https://kubernetes.io/docs/concepts/storage/persistent-volumes/) are a mechanism for providing storage for consumption by the users of the cluster.
They are provisioned by the cluster administrators.
In a typical enterprise Kubernetes deployment, the administrators have configured persistent volumes that your Pachyderm deployment will consume by means of a [persistent volume claim](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#persistentvolumeclaims) in the Pachyderm manifest you generate.

You can tell Pachyderm how to consume PVs using our command-line arguments in three ways: using a static PV, with StatefulSets, or with StatefulSets using a StorageClass.

If your administrators are using [selectors](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#selector), or you want to use StorageClasses in a different way, you'll need to [edit the Pachyderm manifest](../deploy_custom/deploy_custom_pachyderm_deployment_manifest) appropriately before applying it.

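To make the claim side of this concrete, here is a minimal Python sketch of what a persistent volume claim with an optional StorageClass and label selector can look like. The function name `build_pvc`, the claim name `etcd-storage`, and the label `app: etcd` are hypothetical; the claim that `pachctl deploy custom` actually writes into your manifest may be named and shaped differently, so treat this only as a guide to the fields you might edit.

```python
import json


def build_pvc(name, size_gb, storage_class=None, match_labels=None):
    """Return a PersistentVolumeClaim manifest as a plain dict.

    All argument values are illustrative; ask your Kubernetes
    administrators for the class name and labels used in your cluster.
    """
    spec = {
        "accessModes": ["ReadWriteOnce"],
        # Kubernetes quantities use suffixes such as Gi (gibibytes).
        "resources": {"requests": {"storage": f"{size_gb}Gi"}},
    }
    if storage_class is not None:
        # Only claims that name a class are bound through a StorageClass.
        spec["storageClassName"] = storage_class
    if match_labels:
        # A selector restricts binding to PVs that carry these labels.
        spec["selector"] = {"matchLabels": dict(match_labels)}
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": spec,
    }


if __name__ == "__main__":
    claim = build_pvc("etcd-storage", 10, match_labels={"app": "etcd"})
    print(json.dumps(claim, indent=2))
```

Comparing output like this with the claim in your generated manifest can help you spot which fields need editing before you apply it.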
##### Static PV

In this case, `etcd` will be deployed in Pachyderm as a [ReplicationController](https://kubernetes.io/docs/concepts/workloads/controllers/replicationcontroller/) with one (1) pod that uses a static PV. This is a common deployment for testing.

##### StatefulSets

[StatefulSets](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/) are a mechanism provided in Kubernetes 1.9 and newer to manage the deployment and scaling of applications. They use either [Persistent Volume Provisioning](https://github.com/kubernetes/examples/blob/master/staging/persistent-volume-provisioning/README.md) or pre-provisioned PVs.

If you're using StatefulSets in your Kubernetes cluster, you will need to find out the particulars of your cluster's PV configuration and [use appropriate flags to `pachctl deploy custom`](#configuring-with-statefulsets).

##### StorageClasses

If your administrators require specification of [classes](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#class-1) to consume persistent volumes,
you will need to find out the particulars of your cluster's PV configuration and [use appropriate flags to `pachctl deploy custom`](#configuring-with-statefulsets-using-storageclasses).

#### Common tasks to all types of PV deployments
##### Sizing the PV

You'll need to use a PV with enough space for the metadata associated with the data you plan to store in Pachyderm.
We're currently developing good rules of thumb for scaling this storage as your Pachyderm deployment grows,
but 10 GB of disk space appears to be sufficient for most purposes.

##### Creating the PV

In the case of cloud-based deployments, the `pachctl deploy` command for AWS, GCP and Azure creates persistent volumes for you when you follow the instructions for those infrastructures.

In the case of on-premises deployments, the kind of PV you provision depends on what kind of storage your Kubernetes administrators have attached to your cluster and configured, and whether you are expected to consume that storage as a static PV, with Persistent Volume Provisioning, or as a StorageClass.

For example, many on-premises deployments use Network File System (NFS) to access some kind of enterprise storage.
Persistent volumes are provisioned in Kubernetes like all things in Kubernetes: by means of a manifest.
You can learn about creating [volumes](https://kubernetes.io/docs/concepts/storage/volumes/) and [persistent volumes](https://kubernetes.io/docs/concepts/storage/persistent-volumes/) in the Kubernetes documentation.

You or your Kubernetes administrators will be responsible for configuring the PVs you create to be consumable as static PVs, with Persistent Volume Provisioning, or as a StorageClass.

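As an illustration of provisioning a PV by means of a manifest, the following Python sketch builds a static, NFS-backed PersistentVolume and prints it as JSON, which `kubectl` accepts alongside YAML. The function name `build_nfs_pv`, the PV name, the NFS server and the export path are all hypothetical placeholders, and the access mode and reclaim policy are assumptions that your administrators may want to change.

```python
import json


def build_nfs_pv(name, size_gb, server, path):
    """Return a static NFS-backed PersistentVolume manifest as a dict."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": name},
        "spec": {
            # Kubernetes quantities use suffixes such as Gi (gibibytes).
            "capacity": {"storage": f"{size_gb}Gi"},
            # Assumption: a single node mounts the volume read-write.
            "accessModes": ["ReadWriteOnce"],
            # Assumption: keep the data if the claim is deleted.
            "persistentVolumeReclaimPolicy": "Retain",
            "nfs": {"server": server, "path": path},
        },
    }


if __name__ == "__main__":
    # Placeholder values; substitute the ones for your environment.
    pv = build_nfs_pv("etcd-pv", 10, "nfs.example.internal", "/exports/etcd")
    print(json.dumps(pv, indent=2))
```

Redirecting the output to a file and keeping that file under version control fits the infrastructure-as-code practice described earlier; the file can then be applied with `kubectl apply -f`.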
#### What you'll need for Pachyderm configuration of PV storage

Keep the information below at hand for when you [run `pachctl deploy custom` further on](deploy_custom/index.md).

##### Configuring with static volumes

You'll need the name of the PV and the amount of space you can use, in gigabytes.
We'll refer to those, respectively, as `PVC_STORAGE_NAME` and `PVC_STORAGE_SIZE` further on.
With this kind of PV,
you'll use the flag `--static-etcd-volume` with `PVC_STORAGE_NAME` as its argument in your deployment.

Note: This will override any attempt to configure with StorageClasses, below.

##### Configuring with StatefulSets

If you're deploying using [StatefulSets](#statefulsets),
you'll just need the amount of space you can use, in gigabytes,
which we'll refer to as `PVC_STORAGE_SIZE` further on.

Note: The `--etcd-storage-class` flag and argument will be ignored if you use the flag `--static-etcd-volume` along with it.

##### Configuring with StatefulSets using StorageClasses

If you're deploying using [StatefulSets](#statefulsets) with [StorageClasses](#storageclasses),
you'll need the name of the storage class and the amount of space you can use, in gigabytes.
We'll refer to those, respectively, as `PVC_STORAGECLASS` and `PVC_STORAGE_SIZE` further on.
With this kind of PV,
you'll use the flag `--etcd-storage-class` with `PVC_STORAGECLASS` as its argument in your deployment.

Note: The `--etcd-storage-class` flag and argument will be ignored if you use the flag `--static-etcd-volume` along with it.

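The precedence between the two flags can be captured in a deployment script. The sketch below is one hypothetical way to do it in Python: the helper `etcd_storage_flags` is our own illustration, not part of `pachctl`, and it only selects between the two flags named in this section. The remaining arguments to `pachctl deploy custom` are covered in the linked deploy documentation.

```python
def etcd_storage_flags(static_volume=None, storage_class=None):
    """Return the etcd storage flags for `pachctl deploy custom`.

    static_volume corresponds to PVC_STORAGE_NAME and storage_class to
    PVC_STORAGECLASS. A static volume takes precedence, so the storage
    class is dropped rather than passed along and silently ignored.
    """
    if static_volume:
        return ["--static-etcd-volume", static_volume]
    if storage_class:
        return ["--etcd-storage-class", storage_class]
    # Plain StatefulSets: no extra etcd storage flag is needed.
    return []


if __name__ == "__main__":
    # Placeholder names; substitute the ones for your cluster.
    print(etcd_storage_flags(static_volume="etcd-pv"))
    print(etcd_storage_flags(storage_class="fast-disks"))
    print(etcd_storage_flags())
```

Dropping the storage class explicitly when a static volume is given keeps the generated command line free of a flag that would have no effect.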
### Deploying an object store

#### Object store: what's it for?
An object store is used by Pachyderm's `pachd` for storing all your data.
The object store you use must be accessible via a low-latency, high-bandwidth connection like [Gigabit](https://en.wikipedia.org/wiki/Gigabit_Ethernet) or [10G Ethernet](https://en.wikipedia.org/wiki/10_Gigabit_Ethernet).

For an on-premises deployment,
it's not advisable to use a cloud-based storage mechanism.
Don't deploy an on-premises Pachyderm cluster against cloud-based object stores such as S3 from [AWS](amazon_web_services/index.md), GCS from [Google Cloud Platform](google_cloud_platform.md), or Azure Blob Storage from [Azure](azure.md). Note that when the object store command-line parameter (`--object-store`) specifies `s3`, it refers to the S3 protocol (which is used by solutions such as MinIO and the like) and not the Amazon product with the same name.

#### Object store prerequisites

Object stores are accessible using the S3 protocol, created by Amazon.
Storage providers like [MinIO](https://min.io), [EMC's ECS](https://www.dellemc.com/storage/ecs/index.htm), or [SwiftStack](https://www.swiftstack.com/) provide S3-compatible access to enterprise storage for on-premises deployment.
You can find links to instructions for providers of particular object stores in the [See also](#see-also) section.

#### Sizing the object store

Size your object store generously.
Once you start using Pachyderm, you'll start versioning all your data.
We're currently developing good rules of thumb for scaling your object store as your Pachyderm deployment grows,
but it's a good idea to start with a large multiple of your current data set size.

#### What you'll need for Pachyderm configuration of the object store
You'll need four items to configure the object store.
We're prefixing each item with how we'll refer to it further on.

1. `OS_ENDPOINT`: The access endpoint.
   For example, MinIO's endpoints are usually something like `minio-server:9000`.
   Don't begin it with the protocol; it's an endpoint, not a URL. Also, check whether your object store (e.g. MinIO) is using SSL/TLS.
   If not, disable it using `--disable-ssl`.
1. `OS_BUCKET_NAME`: The bucket name you're dedicating to Pachyderm. Pachyderm will need exclusive access to this bucket.
1. `OS_ACCESS_KEY_ID`: The access key ID for the object store. This is like a user name for logging into the object store.
1. `OS_SECRET_KEY`: The secret key for the object store. This is like the above user's password.

Keep this information handy.

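A small pre-flight check can catch the most common mistake in the list above, an endpoint that begins with a protocol. The following Python sketch is a hypothetical helper, not part of Pachyderm: the `ObjectStoreSettings` type, its field names, and the functions `check_settings` and `ssl_flags` are our own, and only the `--disable-ssl` flag comes from the text above.

```python
from typing import List, NamedTuple


class ObjectStoreSettings(NamedTuple):
    endpoint: str        # OS_ENDPOINT, e.g. "minio-server:9000"
    bucket: str          # OS_BUCKET_NAME
    access_key_id: str   # OS_ACCESS_KEY_ID
    secret_key: str      # OS_SECRET_KEY
    uses_tls: bool = True


def check_settings(settings: ObjectStoreSettings) -> List[str]:
    """Return a list of human-readable problems; empty means none found."""
    problems = []
    if "://" in settings.endpoint:
        problems.append("OS_ENDPOINT must not begin with a protocol")
    named_values = [
        ("OS_ENDPOINT", settings.endpoint),
        ("OS_BUCKET_NAME", settings.bucket),
        ("OS_ACCESS_KEY_ID", settings.access_key_id),
        ("OS_SECRET_KEY", settings.secret_key),
    ]
    for label, value in named_values:
        if not value.strip():
            problems.append(f"{label} is empty")
    return problems


def ssl_flags(settings: ObjectStoreSettings) -> List[str]:
    """Return --disable-ssl only when the store does not use SSL/TLS."""
    return [] if settings.uses_tls else ["--disable-ssl"]


if __name__ == "__main__":
    # Placeholder values; never commit real credentials to version control.
    settings = ObjectStoreSettings(
        "minio-server:9000", "pachyderm", "my-key-id", "my-secret", uses_tls=False
    )
    print(check_settings(settings), ssl_flags(settings))
```

Running a check like this before generating the manifest surfaces a malformed endpoint early, rather than as a connection failure after deployment.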
### Next step: creating a custom deploy manifest for Pachyderm
Once you have Kubernetes deployed, your persistent volume created, and your object store configured, it's time to [create the Pachyderm manifest for deploying to Kubernetes](./deploy_custom/index.md).

## See Also
### Kubernetes variants
- [OpenShift](./openshift.md)
### Object storage variants
- [EMC ECS](./non-cloud-object-stores.md#emc-ecs)
- [MinIO](./non-cloud-object-stores.md#minio)
- [SwiftStack](./non-cloud-object-stores.md#swiftstack)