     1  # IBM Cloud OpenShift 3.1
     2  
     3  [OpenShift](https://www.openshift.com/products/openshift-ibm-cloud/) is a popular enterprise Kubernetes distribution.
Pachyderm can run on IBM Cloud OpenShift with a few small tweaks to the [OpenShift deployment process](openshift.md), the most important being the StatefulSet deployment.
     5  
Please see [known issues](#known-issues) below for current issues with OpenShift deployments.
     7  
     8  ## Prerequisites
     9  
Pachyderm needs a few things to install and run successfully in an IBM Cloud OpenShift environment.
    11  
### Binaries for CLI
    13  
    14  - [oc](https://cloud.ibm.com/docs/openshift?topic=openshift-openshift-cli#cli_oc)
    15  - [pachctl](#install-pachctl)
    16  
1. Storage for the StatefulSet. Because this is a StatefulSet-based deployment, it uses either Persistent Volume provisioning or pre-provisioned PVs with a defined storage class.
1. An object store, used by Pachyderm's `pachd` for storing all your data.
   The object store you use will probably depend on where you're going to run OpenShift: S3 for [AWS](https://pachyderm.readthedocs.io/en/latest/deployment/amazon_web_services.html), GCS for [Google Cloud Platform](https://pachyderm.readthedocs.io/en/latest/deployment/google_cloud_platform.html), Azure Blob Storage for [Azure](https://pachyderm.readthedocs.io/en/latest/deployment/azure.html), or a storage provider like Minio, EMC's ECS, or Swift that provides S3-compatible access to enterprise storage for on-premises deployment.
    20  1. Access to particular TCP/IP ports for communication.
    21  
    22  ### Stateful Set
    23  
You'll need a storage class defined in your IBM Cloud OpenShift cluster. You also need to specify the number of replica pods in the configuration.
A custom deploy can also create storage.
    26  
    27  We're currently developing good rules of thumb for scaling this storage as your Pachyderm deployment grows,
    28  but it looks like 10G of disk space is sufficient for most purposes.
    29  
    30  ### Object store
    31  
Size your object store generously, since this is where Pachyderm versions and stores all your data.
You'll need four items to configure object storage. This example uses Minio.
    34  
1. The access endpoint.
   For example, Minio's endpoints are usually something like `minio-server:9000`.
   Don't begin it with the protocol; it's an endpoint, not a URL.
1. The bucket name you're dedicating to Pachyderm. Pachyderm will need exclusive access to this bucket.
1. The access key ID for the object store. This is like a user name for logging into the object store.
1. The secret key for the object store. This is like the above user's password.
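
As a minimal sketch of collecting these four items, the Python snippet below removes a protocol prefix from an endpoint before it is stored. The helper name `strip_scheme`, the dictionary keys, and the bucket name are hypothetical and are not part of Pachyderm.

```python
def strip_scheme(endpoint: str) -> str:
    """Return the endpoint without a leading scheme such as http://."""
    _, separator, rest = endpoint.partition("://")
    host = rest if separator else endpoint
    return host.rstrip("/")


# Hypothetical placeholder values; substitute your own.
object_store = {
    "endpoint": strip_scheme("http://minio-server:9000"),
    "bucket": "pachyderm-data",
    "access_key_id": "<access-key-id>",
    "secret_key": "<secret-key>",
}
print(object_store["endpoint"])
```

Normalizing the endpoint in one place keeps a stray `http://` from reaching the deploy command.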
    41  
    42  ### TCP/IP ports
    43  
    44  For more details on how Kubernetes networking and service definitions work, see the [Kubernetes services 
    45  documentation](https://kubernetes.io/docs/concepts/services-networking/service/).
    46  
    47  #### Incoming ports (port)
    48  
`port` is the port on which the Service listens inside the cluster. Because `targetPort` is left unset in these manifests, it is also the port the containers listen on.
You'll find these on both the `pachd` and `dash` containers.
OpenShift runs containers and pods as unprivileged users, which cannot bind to port numbers below 1024.
Pachyderm's default manifests use ports below 1024, so you'll have to modify the manifests to use other port numbers.
It's usually as easy as adding a "1" in front of the port numbers we use.
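
The renumbering rule can be sketched in Python as follows. The function name `openshift_port` is hypothetical. The sketch assumes three-digit defaults such as 650 or 999, and it raises an error when the prefixed value would still fall below 1024.

```python
def openshift_port(port: int) -> int:
    """Put a "1" in front of a privileged port; leave other ports unchanged."""
    if port >= 1024:
        return port
    candidate = int(f"1{port}")
    if candidate < 1024:
        # For example 80 -> 180, which an unprivileged user still cannot bind.
        raise ValueError(f"prefixing {port} with 1 gives {candidate}, still below 1024")
    return candidate


for default in (600, 650, 999, 30650):
    print(default, "->", openshift_port(default))
```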
    54  
    55  #### Pod ports (targetPort)
    56  
This is the port on the pod to which the Service forwards traffic that arrives on `port`.
You should leave the `targetPort` set at `0` so that it matches the `port` definition.
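
To illustrate why `0` works, here is a small Python sketch of the defaulting rule described above: an unset or zero `targetPort` resolves to the value of `port`. The function name is hypothetical, and the sketch handles only numeric values, not named ports.

```python
def effective_target_port(entry: dict) -> int:
    """Return the port traffic is forwarded to for one Service port entry."""
    target = entry.get("targetPort", 0)
    # A missing or zero targetPort falls back to the Service port.
    return target if target else entry["port"]


entry = {"name": "api-grpc-port", "port": 1650, "targetPort": 0, "nodePort": 30650}
print(effective_target_port(entry))
```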
    59  
    60  #### External ports (nodePorts)
    61  
    62  This is the port accessible from outside of Kubernetes.
You probably don't need to change `nodePort` values unless your network security requirements or architecture require you to change to another method of access.
    64  Please see the [Kubernetes services documentation](https://kubernetes.io/docs/concepts/services-networking/service/) for details.
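
If you do change `nodePort` values, a quick check like the Python sketch below can flag values outside the range Kubernetes uses by default (30000 to 32767). This assumes your cluster keeps that default; an administrator can configure a different range. The function name is hypothetical.

```python
# Default Kubernetes NodePort range; your cluster may be configured differently.
DEFAULT_NODE_PORTS = range(30000, 32768)


def node_ports_outside_default_range(ports: list) -> list:
    """Return the names of port entries whose nodePort is outside the default range."""
    return [
        entry["name"]
        for entry in ports
        if "nodePort" in entry and entry["nodePort"] not in DEFAULT_NODE_PORTS
    ]


ports = [
    {"name": "api-grpc-port", "port": 1650, "nodePort": 30650},
    {"name": "trace-port", "port": 1651, "nodePort": 1651},
]
print(node_ports_outside_default_range(ports))
```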
    65  
    66  ## The OCPify script
    67  
A bash script that automates many of the substitutions below is available [at this gist](https://gist.github.com/gabrielgrant/86c1a5b590ae3f4b3fd32d7e9d622dc8).
You can use it to modify a manifest created with the `--dry-run` flag to `pachctl deploy custom`, as detailed below. Then use this guide to verify that the modifications it makes are relevant to your OpenShift environment.
It requires certain prerequisites, such as [jq](https://github.com/stedolan/jq) and [sponge, found in moreutils](https://joeyh.name/code/moreutils/).
    71  
    72  This script may be useful as a basis for automating redeploys of Pachyderm as needed. 
    73  
    74  ### Best practices: Infrastructure as code
    75  
    76  We highly encourage you to apply the best practices used in developing software to managing the deployment process.
    77  
    78  1. Create scripts that automate as much of your processes as possible and keep them under version control.
    79  1. Keep copies of all artifacts, such as manifests, produced by those scripts and keep those under version control.
    80  1. Document your practices in the code and outside it.
    81  
    82  ## Preparing to deploy Pachyderm
    83  
    84  Things you'll need:
    85  
    86  1. Your storage class
    87  
    88  1. Your object store information.
    89  
    90  1. Your project in OpenShift.
    91  
    92  1. A text editor for editing your deployment manifest.

## Deploying Pachyderm

### 1. Determine your role security policy

Pachyderm is deployed by default with cluster roles.
Many institutional OpenShift security policies require namespace-local roles rather than cluster roles.
If your security policies require namespace-local roles, use the [`pachctl deploy` command below with the `--local-roles` flag](#namespace-local-roles).

### 2. Run the deploy command with --dry-run

Once you have your PV, object store, and project, you can create a manifest for editing using the `--dry-run` argument to `pachctl deploy`.
That step is detailed in the deployment instructions for each type of deployment, above.
   101  
Below, find examples, with cluster roles and with namespace-local roles, using AWS elastic block storage as a persistent disk with a custom deploy.
We'll show how to remove this PV in case you want to use a PV you create separately.
   106  
   107  #### Cluster roles
   108  
```shell
pachctl deploy custom --persistent-disk aws --etcd-storage-class <storage-class-name> \
     --object-store <object-storage-name> etcd-volume <volume-size-in-gb> \
     <s3-bucket-name> <s3-access-key-id> <s3-access-secret-key> <s3-access-endpoint-url> \
     --dynamic-etcd-nodes <no-of-replica-pods> --dry-run > manifest-statefulset.json
```
   115  
   116  #### Namespace-local roles
   117  
```shell
pachctl deploy custom --persistent-disk aws --etcd-storage-class <storage-class-name> \
     --object-store <object-storage-name> etcd-volume <volume-size-in-gb> \
     <s3-bucket-name> <s3-access-key-id> <s3-access-secret-key> <s3-access-endpoint-url> \
     --dynamic-etcd-nodes <no-of-replica-pods> --local-roles --dry-run > manifest-statefulset.json
```
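
Before editing, it can help to list what the generated manifest contains. The Python sketch below assumes the `--dry-run` output is a series of JSON objects written one after another in a single file. That is an assumption about the file layout, so check your own manifest first. The function name `iter_manifest_objects` is hypothetical.

```python
import json


def iter_manifest_objects(text: str):
    """Yield each top-level JSON object found in a concatenated JSON stream."""
    decoder = json.JSONDecoder()
    position = 0
    while position < len(text):
        # raw_decode does not skip leading whitespace, so skip it here.
        if text[position].isspace():
            position += 1
            continue
        obj, position = decoder.raw_decode(text, position)
        yield obj


sample = '{"kind": "Service", "metadata": {"name": "pachd"}}\n{"kind": "Deployment", "metadata": {"name": "pachd"}}\n'
for obj in iter_manifest_objects(sample):
    print(obj["kind"], obj["metadata"]["name"])
```

To use it on a real file, read `manifest-statefulset.json` into a string and pass that string to the function.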
   124  
   125  ### 4. Modify pachd Service ports
   126  
In the deployment manifest, which we called `manifest-statefulset.json` above, find the stanza for the `pachd` Service. An example is shown below.
   128  
   129  ```
   130  {
   131  	"kind": "Service",
   132  	"apiVersion": "v1",
   133  	"metadata": {
   134  		"name": "pachd",
   135  		"namespace": "pachyderm",
   136  		"creationTimestamp": null,
   137  		"labels": {
   138  			"app": "pachd",
   139  			"suite": "pachyderm"
   140  		},
   141  		"annotations": {
   142  			"prometheus.io/port": "9091",
   143  			"prometheus.io/scrape": "true"
   144  		}
   145  	},
   146  	"spec": {
   147  		"ports": [
   148  			{
   149  				"name": "api-grpc-port",
   150  				"port": 650,
   151  				"targetPort": 0,
   152  				"nodePort": 30650
   153  			},
   154  			{
   155  				"name": "trace-port",
   156  				"port": 651,
   157  				"targetPort": 0,
   158  				"nodePort": 30651
   159  			},
   160  			{
   161  				"name": "api-http-port",
   162  				"port": 652,
   163  				"targetPort": 0,
   164  				"nodePort": 30652
   165  			},
   166  			{
   167  				"name": "saml-port",
   168  				"port": 654,
   169  				"targetPort": 0,
   170  				"nodePort": 30654
   171  			},
   172  			{
   173  				"name": "api-git-port",
   174  				"port": 999,
   175  				"targetPort": 0,
   176  				"nodePort": 30999
   177  			},
   178  			{
   179  				"name": "s3gateway-port",
   180  				"port": 600,
   181  				"targetPort": 0,
   182  				"nodePort": 30600
   183  			}
   184  		],
   185  		"selector": {
   186  			"app": "pachd"
   187  		},
   188  		"type": "NodePort"
   189  	},
   190  	"status": {
   191  		"loadBalancer": {}
   192  	}
   193  }
   194  ```
   195  
   196  While the nodePort declarations are fine, the port declarations are too low for OpenShift. Good example values are shown below.
   197  
   198  ```
   199  	"spec": {
   200  		"ports": [
   201  			{
   202  				"name": "api-grpc-port",
   203  				"port": 1650,
   204  				"targetPort": 0,
   205  				"nodePort": 30650
   206  			},
   207  			{
   208  				"name": "trace-port",
   209  				"port": 1651,
   210  				"targetPort": 0,
   211  				"nodePort": 30651
   212  			},
   213  			{
   214  				"name": "api-http-port",
   215  				"port": 1652,
   216  				"targetPort": 0,
   217  				"nodePort": 30652
   218  			},
   219  			{
   220  				"name": "saml-port",
   221  				"port": 1654,
   222  				"targetPort": 0,
   223  				"nodePort": 30654
   224  			},
   225  			{
   226  				"name": "api-git-port",
   227  				"port": 1999,
   228  				"targetPort": 0,
   229  				"nodePort": 30999
   230  			},
   231  			{
   232  				"name": "s3gateway-port",
   233  				"port": 1600,
   234  				"targetPort": 0,
   235  				"nodePort": 30600
   236  			}
   237  		],
   238  ```
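
The substitution shown above can be sketched in Python. This is an illustration rather than the OCPify script itself. It raises every Service `port` below 1024 by putting a "1" in front of it and leaves `targetPort` and `nodePort` untouched. The function name is hypothetical.

```python
import copy


def raise_service_ports(service: dict) -> dict:
    """Return a copy of a Service manifest with privileged ports prefixed by 1."""
    updated = copy.deepcopy(service)
    for entry in updated["spec"]["ports"]:
        if entry["port"] < 1024:
            entry["port"] = int(f"1{entry['port']}")
    return updated


service = {
    "kind": "Service",
    "metadata": {"name": "pachd"},
    "spec": {
        "ports": [
            {"name": "api-grpc-port", "port": 650, "targetPort": 0, "nodePort": 30650},
            {"name": "api-git-port", "port": 999, "targetPort": 0, "nodePort": 30999},
        ]
    },
}
for entry in raise_service_ports(service)["spec"]["ports"]:
    print(entry["name"], entry["port"], entry["nodePort"])
```

Working on a copy keeps the original manifest available for comparison before you commit the result to version control.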
   239  
### 5. Modify pachd Deployment ports and add environment variables

In this step you edit two parts of the `pachd` Deployment JSON.
Here, we'll omit the example of the unmodified version.
Instead, we'll show you the modified version.

#### 5.1 pachd Deployment ports

The `pachd` Deployment also has a set of port numbers in the spec for the `pachd` container.
Those must be modified to match the port numbers you set above for each port.
   247  
```
{
	"kind": "Deployment",
	"apiVersion": "apps/v1beta1",
	"metadata": {
		"name": "pachd",
		"namespace": "pachyderm",
		"creationTimestamp": null,
		"labels": {
			"app": "pachd",
			"suite": "pachyderm"
		}
	},
	"spec": {
		"replicas": 1,
		"selector": {
			"matchLabels": {
				"app": "pachd",
				"suite": "pachyderm"
			}
		},
		"template": {
			"metadata": {
				"name": "pachd",
				"namespace": "pachyderm",
				"creationTimestamp": null,
				"labels": {
					"app": "pachd",
					"suite": "pachyderm"
				},
				"annotations": {
					"iam.amazonaws.com/role": ""
				}
			},
			"spec": {
				"volumes": [
					{
						"name": "pach-disk"
					},
					{
						"name": "pachyderm-storage-secret",
						"secret": {
							"secretName": "pachyderm-storage-secret"
						}
					}
				],
				"containers": [
					{
						"name": "pachd",
						"image": "pachyderm/pachd:{{ config.pach_latest_version }}",
						"ports": [
							{
								"name": "api-grpc-port",
								"containerPort": 1650,
								"protocol": "TCP"
							},
							{
								"name": "trace-port",
								"containerPort": 1651
							},
							{
								"name": "api-http-port",
								"containerPort": 1652,
								"protocol": "TCP"
							},
							{
								"name": "peer-port",
								"containerPort": 1653,
								"protocol": "TCP"
							},
							{
								"name": "api-git-port",
								"containerPort": 1999,
								"protocol": "TCP"
							},
							{
								"name": "saml-port",
								"containerPort": 1654,
								"protocol": "TCP"
							}
						],

```
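
Because the container ports have to match the Service ports by name, a consistency check can catch a value that was missed. The Python sketch below is one way to do that, assuming `targetPort` is left at `0` so that each Service `port` should equal the matching `containerPort`. The function name is hypothetical.

```python
def mismatched_ports(service: dict, deployment: dict, container_name: str = "pachd") -> list:
    """Return names of ports whose containerPort differs from the Service port."""
    service_ports = {entry["name"]: entry["port"] for entry in service["spec"]["ports"]}
    containers = deployment["spec"]["template"]["spec"]["containers"]
    container = next(c for c in containers if c["name"] == container_name)
    return sorted(
        entry["name"]
        for entry in container["ports"]
        if entry["name"] in service_ports
        and entry["containerPort"] != service_ports[entry["name"]]
    )


service = {"spec": {"ports": [
    {"name": "api-grpc-port", "port": 1650},
    {"name": "trace-port", "port": 1651},
]}}
deployment = {"spec": {"template": {"spec": {"containers": [{
    "name": "pachd",
    "ports": [
        {"name": "api-grpc-port", "containerPort": 650},
        {"name": "trace-port", "containerPort": 1651},
        {"name": "peer-port", "containerPort": 1653},
    ],
}]}}}}
print(mismatched_ports(service, deployment))
```

Ports that exist only in the container spec, such as `peer-port`, are skipped because the Service has no entry to compare against.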
   331  
   332  #### 5.2 Add environment variables
   333  
   334  You need to configure the following environment variables for
   335  OpenShift:
   336  
1. `WORKER_USES_ROOT`: Controls whether worker pipelines run as the root user. You'll need to set it to `false`.
1. `PORT`: The gRPC port used by `pachd` for communication with `pachctl` and the API. Set it to the same value you set for `api-grpc-port` above.
1. `HTTP_PORT`: The port for the API proxy. Set it to the same value as `api-http-port` above.
1. `PEER_PORT`: Used to coordinate `pachd` instances. Set it to the same value as `peer-port` above.
1. `PPS_WORKER_GRPC_PORT`: Used to talk to pipelines. Set it to a value above 1024. The example value of 1680 below is recommended.
1. `PACH_ROOT`: The Pachyderm root directory. In an OpenShift deployment, set this value to a directory to which non-root users have write access, which might depend on your container image. By default, `PACH_ROOT` is set to `/pach`, which requires root privileges. Because you do not have root access in OpenShift, you need to modify the default setting.
   348  
   349  The added values below are shown inserted above the `PACH_ROOT` value, which is typically the first value in this array.
   350  The rest of the stanza is omitted for clarity.
   351  
```
						"env": [
							{
								"name": "WORKER_USES_ROOT",
								"value": "false"
							},
							{
								"name": "PORT",
								"value": "1650"
							},
							{
								"name": "HTTP_PORT",
								"value": "1652"
							},
							{
								"name": "PEER_PORT",
								"value": "1653"
							},
							{
								"name": "PPS_WORKER_GRPC_PORT",
								"value": "1680"
							},
							{
								"name": "PACH_ROOT",
								"value": "<path-to-non-root-dir>"
							},

```
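
The same insertion can be sketched in Python for a container spec that has already been loaded as a dictionary. The helper name `apply_env` is hypothetical. It updates variables that already exist and places new ones at the front of the `env` array, ahead of `PACH_ROOT`. Values are kept as strings, as in the manifest above.

```python
OPENSHIFT_ENV = {
    "WORKER_USES_ROOT": "false",
    "PORT": "1650",
    "HTTP_PORT": "1652",
    "PEER_PORT": "1653",
    "PPS_WORKER_GRPC_PORT": "1680",
}


def apply_env(container: dict, settings: dict) -> None:
    """Update existing env entries in place and prepend any that are missing."""
    env = container.setdefault("env", [])
    existing = {item["name"]: item for item in env}
    added = []
    for name, value in settings.items():
        if name in existing:
            existing[name]["value"] = value
        else:
            added.append({"name": name, "value": value})
    env[:0] = added  # insert the new entries before the current first entry


container = {"name": "pachd", "env": [{"name": "PACH_ROOT", "value": "/pach"}]}
apply_env(container, OPENSHIFT_ENV)
print([item["name"] for item in container["env"]])
```

The sketch leaves `PACH_ROOT` untouched; you still need to set it to a directory that non-root users can write to.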
   380  
### 6. StatefulSet configuration

The StatefulSet uses the storage class to create a PVC from the volume claim template in the IBM Cloud OpenShift cluster. For reference, this example is based on the `ibmc-file-silver` storage class.
   383  
   384  ```
   385  {
   386  	"apiVersion": "apps/v1beta1",
   387  	"kind": "StatefulSet",
   388  	"metadata": {
   389  		"labels": {
   390  			"app": "etcd",
   391  			"suite": "pachyderm"
   392  		},
   393  		"name": "etcd",
   394  		"namespace": "pachyderm"
   395  	},
   396  	"spec": {
   397  		"replicas": 2,
   398  		"selector": {
   399  			"matchLabels": {
   400  				"app": "etcd",
   401  				"suite": "pachyderm"
   402  			}
   403  		},
   404  		"serviceName": "etcd-headless",
   405  		"template": {
   406  			"metadata": {
   407  				"labels": {
   408  					"app": "etcd",
   409  					"suite": "pachyderm"
   410  				},
   411  				"name": "etcd",
   412  				"namespace": "pachyderm"
   413  			},
   414  			"spec": {
   415  				"containers": [
   416  					{
   417  						"args": [
   418  							"\"/usr/local/bin/etcd\" \"--listen-client-urls=http://0.0.0.0:2379\" \"--advertise-client-urls=http://0.0.0.0:2379\" \"--data-dir=/var/data/etcd\" \"--auto-compaction-retention=1\" \"--max-txn-ops=10000\" \"--max-request-bytes=52428800\" \"--quota-backend-bytes=8589934592\" \"--listen-peer-urls=http://0.0.0.0:2380\" \"--initial-cluster-token=pach-cluster\" \"--initial-advertise-peer-urls=http://${ETCD_NAME}.etcd-headless.${NAMESPACE}.svc.cluster.local:2380\" \"--initial-cluster=etcd-0=http://etcd-0.etcd-headless.${NAMESPACE}.svc.cluster.local:2380,etcd-1=http://etcd-1.etcd-headless.${NAMESPACE}.svc.cluster.local:2380\""
   419  						],
   420  						"command": [
   421  							"/bin/sh",
   422  							"-c"
   423  						],
   424  						"env": [
   425  							{
   426  								"name": "ETCD_NAME",
   427  								"valueFrom": {
   428  									"fieldRef": {
   429  										"apiVersion": "v1",
   430  										"fieldPath": "metadata.name"
   431  									}
   432  								}
   433  							},
   434  							{
   435  								"name": "NAMESPACE",
   436  								"valueFrom": {
   437  									"fieldRef": {
   438  										"apiVersion": "v1",
   439  										"fieldPath": "metadata.namespace"
   440  									}
   441  								}
   442  							}
   443  						],
   444  						"image": "quay.io/coreos/etcd:v3.3.5",
   445  						"imagePullPolicy": "IfNotPresent",
   446  						"name": "etcd",
   447  						"ports": [
   448  							{
   449  								"containerPort": 2379,
   450  								"name": "client-port"
   451  							},
   452  							{
   453  								"containerPort": 2380,
   454  								"name": "peer-port"
   455  							}
   456  						],
   457  						"resources": {
   458  							"requests": {
   459  								"cpu": "1",
   460  								"memory": "2G"
   461  							}
   462  						},
   463  						"volumeMounts": [
   464  							{
   465  								"mountPath": "/var/data/etcd",
   466  								"name": "etcd-storage"
   467  							}
   468  						]
   469  					}
   470  				],
   471  				"imagePullSecrets": null
   472  			}
   473  		},
   474  		"volumeClaimTemplates": [
   475  			{
   476  				"metadata": {
   477  					"annotations": {
   478  						"volume.beta.kubernetes.io/storage-class": "ibmc-file-silver"
   479  					},
   480  					"labels": {
   481  						"app": "etcd",
   482  						"suite": "pachyderm"
   483  					},
   484  					"name": "etcd-storage",
   485  					"namespace": "pachyderm"
   486  				},
   487  				"spec": {
   488  					"accessModes": [
   489  						"ReadWriteOnce"
   490  					],
   491  					"resources": {
   492  						"requests": {
   493  							"storage": "10Gi"
   494  						}
   495  					}
   496  				}
   497  			}
   498  		]
   499  	}
   500  }
   501  ```
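
In the example above, the `--initial-cluster` argument lists one entry per replica (`etcd-0` and `etcd-1` for `"replicas": 2`). If you change the replica count, that list presumably needs to change with it. The Python sketch below builds the string in the same format; the function name is hypothetical, and the `${NAMESPACE}` placeholder is left for the shell to expand, as in the manifest.

```python
def initial_cluster(replicas: int, namespace: str = "${NAMESPACE}") -> str:
    """Build the etcd --initial-cluster value for a given number of replicas."""
    return ",".join(
        f"etcd-{i}=http://etcd-{i}.etcd-headless.{namespace}.svc.cluster.local:2380"
        for i in range(replicas)
    )


print(initial_cluster(2))
```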
   502  
### 7. Deploy the Pachyderm manifest you modified

```shell
oc create -f manifest-statefulset.json
```
   508  
   509  You can see the cluster status by using `oc get pods` as in upstream OpenShift:
   510  
```shell
oc get pods
NAME                     READY     STATUS    RESTARTS   AGE
dash-78c4b487dc-sm56p    2/2       Running   0          1m
etcd-0                   1/1       Running   0          3m
etcd-1                   1/1       Running   0          3m
pachd-5655cffbf7-57w4p   1/1       Running   0          3m
```
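
If you script the deployment, a check like the Python sketch below can confirm that every pod is running and fully ready. It parses the plain-text table that `oc get pods` prints, assuming the column layout shown above. The function name is hypothetical.

```python
def pods_not_ready(output: str) -> list:
    """Return names of pods that are not Running or not fully ready."""
    problems = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0] == "NAME":
            continue  # skip blank lines and the header row
        name, ready, status = fields[:3]
        current, _, wanted = ready.partition("/")
        if status != "Running" or current != wanted:
            problems.append(name)
    return problems


sample_output = """\
NAME                     READY     STATUS    RESTARTS   AGE
dash-78c4b487dc-sm56p    2/2       Running   0          1m
etcd-0                   1/1       Running   0          3m
etcd-1                   1/1       Running   0          3m
pachd-5655cffbf7-57w4p   1/1       Running   0          3m
"""
print(pods_not_ready(sample_output))
```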
   519  
## Known issues
   521  
   522  Problems related to OpenShift deployment are tracked in [issues with the "openshift" label](https://github.com/pachyderm/pachyderm/issues?utf8=%E2%9C%93&q=is%3Aissue+is%3Aopen+label%3Aopenshift).