---
title: On-Premises
grand_parent: How-To
parent: Install lakeFS
description: How to deploy and set up a production-suitable lakeFS environment on-premises (or on other cloud providers)
redirect_from:
  - /deploy/k8s.html
  - /deploy/docker.html
  - /integrations/minio.html
  - /using/minio.html
  - /deploy/onprem.html
  - /deploying/install.html
next: ["Import data into your installation", "/howto/import.html"]
---

# On-Premises Deployment

{: .tip }
> The instructions given here are for a self-managed deployment of lakeFS.
>
> For a hosted lakeFS service with guaranteed SLAs, try [lakeFS Cloud](https://lakefs.cloud).

{% include toc.html %}

⏰ Expected deployment time: 25 min
{: .note }

## Prerequisites

To use lakeFS on-premises, you can either use the [local blockstore](#local-blockstore) adapter or have access to an S3-compatible object store such as [MinIO](https://min.io).

For more information on how to set up MinIO, see the [official deployment guide](https://min.io/docs/minio/container/operations/installation.html){: target="_blank" }.

## Setting up a database

lakeFS requires a PostgreSQL database to synchronize actions on your repositories.
This section assumes that you already have a PostgreSQL >= 11.0 database accessible.


## Setting up a lakeFS Server

<div class="tabs">
<ul>
  <li><a href="#linux">Linux</a></li>
  <li><a href="#docker">Docker</a></li>
  <li><a href="#k8s">Kubernetes</a></li>
</ul>
<div markdown="1" id="linux">

Connect to your host using SSH:

1. Create a `config.yaml` on your VM, with the following parameters:

   ```yaml
   ---
   database:
     type: "postgres"
     postgres:
       connection_string: "[DATABASE_CONNECTION_STRING]"

   auth:
     encrypt:
       # replace this with a randomly-generated string. Make sure to keep it safe!
       secret_key: "[ENCRYPTION_SECRET_KEY]"

   blockstore:
     type: s3
     s3:
       force_path_style: true
       endpoint: http://<minio_endpoint>
       discover_bucket_region: false
       credentials:
         access_key_id: <minio_access_key>
         secret_access_key: <minio_secret_key>
   ```

   ⚠️ Note that the lakeFS blockstore type is set to `s3`: this configuration works with S3-compatible storage engines such as [MinIO](https://min.io/){: target="_blank" }.
   {: .note }

1. [Download the binary][downloads] to the server.

1. Run the `lakefs` binary:

   ```sh
   lakefs --config config.yaml run
   ```

**Note:** It's preferable to run the binary as a service using systemd or your operating system's facilities.
{: .note }
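
For example, below is a minimal sketch of a systemd unit for lakeFS. The install path, configuration path, and the dedicated `lakefs` user are assumptions, so adjust them to your environment:

```ini
# /etc/systemd/system/lakefs.service - a sketch only; paths and user are assumptions
[Unit]
Description=lakeFS server
After=network.target

[Service]
# assumes the binary is at /usr/local/bin/lakefs and the config at /etc/lakefs/config.yaml
ExecStart=/usr/local/bin/lakefs --config /etc/lakefs/config.yaml run
User=lakefs
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

After saving the unit file, you can enable and start the service with `sudo systemctl enable --now lakefs`.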

</div>
<div markdown="1" id="docker">

To support container-based environments, you can configure lakeFS using environment variables. Here is a `docker run`
command that demonstrates starting lakeFS using Docker:

```sh
docker run \
  --name lakefs \
  -p 8000:8000 \
  -e LAKEFS_DATABASE_TYPE="postgres" \
  -e LAKEFS_DATABASE_POSTGRES_CONNECTION_STRING="[DATABASE_CONNECTION_STRING]" \
  -e LAKEFS_AUTH_ENCRYPT_SECRET_KEY="[ENCRYPTION_SECRET_KEY]" \
  -e LAKEFS_BLOCKSTORE_TYPE="s3" \
  -e LAKEFS_BLOCKSTORE_S3_FORCE_PATH_STYLE="true" \
  -e LAKEFS_BLOCKSTORE_S3_ENDPOINT="http://<minio_endpoint>" \
  -e LAKEFS_BLOCKSTORE_S3_DISCOVER_BUCKET_REGION="false" \
  -e LAKEFS_BLOCKSTORE_S3_CREDENTIALS_ACCESS_KEY_ID="<minio_access_key>" \
  -e LAKEFS_BLOCKSTORE_S3_CREDENTIALS_SECRET_ACCESS_KEY="<minio_secret_key>" \
  treeverse/lakefs:latest run
```

⚠️ Note that the lakeFS blockstore type is set to `s3`: this configuration works with S3-compatible storage engines such as [MinIO](https://min.io/){: target="_blank" }.
{: .note }

See the [reference][config-envariables] for a complete list of environment variables.


</div>
<div markdown="1" id="k8s">

You can install lakeFS on Kubernetes using a [Helm chart](https://github.com/treeverse/charts/tree/master/charts/lakefs).

To install lakeFS with Helm:

1. Copy the Helm values file relevant for S3-compatible storage (MinIO in this example):

   ```yaml
   secrets:
     # replace this with the connection string of the database you created in a previous step:
     databaseConnectionString: [DATABASE_CONNECTION_STRING]
     # replace this with a randomly-generated string
     authEncryptSecretKey: [ENCRYPTION_SECRET_KEY]
   lakefsConfig: |
     blockstore:
       type: s3
       s3:
         force_path_style: true
         endpoint: http://<minio_endpoint>
         discover_bucket_region: false
         credentials:
           access_key_id: <minio_access_key>
           secret_access_key: <minio_secret_key>
   ```

   ⚠️ Note that the lakeFS blockstore type is set to `s3`: this configuration works with S3-compatible storage engines such as [MinIO](https://min.io/){: target="_blank" }.
   {: .note }

1. Fill in the missing values and save the file as `conf-values.yaml`. For more configuration options, see our Helm chart [README](https://github.com/treeverse/charts/blob/master/charts/lakefs/README.md#custom-configuration){: target="_blank" }.

   The `lakefsConfig` parameter is the lakeFS configuration documented [here](https://docs.lakefs.io/reference/configuration.html), but without sensitive information.
   Sensitive information like `databaseConnectionString` is given through separate parameters, and the chart will inject it into Kubernetes secrets.
   {: .note }

1. In the directory where you created `conf-values.yaml`, run the following commands:

   ```bash
   # Add the lakeFS repository
   helm repo add lakefs https://charts.lakefs.io
   # Deploy lakeFS
   helm install my-lakefs lakefs/lakefs -f conf-values.yaml
   ```

   *my-lakefs* is the [Helm Release](https://helm.sh/docs/intro/using_helm/#three-big-concepts) name.


## Load balancing

To configure a load balancer that directs requests to the lakeFS servers, you can use the `LoadBalancer` Service type or a Kubernetes Ingress.
By default, lakeFS operates on port 8000 and exposes a `/_health` endpoint that you can use for health checks.

💡 The NGINX Ingress Controller limits the client body size to 1 MiB by default.
Some clients use bigger chunks to upload objects - for example, multipart upload to lakeFS using the [S3-compatible Gateway][s3-gateway] or
a simple PUT request using the [OpenAPI Server][openapi].
See the NGINX [documentation](https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/annotations/#custom-max-body-size) for increasing the limit, or see an example of an NGINX configuration with [MinIO](https://docs.min.io/docs/setup-nginx-proxy-with-minio.html).
{: .note }
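
For example, below is a minimal sketch of an Ingress for lakeFS behind the NGINX Ingress Controller. The host name, Service name, and port are assumptions (the Helm chart derives the Service name from the release name, so check `kubectl get svc` for the actual values):

```yaml
# a sketch only; host, service name, and port are assumptions
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: lakefs
  annotations:
    # lift the default 1 MiB body-size limit ("0" disables it entirely)
    nginx.ingress.kubernetes.io/proxy-body-size: "0"
spec:
  ingressClassName: nginx
  rules:
    - host: lakefs.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-lakefs # assumed to follow the Helm release name
                port:
                  number: 80    # assumed Service port; lakeFS itself listens on 8000
```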

</div>
</div>

## Secure connection

Terminating TLS/SSL at a load balancer or cluster manager is recommended: it offloads the decryption work and reduces the processing burden on lakeFS.

If lakeFS needs to listen and serve over HTTPS itself, for example for development purposes, add the following section to its configuration YAML:

```yaml
tls:
  enabled: true
  cert_file: server.crt # provide the path to your certificate file
  key_file: server.key  # provide the path to your server's private key
```


## Local Blockstore

You can configure the block adapter to use a POSIX-compatible storage location shared by all lakeFS instances. Both data and metadata will be stored in this shared location.

By allowing lakeFS access to a specific prefix, you can also use the local blockstore to import files from a shared location.
Import is disabled by default: lakeFS does not assume the local path is shared, and accessing a path outside the one specified in the blockstore configuration is a security concern.
Enable it with `blockstore.local.import_enabled` and `blockstore.local.allowed_external_prefixes`, as described in the [configuration reference]({% link reference/configuration.md %}).

### Sample configuration using local blockstore

```yaml
database:
  type: "postgres"
  postgres:
    connection_string: "[DATABASE_CONNECTION_STRING]"

auth:
  encrypt:
    # replace this with a randomly-generated string. Make sure to keep it safe!
    secret_key: "[ENCRYPTION_SECRET_KEY]"

blockstore:
  type: local
  local:
    path: /shared/location/lakefs_data # location where lakeFS keeps data and metadata
    import_enabled: true # must be true to allow importing files
                         # from `allowed_external_prefixes` locations
    allowed_external_prefixes:
      - /shared/location/files_to_import # location with files to import into lakeFS; lakeFS requires access to it
```

### Limitations

- Using a local adapter on a shared location is relatively new and not battle-tested yet.
- lakeFS doesn't control the way a shared location is managed across machines.
- When using lakectl or the lakeFS UI, you can currently import only directories. If you need to import a single file, use the [HTTP API](https://docs.lakefs.io/reference/api.html#/import/importStart) or API clients with `type=object` in the request body and `destination=<full-path-to-file>`; see the sketch after this list.
- Garbage collection (for both committed and uncommitted data) and the lakeFS Hadoop FileSystem are currently unsupported.
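
For illustration, a single-file import might look like the following sketch. The host, repository, branch, file names, and the source path URI are placeholders (the URI scheme for local blockstore sources is an assumption here), and the request body should be verified against the [importStart API](https://docs.lakefs.io/reference/api.html#/import/importStart) reference:

```sh
# a sketch only - names and paths are placeholders; verify the body against the API reference
curl -u "$LAKEFS_ACCESS_KEY_ID:$LAKEFS_SECRET_ACCESS_KEY" \
  -X POST "http://<lakefs_endpoint>/api/v1/repositories/example-repo/branches/main/import" \
  -H "Content-Type: application/json" \
  -d '{
        "paths": [
          {
            "type": "object",
            "path": "local:///shared/location/files_to_import/file.parquet",
            "destination": "datasets/file.parquet"
          }
        ],
        "commit": {"message": "import a single file"}
      }'
```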

{% include_relative includes/setup.md %}

[config-envariables]: {% link reference/configuration.md %}#using-environment-variables
[downloads]: {% link index.md %}#downloads
[openapi]: {% link understand/architecture.md %}#openapi-server
[s3-gateway]: {% link understand/architecture.md %}#s3-gateway
[understand-repository]: {% link understand/model.md %}#repository
[integration-hadoopfs]: {% link integrations/spark.md %}#lakefs-hadoop-filesystem
[understand-commits]: {% link understand/how/versioning-internals.md %}#constructing-a-consistent-view-of-the-keyspace-ie-a-commit