---
title: On-Premises
grand_parent: How-To
parent: Install lakeFS
description: How to deploy and set up a production-suitable lakeFS environment on-premises (or on other cloud providers)
redirect_from:
   - /deploy/k8s.html
   - /deploy/docker.html
   - /integrations/minio.html
   - /using/minio.html
   - /deploy/onprem.html
   - /deploying/install.html
next:  ["Import data into your installation", "/howto/import.html"]
---

# On-Premises Deployment

{: .tip }
> The instructions given here are for a self-managed deployment of lakeFS.
>
> For a hosted lakeFS service with guaranteed SLAs, try [lakeFS Cloud](https://lakefs.cloud).

{% include toc.html %}

⏰ Expected deployment time: 25 min
{: .note }

## Prerequisites

To run lakeFS on-premises, you need either the [local blockstore](#local-blockstore) adapter or access to an S3-compatible object store such as [MinIO](https://min.io).

For more information on how to set up MinIO, see the [official deployment guide](https://min.io/docs/minio/container/operations/installation.html){: target="_blank" }.
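
If you just want to evaluate lakeFS locally, a single-node MinIO container is enough. Here is a minimal sketch, assuming Docker is available (pick your own credentials):

```sh
docker run -d --name minio \
  -p 9000:9000 -p 9001:9001 \
  -e MINIO_ROOT_USER="<minio_access_key>" \
  -e MINIO_ROOT_PASSWORD="<minio_secret_key>" \
  quay.io/minio/minio server /data --console-address ":9001"
```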

## Setting up a database

lakeFS requires a PostgreSQL database to synchronize actions on your repositories.
This section assumes that you already have a PostgreSQL >= 11.0 database accessible.
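
The `[DATABASE_CONNECTION_STRING]` placeholder used throughout this guide is a standard PostgreSQL connection URI. A hypothetical example (host, database, user, and password are placeholders):

```sh
# All values below are illustrative - substitute your own
DATABASE_CONNECTION_STRING="postgresql://lakefs:strongpassword@pg.example.com:5432/lakefs?sslmode=require"
```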

## Setting up a lakeFS Server

<div class="tabs">
  <ul>
    <li><a href="#linux">Linux</a></li>
    <li><a href="#docker">Docker</a></li>
    <li><a href="#k8s">Kubernetes</a></li>
  </ul>
  <div markdown="1" id="linux">

Connect to your host using SSH, then:

1. Create a `config.yaml` on your VM, with the following parameters:

   ```yaml
   ---
   database:
     type: "postgres"
     postgres:
       connection_string: "[DATABASE_CONNECTION_STRING]"

   auth:
     encrypt:
       # replace this with a randomly-generated string. Make sure to keep it safe!
       secret_key: "[ENCRYPTION_SECRET_KEY]"

   blockstore:
     type: s3
     s3:
       force_path_style: true
       endpoint: http://<minio_endpoint>
       discover_bucket_region: false
       credentials:
         access_key_id: <minio_access_key>
         secret_access_key: <minio_secret_key>
   ```

   ⚠️ Note that the lakeFS blockstore type is set to `s3`. This configuration works with S3-compatible storage engines such as [MinIO](https://min.io/){: target="_blank" }.
   {: .note }

1. [Download the binary][downloads] to the server.

1. Run the `lakefs` binary:

   ```sh
   lakefs --config config.yaml run
   ```

**Note:** It's preferable to run the binary as a service using systemd or your operating system's facilities.
{: .note }
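
For instance, a minimal systemd unit might look like the sketch below. The paths, unit name, and service user are assumptions; adapt them to your installation:

```ini
# /etc/systemd/system/lakefs.service - an illustrative sketch, not an official unit file
[Unit]
Description=lakeFS server
After=network-online.target
Wants=network-online.target

[Service]
# Assumes the binary was downloaded to /usr/local/bin and the config saved to /etc/lakefs
ExecStart=/usr/local/bin/lakefs --config /etc/lakefs/config.yaml run
Restart=on-failure
User=lakefs

[Install]
WantedBy=multi-user.target
```

You can then start the service with `sudo systemctl enable --now lakefs`.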

</div>
<div markdown="1" id="docker">

To support container-based environments, you can configure lakeFS using environment variables. Here is a `docker run`
command to demonstrate starting lakeFS using Docker:

```sh
docker run \
  --name lakefs \
  -p 8000:8000 \
  -e LAKEFS_DATABASE_TYPE="postgres" \
  -e LAKEFS_DATABASE_POSTGRES_CONNECTION_STRING="[DATABASE_CONNECTION_STRING]" \
  -e LAKEFS_AUTH_ENCRYPT_SECRET_KEY="[ENCRYPTION_SECRET_KEY]" \
  -e LAKEFS_BLOCKSTORE_TYPE="s3" \
  -e LAKEFS_BLOCKSTORE_S3_FORCE_PATH_STYLE="true" \
  -e LAKEFS_BLOCKSTORE_S3_ENDPOINT="http://<minio_endpoint>" \
  -e LAKEFS_BLOCKSTORE_S3_DISCOVER_BUCKET_REGION="false" \
  -e LAKEFS_BLOCKSTORE_S3_CREDENTIALS_ACCESS_KEY_ID="<minio_access_key>" \
  -e LAKEFS_BLOCKSTORE_S3_CREDENTIALS_SECRET_ACCESS_KEY="<minio_secret_key>" \
  treeverse/lakefs:latest run
```

⚠️ Note that the lakeFS blockstore type is set to `s3`. This configuration works with S3-compatible storage engines such as [MinIO](https://min.io/){: target="_blank" }.
{: .note }

See the [reference][config-envariables] for a complete list of environment variables.
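
To verify that the container came up correctly, you can query lakeFS's health endpoint (assuming the port mapping above):

```sh
curl -i http://localhost:8000/_health
```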

</div>
<div markdown="1" id="k8s">

You can install lakeFS on Kubernetes using a [Helm chart](https://github.com/treeverse/charts/tree/master/charts/lakefs).

To install lakeFS with Helm:

1. Copy the Helm values file relevant for S3-compatible storage (MinIO in this example):

   ```yaml
   secrets:
       # replace this with the connection string of the database you created in a previous step:
       databaseConnectionString: [DATABASE_CONNECTION_STRING]
       # replace this with a randomly-generated string
       authEncryptSecretKey: [ENCRYPTION_SECRET_KEY]
   lakefsConfig: |
       blockstore:
         type: s3
         s3:
           force_path_style: true
           endpoint: http://<minio_endpoint>
           discover_bucket_region: false
           credentials:
             access_key_id: <minio_access_key>
             secret_access_key: <minio_secret_key>
   ```

   ⚠️ Note that the lakeFS blockstore type is set to `s3`. This configuration works with S3-compatible storage engines such as [MinIO](https://min.io/){: target="_blank" }.
   {: .note }
1. Fill in the missing values and save the file as `conf-values.yaml`. For more configuration options, see our Helm chart [README](https://github.com/treeverse/charts/blob/master/charts/lakefs/README.md#custom-configuration){: target="_blank" }.

   The `lakefsConfig` parameter is the lakeFS configuration documented [here](https://docs.lakefs.io/reference/configuration.html), but without sensitive information.
   Sensitive information like `databaseConnectionString` is given through separate parameters, and the chart will inject it into Kubernetes secrets.
   {: .note }

1. In the directory where you created `conf-values.yaml`, run the following commands:

   ```bash
   # Add the lakeFS repository
   helm repo add lakefs https://charts.lakefs.io
   # Deploy lakeFS
   helm install my-lakefs lakefs/lakefs -f conf-values.yaml
   ```

   *my-lakefs* is the [Helm Release](https://helm.sh/docs/intro/using_helm/#three-big-concepts) name.
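
Once the release is installed, one way to verify it is to port-forward the service and hit the health endpoint. The service name below assumes the `my-lakefs` release name; adjust it if yours differs:

```sh
kubectl port-forward svc/my-lakefs 8000:8000 &
curl -i http://localhost:8000/_health
```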

## Load balancing

To configure a load balancer to direct requests to the lakeFS servers, you can use the `LoadBalancer` Service type or a Kubernetes Ingress.
By default, lakeFS operates on port 8000 and exposes a `/_health` endpoint that you can use for health checks.

💡 By default, the NGINX Ingress Controller limits the client body size to 1 MiB.
Some clients use bigger chunks to upload objects - for example, multipart upload to lakeFS using the [S3-compatible Gateway][s3-gateway] or
a simple PUT request using the [OpenAPI Server][openapi].
Check out the NGINX [documentation](https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/annotations/#custom-max-body-size) for increasing the limit, or an example of an NGINX configuration with [MinIO](https://docs.min.io/docs/setup-nginx-proxy-with-minio.html).
{: .note }
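
For illustration, an Ingress for lakeFS that lifts the body-size limit might look like the sketch below. The host and ingress class are placeholders, and the annotation is specific to the NGINX Ingress Controller:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: lakefs
  annotations:
    # "0" disables the client body size check entirely
    nginx.ingress.kubernetes.io/proxy-body-size: "0"
spec:
  ingressClassName: nginx
  rules:
    - host: lakefs.example.com      # placeholder hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-lakefs     # the Helm release/service name from the previous step
                port:
                  number: 8000
```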

</div>
</div>

## Secure connection

It is recommended to use a load balancer or cluster manager for TLS/SSL termination: this offloads the decryption work and reduces the processing burden on lakeFS.

If lakeFS itself needs to listen and serve over HTTPS, for example for development purposes, update its configuration YAML with the following section:

```yaml
tls:
  enabled: true
  cert_file: server.crt   # provide path to your certificate file
  key_file: server.key    # provide path to your server private key
```
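
For development, a self-signed certificate generated with OpenSSL will do (the CN is a placeholder, and clients will not trust this certificate unless configured to):

```sh
openssl req -x509 -newkey rsa:4096 -sha256 -days 365 -nodes \
  -keyout server.key -out server.crt -subj "/CN=lakefs.local"
```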

## Local Blockstore

You can configure the local block adapter to use a POSIX-compatible storage location shared by all lakeFS instances.
Both data and metadata will be stored in this shared location.

With the local blockstore, it is also possible to import files from a shared location by allowing lakeFS access to a specific prefix.
Import is not enabled by default: lakeFS doesn't assume the local path is shared, and accessing a path outside the one specified in the blockstore configuration is a security concern.
Enable it with `blockstore.local.import_enabled` and `blockstore.local.allowed_external_prefixes`, as described in the [configuration reference]({% link reference/configuration.md %}).

### Sample configuration using local blockstore

```yaml
database:
  type: "postgres"
  postgres:
    connection_string: "[DATABASE_CONNECTION_STRING]"

auth:
  encrypt:
    # replace this with a randomly-generated string. Make sure to keep it safe!
    secret_key: "[ENCRYPTION_SECRET_KEY]"

blockstore:
  type: local
  local:
    path: /shared/location/lakefs_data    # location where lakeFS keeps data and metadata
    import_enabled: true                  # must be true to allow importing files
                                          # from `allowed_external_prefixes` locations
    allowed_external_prefixes:
      - /shared/location/files_to_import  # location of files to import into lakeFS; lakeFS needs read access
```
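
Assuming the shared volume is already mounted at `/shared/location` on every instance, create the directories referenced above before starting the server:

```sh
mkdir -p /shared/location/lakefs_data /shared/location/files_to_import
lakefs --config config.yaml run
```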

### Limitations

- Using a local adapter on a shared location is relatively new and not yet battle-tested
- lakeFS doesn't control the way a shared location is managed across machines
- When using lakectl or the lakeFS UI, you can currently import only directories. If you need to import a single file, use the [HTTP API](https://docs.lakefs.io/reference/api.html#/import/importStart) or the API clients with `type=object` in the request body and `destination=<full-path-to-file>`, as in the sketch after this list
- Garbage collection (of committed and uncommitted data) and the lakeFS Hadoop FileSystem are currently unsupported
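
For illustration only, a single-file import request might look like the sketch below. The repository, branch, source path, and URI scheme are assumptions; treat the linked API reference as the authoritative source for the request schema:

```sh
# Hypothetical names throughout - my-repo, main, and the local:// source path are placeholders
curl -u "$LAKEFS_ACCESS_KEY_ID:$LAKEFS_SECRET_ACCESS_KEY" \
  -X POST "http://<lakefs_endpoint>/api/v1/repositories/my-repo/branches/main/import" \
  -H "Content-Type: application/json" \
  -d '{
    "paths": [
      {
        "path": "local:///shared/location/files_to_import/report.csv",
        "destination": "datasets/report.csv",
        "type": "object"
      }
    ],
    "commit": {"message": "Import a single file"}
  }'
```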

{% include_relative includes/setup.md %}

[config-envariables]:  {% link reference/configuration.md %}#using-environment-variables
[downloads]:  {% link index.md %}#downloads
[openapi]:  {% link understand/architecture.md %}#openapi-server
[s3-gateway]:  {% link understand/architecture.md %}#s3-gateway
[understand-repository]:  {% link understand/model.md %}#repository
[integration-hadoopfs]:  {% link integrations/spark.md %}#lakefs-hadoop-filesystem
[understand-commits]:  {% link understand/how/versioning-internals.md %}#constructing-a-consistent-view-of-the-keyspace-ie-a-commit