---
title: GCP
grand_parent: How-To
parent: Install lakeFS
description: How to deploy and set up a production-suitable lakeFS environment on Google Cloud Platform (GCP).
redirect_from:
   - /setup/storage/gcs.html
   - /deploy/gcp.html
next:  ["Import data into your installation", "/howto/import.html"]
---

# Deploy lakeFS on GCP

{: .tip }
> The instructions given here are for a self-managed deployment of lakeFS on GCP.
>
> For a hosted lakeFS service with guaranteed SLAs, please [contact us](mailto:support@treeverse.io) for details of lakeFS Cloud on GCP.

When you deploy lakeFS on GCP, these are the options available to you:

![lakeFS deployment options on GCP](/assets/img/deploy/deploy-on-gcp.excalidraw.png)

{% include toc.html %}

⏰ Expected deployment time: 25 min
{: .note }

## Create a Database

lakeFS requires a PostgreSQL database to synchronize actions on your repositories.
We will show you how to create a database on Google Cloud SQL, but you can use any PostgreSQL database as long as it's accessible by your lakeFS installation.

If you already have a database, take note of the connection string and skip to the [next step](#run-the-lakefs-server).

1. Follow the official [Google documentation](https://cloud.google.com/sql/docs/postgres/quickstart#create-instance) on how to create a PostgreSQL instance.
   Make sure you're using PostgreSQL version >= 11.
1. On the *Users* tab in the console, create a user. The lakeFS installation will use it to connect to your database.
1. Choose the method by which lakeFS [will connect to your database](https://cloud.google.com/sql/docs/postgres/connect-overview). Google recommends using
   the [SQL Auth Proxy](https://cloud.google.com/sql/docs/postgres/sql-proxy).

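For example, if you run the SQL Auth Proxy alongside lakeFS, the database is reachable on localhost, and the connection string takes the shape sketched below. The user, password, and database name are hypothetical placeholders; substitute the ones you created above.

```shell
# Hypothetical values - substitute the user, password, and database you created above.
DB_USER="lakefs"
DB_PASSWORD="myPassword"
DB_HOST="127.0.0.1"   # the SQL Auth Proxy listens on localhost by default
DB_PORT="5432"
DB_NAME="postgres"

# This is the value to use wherever [DATABASE_CONNECTION_STRING] appears below.
echo "postgres://${DB_USER}:${DB_PASSWORD}@${DB_HOST}:${DB_PORT}/${DB_NAME}"
```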
## Run the lakeFS Server

<div class="tabs">
  <ul>
    <li><a href="#gce">GCE Instance</a></li>
    <li><a href="#docker">Docker</a></li>
    <li><a href="#gke">GKE</a></li>
  </ul>
  <div markdown="1" id="gce">

1. Save the following configuration file as `config.yaml`:

   ```yaml
   ---
   database:
     type: "postgres"
     postgres:
       connection_string: "[DATABASE_CONNECTION_STRING]"
   auth:
     encrypt:
       # replace this with a randomly-generated string:
       secret_key: "[ENCRYPTION_SECRET_KEY]"
   blockstore:
     type: gs
     # Uncomment the following lines to give lakeFS access to your buckets using a service account:
     # gs:
     #   credentials_json: [YOUR SERVICE ACCOUNT JSON STRING]
   ```

1. [Download the binary][downloads] to run on the GCE instance.
1. Run the `lakefs` binary on the GCE machine:

   ```bash
   lakefs --config config.yaml run
   ```

   **Note:** it is preferable to run the binary as a service using systemd or your operating system's facilities.
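If you use systemd, a minimal unit for this purpose might look like the sketch below. The binary and config paths and the service user are assumptions; adjust them to your instance.

```ini
# /etc/systemd/system/lakefs.service (hypothetical paths - adjust to your setup)
[Unit]
Description=lakeFS server
After=network-online.target

[Service]
ExecStart=/usr/local/bin/lakefs --config /etc/lakefs/config.yaml run
Restart=on-failure
User=lakefs

[Install]
WantedBy=multi-user.target
```

After saving the unit, reload systemd and start the service with `sudo systemctl daemon-reload && sudo systemctl enable --now lakefs`.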

</div>
<div markdown="1" id="docker">

To support container-based environments like Google Cloud Run, lakeFS can be configured using environment variables. Here is a `docker run`
command to demonstrate starting lakeFS using Docker:

```sh
docker run \
  --name lakefs \
  -p 8000:8000 \
  -e LAKEFS_DATABASE_TYPE="postgres" \
  -e LAKEFS_DATABASE_POSTGRES_CONNECTION_STRING="[DATABASE_CONNECTION_STRING]" \
  -e LAKEFS_AUTH_ENCRYPT_SECRET_KEY="[ENCRYPTION_SECRET_KEY]" \
  -e LAKEFS_BLOCKSTORE_TYPE="gs" \
  treeverse/lakefs:latest run
```

See the [reference][config-envariables] for a complete list of environment variables.
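The `[ENCRYPTION_SECRET_KEY]` placeholder should be a randomly generated string. One common way to produce one, assuming `openssl` is available on your machine, is:

```shell
# Generate 32 random bytes, hex-encoded (64 characters), to use as the auth encryption secret key
openssl rand -hex 32
```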

</div>
<div markdown="1" id="gke">

You can install lakeFS on Kubernetes using a [Helm chart](https://github.com/treeverse/charts/tree/master/charts/lakefs).

To install lakeFS with Helm:

1. Copy the Helm values file relevant for Google Storage:

   ```yaml
   secrets:
       # replace DATABASE_CONNECTION_STRING with the connection string of the database you created in a previous step.
       # e.g.: postgres://postgres:myPassword@localhost:5432/postgres
       databaseConnectionString: [DATABASE_CONNECTION_STRING]
       # replace this with a randomly-generated string
       authEncryptSecretKey: [ENCRYPTION_SECRET_KEY]
   lakefsConfig: |
       blockstore:
         type: gs
         # Uncomment the following lines to give lakeFS access to your buckets using a service account:
         # gs:
         #   credentials_json: [YOUR SERVICE ACCOUNT JSON STRING]
   ```

1. Fill in the missing values and save the file as `conf-values.yaml`. For more configuration options, see our Helm chart [README](https://github.com/treeverse/charts/blob/master/charts/lakefs/README.md#custom-configuration){:target="_blank"}.

   The `lakefsConfig` parameter is the lakeFS configuration documented [here](https://docs.lakefs.io/reference/configuration.html) but without sensitive information.
   Sensitive information like `databaseConnectionString` is given through separate parameters, and the chart will inject it into Kubernetes secrets.
   {: .note }

1. In the directory where you created `conf-values.yaml`, run the following commands:

   ```bash
   # Add the lakeFS repository
   helm repo add lakefs https://charts.lakefs.io
   # Deploy lakeFS
   helm install my-lakefs lakefs/lakefs -f conf-values.yaml
   ```

   *my-lakefs* is the [Helm Release](https://helm.sh/docs/intro/using_helm/#three-big-concepts) name.

## Load balancing

To configure a load balancer to direct requests to the lakeFS servers, you can use the `LoadBalancer` Service type or a Kubernetes Ingress.
By default, lakeFS operates on port 8000 and exposes a `/_health` endpoint that you can use for health checks.

💡 The NGINX Ingress Controller by default limits the client body size to 1 MiB.
Some clients use bigger chunks to upload objects - for example, multipart upload to lakeFS using the [S3-compatible Gateway][s3-gateway] or
a simple PUT request using the [OpenAPI Server][openapi].
See the NGINX [documentation](https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/annotations/#custom-max-body-size) for increasing the limit, or an example of an NGINX configuration with [MinIO](https://docs.min.io/docs/setup-nginx-proxy-with-minio.html).
{: .note }
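As a sketch, an Ingress for a Helm release named `my-lakefs` that raises the NGINX body-size limit might look like the following. The host, ingress class, and service port are assumptions and depend on your environment and chart values.

```yaml
# Hypothetical Ingress - host, class, and port depend on your environment and chart values.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: lakefs
  annotations:
    # raise the default 1 MiB client body-size limit ("0" disables it entirely)
    nginx.ingress.kubernetes.io/proxy-body-size: "0"
spec:
  ingressClassName: nginx
  rules:
    - host: lakefs.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-lakefs
                port:
                  number: 80
```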

</div>
</div>

{% include_relative includes/setup.md %}

[config-envariables]:  {% link reference/configuration.md %}#using-environment-variables
[downloads]:  {% link index.md %}#downloads
[openapi]:  {% link understand/architecture.md %}#openapi-server
[s3-gateway]:  {% link understand/architecture.md %}#s3-gateway
[understand-repository]:  {% link understand/model.md %}#repository
[integration-hadoopfs]:  {% link integrations/spark.md %}#lakefs-hadoop-filesystem
[understand-commits]:  {% link understand/how/versioning-internals.md %}#constructing-a-consistent-view-of-the-keyspace-ie-a-commit