---
title: GCP
grand_parent: How-To
parent: Install lakeFS
description: How to deploy and set up a production-suitable lakeFS environment on Google Cloud Platform (GCP).
redirect_from:
  - /setup/storage/gcs.html
  - /deploy/gcp.html
next: ["Import data into your installation", "/howto/import.html"]
---

# Deploy lakeFS on GCP

{: .tip }
> The instructions given here are for a self-managed deployment of lakeFS on GCP.
>
> For a hosted lakeFS service with guaranteed SLAs, please [contact us](mailto:support@treeverse.io) for details of lakeFS Cloud on GCP.

When you deploy lakeFS on GCP these are the options available to use:

{% include toc.html %}

⏰ Expected deployment time: 25 min
{: .note }

## Create a Database

lakeFS requires a PostgreSQL database to synchronize actions on your repositories.
We will show you how to create a database on Google Cloud SQL, but you can use any PostgreSQL database as long as it's accessible by your lakeFS installation.

If you already have a database, take note of the connection string and skip to the [next step](#run-the-lakefs-server).

1. Follow the official [Google documentation](https://cloud.google.com/sql/docs/postgres/quickstart#create-instance) on how to create a PostgreSQL instance.
   Make sure you're using PostgreSQL version >= 11.
1. On the *Users* tab in the console, create a user. The lakeFS installation will use it to connect to your database.
1. Choose the method by which lakeFS [will connect to your database](https://cloud.google.com/sql/docs/postgres/connect-overview). Google recommends using
   the [Cloud SQL Auth Proxy](https://cloud.google.com/sql/docs/postgres/sql-proxy).

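
The connection string that the later steps refer to as `[DATABASE_CONNECTION_STRING]` is a PostgreSQL URL. As a minimal sketch, it could be assembled in the shell as shown here. The function name `pg_conn_string` and every value passed to it are hypothetical placeholders, and the host assumes a Cloud SQL Auth Proxy listening on `127.0.0.1`, which is an assumption about your setup rather than a requirement.

```bash
# Hypothetical helper (not part of lakeFS): prints a PostgreSQL connection URL.
# Usage: pg_conn_string USER PASSWORD HOST PORT DATABASE
pg_conn_string() {
  local user="$1" password="$2" host="$3" port="$4" database="$5"
  # The port follows the host; the database name is the URL path.
  printf 'postgres://%s:%s@%s:%s/%s\n' "$user" "$password" "$host" "$port" "$database"
}

# Example with placeholder values (assumes a proxy on 127.0.0.1:5432).
pg_conn_string lakefs myPassword 127.0.0.1 5432 postgres
```

This sketch does not percent-encode its arguments, so a password containing characters such as `@` or `/` would need to be encoded before being passed in.
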

## Run the lakeFS Server

<div class="tabs">
<ul>
  <li><a href="#gce">GCE Instance</a></li>
  <li><a href="#docker">Docker</a></li>
  <li><a href="#gke">GKE</a></li>
</ul>
<div markdown="1" id="gce">

1. Save the following configuration file as `config.yaml`:

   ```yaml
   ---
   database:
     type: "postgres"
     postgres:
       connection_string: "[DATABASE_CONNECTION_STRING]"
   auth:
     encrypt:
       # replace this with a randomly-generated string:
       secret_key: "[ENCRYPTION_SECRET_KEY]"
   blockstore:
     type: gs
     # Uncomment the following lines to give lakeFS access to your buckets using a service account:
     # gs:
     #   credentials_json: [YOUR SERVICE ACCOUNT JSON STRING]
   ```

1. [Download the binary][downloads] to run on the GCE instance.
1. Run the `lakefs` binary on the GCE machine:
   ```bash
   lakefs --config config.yaml run
   ```
   **Note:** it is preferable to run the binary as a service using systemd or your operating system's facilities.

</div>
<div markdown="2" id="docker">

To support container-based environments like Google Cloud Run, lakeFS can be configured using environment variables. Here is a `docker run`
command to demonstrate starting lakeFS using Docker:

```sh
docker run \
  --name lakefs \
  -p 8000:8000 \
  -e LAKEFS_DATABASE_TYPE="postgres" \
  -e LAKEFS_DATABASE_POSTGRES_CONNECTION_STRING="[DATABASE_CONNECTION_STRING]" \
  -e LAKEFS_AUTH_ENCRYPT_SECRET_KEY="[ENCRYPTION_SECRET_KEY]" \
  -e LAKEFS_BLOCKSTORE_TYPE="gs" \
  treeverse/lakefs:latest run
```

See the [reference][config-envariables] for a complete list of environment variables.

</div>
<div markdown="3" id="gke">

You can install lakeFS on Kubernetes using a [Helm chart](https://github.com/treeverse/charts/tree/master/charts/lakefs).

To install lakeFS with Helm:

1. Copy the Helm values file relevant for Google Cloud Storage:

   ```yaml
   secrets:
     # replace DATABASE_CONNECTION_STRING with the connection string of the database you created in a previous step.
     # e.g.: postgres://postgres:myPassword@localhost:5432/postgres
     databaseConnectionString: [DATABASE_CONNECTION_STRING]
     # replace this with a randomly-generated string
     authEncryptSecretKey: [ENCRYPTION_SECRET_KEY]
   lakefsConfig: |
     blockstore:
       type: gs
       # Uncomment the following lines to give lakeFS access to your buckets using a service account:
       # gs:
       #   credentials_json: [YOUR SERVICE ACCOUNT JSON STRING]
   ```
1. Fill in the missing values and save the file as `conf-values.yaml`. For more configuration options, see our Helm chart [README](https://github.com/treeverse/charts/blob/master/charts/lakefs/README.md#custom-configuration){:target="_blank"}.

   The `lakefsConfig` parameter is the lakeFS configuration documented [here](https://docs.lakefs.io/reference/configuration.html) but without sensitive information.
   Sensitive information like `databaseConnectionString` is given through separate parameters, and the chart will inject it into Kubernetes secrets.
   {: .note }

1. In the directory where you created `conf-values.yaml`, run the following commands:

   ```bash
   # Add the lakeFS repository
   helm repo add lakefs https://charts.lakefs.io
   # Deploy lakeFS
   helm install my-lakefs lakefs/lakefs -f conf-values.yaml
   ```

   *my-lakefs* is the [Helm Release](https://helm.sh/docs/intro/using_helm/#three-big-concepts) name.


## Load balancing

To configure a load balancer to direct requests to the lakeFS servers, you can use the `LoadBalancer` Service type or a Kubernetes Ingress.
By default, lakeFS operates on port 8000 and exposes a `/_health` endpoint that you can use for health checks.

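
A manual check against that endpoint can be sketched with `curl`. The function names, the `localhost` default, and the retry count and delay are illustrative assumptions; only port 8000 and the `/_health` path come from the text above.

```bash
# Hypothetical helper: prints the health-check URL for a lakeFS server.
# Usage: lakefs_health_url [HOST] [PORT]
lakefs_health_url() {
  local host="${1:-localhost}" port="${2:-8000}"
  printf 'http://%s:%s/_health\n' "$host" "$port"
}

# Hypothetical helper: polls the endpoint until curl succeeds,
# or gives up after the given number of attempts (default 30).
# Usage: wait_for_lakefs [HOST] [PORT] [ATTEMPTS]
wait_for_lakefs() {
  local url attempts i
  url="$(lakefs_health_url "$1" "$2")"
  attempts="${3:-30}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f: exit non-zero on HTTP status 400 or above
    # -s: silent, -o /dev/null: discard the response body
    if curl -fs -o /dev/null "$url"; then
      echo "lakeFS answered the health check at $url"
      return 0
    fi
    i=$((i + 1))
    sleep 2
  done
  echo "lakeFS did not answer the health check at $url" >&2
  return 1
}
```

For example, `wait_for_lakefs localhost 8000 10` would probe `http://localhost:8000/_health` up to ten times, two seconds apart. A load balancer would normally be pointed at the same path through its own health-check settings rather than through a script like this.
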

💡 The NGINX Ingress Controller by default limits the client body size to 1 MiB.
Some clients use bigger chunks to upload objects, for example, a multipart upload to lakeFS using the [S3-compatible Gateway][s3-gateway] or
a simple PUT request using the [OpenAPI Server][openapi].
Check out the NGINX Ingress Controller [documentation](https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/annotations/#custom-max-body-size) for increasing the limit, or an example of an NGINX configuration with [MinIO](https://docs.min.io/docs/setup-nginx-proxy-with-minio.html).
{: .note }

</div>
</div>


{% include_relative includes/setup.md %}

[config-envariables]: {% link reference/configuration.md %}#using-environment-variables
[downloads]: {% link index.md %}#downloads
[openapi]: {% link understand/architecture.md %}#openapi-server
[s3-gateway]: {% link understand/architecture.md %}#s3-gateway
[understand-repository]: {% link understand/model.md %}#repository
[integration-hadoopfs]: {% link integrations/spark.md %}#lakefs-hadoop-filesystem
[understand-commits]: {% link understand/how/versioning-internals.md %}#constructing-a-consistent-view-of-the-keyspace-ie-a-commit