# Kythe Extracting on GCP

This package contains nothing of note, but will eventually support extracting
Kythe Compilation Units on Google Cloud Platform.

## Cloud Build

Documentation for Cloud Build itself is available at
https://cloud.google.com/cloud-build/.

For the rest of this test documentation, we'll assume you've completed the
setup instructions there. Additionally, you should make an environment variable
for your GCS bucket:

```
export BUCKET_NAME="your-bucket-name"
```

## Hello World Test

To make sure you have done setup correctly, we have an example binary at
`kythe/extractors/gcp/examples/helloworld`, which you can run as follows:

```
gcloud builds submit --config examples/helloworld/helloworld.yaml \
  --substitutions=_BUCKET_NAME="$BUCKET_NAME" \
  examples/helloworld
```

If that fails, go back up to the [Cloud Build](#cloud-build) section and follow
the installation steps. Of note, you will have to install `gcloud`, authorize
it, associate it with a valid project ID, and create a test GCS bucket.

## Maven Proof of Concept

To extract a Maven repository using Kythe on Cloud Build, use
`examples/mvn.yaml`. This assumes that you will specify a Maven repository
in `_REPO`, and that the repository has a top-level `pom.xml` file (right
now it is a hard-coded location, but in the future it will be configurable).
This also assumes you specify `$BUCKET_NAME` as per the Hello World Test above.
`_CORPUS` can be any identifying string for your repo, for example: "guava".

```
gcloud builds submit --config examples/mvn.yaml \
  --substitutions=\
_BUCKET_NAME=$BUCKET_NAME,\
_REPO=https://github.com/project-name/repo-name,\
_COMMIT=<version-hash>,\
_CORPUS=repo-name\
  --no-source
```

### Guava specific example

To extract multiple parts of https://github.com/google/guava, use
`examples/guava-mvn.yaml`.

```
gcloud builds submit --config examples/guava-mvn.yaml \
  --substitutions=\
_BUCKET_NAME=$BUCKET_NAME,\
_COMMIT=<commit-hash>\
  --no-source
```

This outputs `guava-<commit-hash>.kzip` to `$BUCKET_NAME` on Google Cloud Storage.

This is a reasonable example of a Maven project that has already specified
the requisite `maven-compiler-plugin` bits in its `pom.xml` files to support
Kythe extraction, and that also has multiple modules.

Note, however, that not all directories from guava extract with the top-level
action. For example, if you want to extract the android copy of guava that
lives inside of the guava tree, you would need a slightly different action:

```
gcloud builds submit --config examples/guava-android-mvn.yaml \
  --substitutions=\
_BUCKET_NAME=$BUCKET_NAME,\
_COMMIT=<commit-hash>\
  --no-source
```

This outputs `guava-android-<commit-hash>.kzip` to `$BUCKET_NAME` on GCS.
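
The `--substitutions` values in the commands above are split across
continuation lines, where a stray trailing comma or leading space changes the
argument that `gcloud` receives. A minimal sketch of one way to avoid that,
assuming a hypothetical helper named `join_substitutions` (not part of Kythe or
`gcloud`) and placeholder bucket and commit values, builds the comma-separated
value first and prints the resulting command for inspection:

```shell
# Hypothetical helper (not part of Kythe or gcloud): joins its KEY=VALUE
# arguments with commas. The body runs in a subshell so the IFS change
# does not leak into the calling shell.
join_substitutions() (
  IFS=,
  printf '%s\n' "$*"
)

# Placeholder values; substitute your own bucket and commit.
BUCKET_NAME="${BUCKET_NAME:-your-bucket-name}"
COMMIT="${COMMIT:-0123abc}"

SUBS="$(join_substitutions "_BUCKET_NAME=${BUCKET_NAME}" "_COMMIT=${COMMIT}")"

# Print the command instead of running it, so it can be checked first.
echo gcloud builds submit --config examples/guava-mvn.yaml \
  "--substitutions=${SUBS}" --no-source
```

Dropping the `echo` runs the build. The same helper works for the `mvn.yaml`
and `gradle.yaml` configs by adding `_REPO` and `_CORPUS` pairs.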

## Gradle Proof of Concept

Gradle is extracted similarly:

```
gcloud builds submit --config examples/gradle.yaml \
  --substitutions=\
_BUCKET_NAME=$BUCKET_NAME,\
_REPO=https://github.com/project-name/repo-name,\
_COMMIT=<version-hash>,\
_CORPUS=repo-name\
  --no-source
```

## Bazel Extraction

`kythe/extractors/gcp/bazel/bazel.yaml` takes the following substitutions:

* `_COMMIT`: git repository commit to check out, build, and extract
* `_REPO`: source git repository URL
* `_BUCKET_NAME`: GCS bucket name to store extracted compilations
* `_CORPUS`: Kythe corpus label

```shell
# Extract github.com/angular/angular at commit 8accc98
gcloud builds submit --no-source --config kythe/extractors/gcp/bazel/bazel.yaml \
  --substitutions=_REPO=https://github.com/angular/angular.git,\
_COMMIT=8accc98d28249628e84136d7306fdbbe1f4caaef,\
_BUCKET_NAME=$BUCKET_NAME,\
_CORPUS=github.com/angular/angular

# Extract github.com/bazelbuild/bazel at commit 22d375b
gcloud builds submit --no-source --config kythe/extractors/gcp/bazel/bazel.yaml \
  --substitutions=_REPO=https://github.com/bazelbuild/bazel.git,\
_COMMIT=22d375bd532b04bb83f18a7770e5080e23a1d517,\
_BUCKET_NAME=$BUCKET_NAME,\
_CORPUS=github.com/bazelbuild/bazel

# Extract github.com/protocolbuffers/protobuf at commit e728325
gcloud builds submit --no-source --config kythe/extractors/gcp/bazel/bazel.yaml \
  --substitutions=_REPO=https://github.com/protocolbuffers/protobuf.git,\
_COMMIT=e7283254d6eb01ddfdb63cc3c89cd312e2d354d5,\
_BUCKET_NAME=$BUCKET_NAME,\
_CORPUS=github.com/protocolbuffers/protobuf
```

## Cloud Build REST API

Cloud Build has a REST API described at
https://cloud.google.com/cloud-build/docs/api/reference/rest/. For Kythe
extraction, we have a test binary that lets you isolate authentication problems
before dealing with real builds.

You will need access to your project's service credentials:

https://cloud.google.com/docs/authentication/production#obtaining_and_providing_service_account_credentials_manually

If your team already has credentials made for this purpose, see if you can
re-use them.

If not, you can use these steps to create new credentials:

1. In your GCP console, click on the top left hamburger icon.
2. Click on APIs & Services.
3. In the dropdown, click on Credentials.
4. From here you can mostly follow the instructions from the [above link](https://cloud.google.com/docs/authentication/production#obtaining_and_providing_service_account_credentials_manually),
   with the differences noted in the next two steps.
5. When making a service account key, you can select the Cloud Build roles
   instead of "project owner" to limit access to resources more tightly.
6. You will still download the JSON file and set the environment variable
   `GOOGLE_APPLICATION_CREDENTIALS` as described in the above link.

To test, run

```
bazel build kythe/extractors/gcp/examples/restcheck:rest_auth_check
./bazel-bin/kythe/extractors/gcp/examples/restcheck/rest_auth_check -project_id=some-project
```

If that returns with a 403 error, you likely did the authentication steps above
incorrectly.

## Associated extractor images

The Kythe team maintains a few images useful for extracting Kythe data on
Google Cloud Build. Many of these are used in example scripts and other
generated GCB executions in Kythe.

### gcr.io/kythe-public/kythe-javac-extractor-artifacts

Created from
[kythe/java/com/google/devtools/kythe/extractors/java/artifacts](https://github.com/kythe/kythe/blob/master/kythe/java/com/google/devtools/kythe/extractors/java/artifacts),
this image contains:

* `javac-wrapper.sh`, a script which calls Kythe extraction and then an actual
  Java compiler
* `javac_extractor.jar`, which is the Kythe Java extractor
* `javac9_tools.jar`, which contains javac langtools for JDK 9, but targets
  JRE 8

### gcr.io/kythe-public/bazel-extractor

Created from
[kythe/extractors/bazel](https://github.com/kythe/kythe/blob/master/kythe/extractors/bazel),
this image contains all of the pieces of Kythe necessary to extract supported
languages: Bazel itself, all of the Kythe extractors, and the `.bazelrc`.
Additionally, it contains necessary tools (including a copy of `kzip-tools`
described below), and some required scripts.

When running this docker image, you must set the environment variable
`$KYTHE_OUTPUT_DIRECTORY`.

### gcr.io/kythe-public/build-preprocessor

This is a simple wrapper around
[kythe/go/extractors/config/preprocessor](https://github.com/kythe/kythe/blob/master/kythe/go/extractors/config/preprocessor/preprocessor.go),
which we use to preprocess build configurations so that they can specify all of
the above custom javac extraction logic. It supports Maven `pom.xml` files and
Gradle `build.gradle` files. Ironically, Bazel extraction doesn't need its
`BUILD` files modified, because you can pass extractors directly as
`extra_action`, so `build-preprocessor` doesn't support `BUILD` files.

### gcr.io/kythe-public/kzip-tools

This image exposes the binary
[kythe/go/platform/tools/kzip](https://github.com/kythe/kythe/blob/master/kythe/go/platform/tools/kzip/kzip.go),
which currently supports merging multiple kzips together and creating trivial
kzips from the command line.

## Troubleshooting

### Generic failure to use gcloud

Make sure you've followed the setup steps above in [Cloud Build](#cloud-build),
especially `gcloud auth login`.

### Step #N: fatal: could not read Username for 'https://github.com': No such device or address

This, confusingly, could be two completely separate errors. First, and simpler
to check, you could have just spelled the repo incorrectly. If you have a
typo in the repo name, instead of telling you "repo doesn't exist" or something,
the failure message is the above error about "could not read Username".

If you have verified that the repo name is spelled correctly, then you may be
trying to access a private git repo. It is possible to clone a private git
repo, but you need to follow some extra steps that use Cloud KMS, described in
this
[Cloud Build Help Doc](https://cloud.google.com/cloud-build/docs/access-private-github-repos).
You will need to add extra steps to your `.yaml` file that decrypt a provided
key and use it to authenticate with git. Finally, your existing git clone step
will need to be modified to use the same root volume as your two new steps.
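
Either cause can be checked locally before submitting a build. A minimal
sketch, assuming a hypothetical helper named `repo_readable_anonymously` (not
part of Kythe): it asks git to list the remote's refs with terminal prompts and
credential helpers disabled, which roughly approximates an unauthenticated
clone. A failure means the URL is either misspelled or requires credentials;
the check does not tell the two apart.

```shell
# Hypothetical helper (not part of Kythe): returns 0 only if the repository
# URL can be read without credentials.
repo_readable_anonymously() {
  # GIT_TERMINAL_PROMPT=0 stops git from prompting for a username;
  # an empty credential.helper ignores any locally stored credentials.
  GIT_TERMINAL_PROMPT=0 git -c credential.helper= ls-remote "$1" >/dev/null 2>&1
}

# Usage (placeholder URL, as in the examples above):
#   repo_readable_anonymously https://github.com/project-name/repo-name \
#     || echo "typo in _REPO, or the repo is private"
```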