# Kythe Extracting on GCP

This package contains nothing of note, but will eventually support extracting
Kythe Compilation Units on Google Cloud Platform.

## Cloud Build

Documentation for Cloud Build itself is available at
https://cloud.google.com/cloud-build/.

For the rest of this test documentation, we'll assume you've followed the setup
instructions there.  Additionally, you should set an environment variable for
your Google Cloud Storage (GCS) bucket:

```
export BUCKET_NAME="your-bucket-name"
```

## Hello World Test

To make sure you have done setup correctly, we have an example binary at
`kythe/extractors/gcp/examples/helloworld`, which you can run from the
`kythe/extractors/gcp` directory as follows:

```
gcloud builds submit --config examples/helloworld/helloworld.yaml \
  --substitutions=_BUCKET_NAME="$BUCKET_NAME" \
  examples/helloworld
```

If that fails, go back to the [Cloud Build](#cloud-build) section and follow
the installation steps.  In particular, you will have to install `gcloud`,
authorize it, associate it with a valid project ID, and create a test GCS bucket.
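
The prerequisites listed above can be checked before submitting a build. The following is only a sketch: `have_cmd` and `preflight` are hypothetical helper names, not part of Kythe, and the checks assume that `gsutil` is installed alongside `gcloud`.

```shell
# Hypothetical helper: succeed if the named command is on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Hypothetical helper: check each prerequisite and report the first problem.
preflight() {
  have_cmd gcloud || { echo "gcloud is not installed" >&2; return 1; }
  have_cmd gsutil || { echo "gsutil is not installed" >&2; return 1; }

  # An authorized account must be active.
  [ -n "$(gcloud auth list --filter=status:ACTIVE --format='value(account)' 2>/dev/null)" ] \
    || { echo "no active account; run: gcloud auth login" >&2; return 1; }

  # gcloud must be associated with a project.
  project="$(gcloud config get-value project 2>/dev/null)"
  [ -n "$project" ] && [ "$project" != "(unset)" ] \
    || { echo "no project set; run: gcloud config set project <project-id>" >&2; return 1; }

  # The bucket named in BUCKET_NAME must exist and be accessible.
  [ -n "${BUCKET_NAME:-}" ] || { echo "BUCKET_NAME is not set" >&2; return 1; }
  gsutil ls -b "gs://$BUCKET_NAME" >/dev/null 2>&1 \
    || { echo "bucket gs://$BUCKET_NAME not found or not accessible" >&2; return 1; }
}
```

With `BUCKET_NAME` exported, running `preflight && echo ok` either prints `ok` or names the first missing prerequisite.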

## Maven Proof of Concept

To extract a Maven repository using Kythe on Cloud Build, use
`examples/mvn.yaml`.  This assumes that you specify a Maven repository
in `_REPO`, and that the repository has a top-level `pom.xml` file (right
now this is a hard-coded location, but in the future it will be configurable).
It also assumes you have set `$BUCKET_NAME` as per the Hello World Test above.
`_COMMIT` is the git commit to check out, and `_CORPUS` can be any identifying
string for your repo, for example "guava".

```
gcloud builds submit --config examples/mvn.yaml \
  --substitutions=\
_BUCKET_NAME=$BUCKET_NAME,\
_REPO=https://github.com/project-name/repo-name,\
_COMMIT=<version-hash>,\
_CORPUS=repo-name \
  --no-source
```
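
The `--substitutions` value is a single comma-separated string, so a stray trailing comma or a missing separator is easy to introduce when editing the continuation lines by hand. One way to avoid that is to assemble the string programmatically. This is a minimal sketch: `join_substitutions` is a hypothetical helper, not part of Kythe or `gcloud`, and it does not handle values that themselves contain commas.

```shell
# Hypothetical helper: join KEY=VALUE arguments with commas (no trailing comma).
# The body runs in a subshell so the IFS change does not leak to the caller.
join_substitutions() (
  IFS=,
  printf '%s\n' "$*"
)

# Placeholder values, as in the example above.
SUBSTITUTIONS="$(join_substitutions \
  "_BUCKET_NAME=${BUCKET_NAME:-your-bucket-name}" \
  "_REPO=https://github.com/project-name/repo-name" \
  "_COMMIT=<version-hash>" \
  "_CORPUS=repo-name")"

# Print the command for inspection instead of running it.
echo gcloud builds submit --config examples/mvn.yaml \
  --substitutions="$SUBSTITUTIONS" --no-source
```

Once the printed command looks right, drop the leading `echo` to submit the build.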

### Guava specific example

To extract multiple parts of https://github.com/google/guava, use
`examples/guava-mvn.yaml`.

```
gcloud builds submit --config examples/guava-mvn.yaml \
  --substitutions=\
_BUCKET_NAME=$BUCKET_NAME,\
_COMMIT=<commit-hash> \
  --no-source
```

This outputs `guava-<commit-hash>.kzip` to `$BUCKET_NAME` on Google Cloud Storage.

Guava is a reasonable example of a Maven project that has already specified
the requisite `maven-compiler-plugin` bits in its `pom.xml` files to support
Kythe extraction, and that has multiple modules.

Note, however, that not all directories in Guava are extracted by the top-level
action.  For example, if you want to extract the Android copy of Guava that
lives inside the Guava tree, you need a slightly different action:

```
gcloud builds submit --config examples/guava-android-mvn.yaml \
  --substitutions=\
_BUCKET_NAME=$BUCKET_NAME,\
_COMMIT=<commit-hash> \
  --no-source
```

This outputs `guava-android-<commit-hash>.kzip` to `$BUCKET_NAME` on GCS.
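
After a build finishes, you may want to confirm that the expected kzip actually landed in the bucket. The sketch below makes two assumptions that are not stated in this document: that the kzip is written to the top level of the bucket, and that `gsutil` is available. `kzip_object_url` and `kzip_exists` are hypothetical helper names.

```shell
# Hypothetical helper: build the expected object URL for an extraction output.
# Assumes the kzip is written to the top level of the bucket.
kzip_object_url() {
  # $1 = bucket name, $2 = output prefix (e.g. guava), $3 = commit hash
  printf 'gs://%s/%s-%s.kzip\n' "$1" "$2" "$3"
}

# Hypothetical helper: succeed if the expected kzip exists in the bucket.
kzip_exists() {
  gsutil ls "$(kzip_object_url "$1" "$2" "$3")" >/dev/null 2>&1
}
```

For example, `kzip_exists "$BUCKET_NAME" guava-android <commit-hash>` would check for the output of the Android extraction.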

## Gradle Proof of Concept

Gradle is extracted similarly:

```
gcloud builds submit --config examples/gradle.yaml \
  --substitutions=\
_BUCKET_NAME=$BUCKET_NAME,\
_REPO=https://github.com/project-name/repo-name,\
_COMMIT=<version-hash>,\
_CORPUS=repo-name \
  --no-source
```

## Bazel Extraction

Bazel extraction uses `kythe/extractors/gcp/bazel/bazel.yaml`, which takes the
following substitutions.  Unlike the examples above, the config path in these
commands is relative to the repository root.

* `_COMMIT`: git repository commit to checkout, build, and extract
* `_REPO`: source git repository URL
* `_BUCKET_NAME`: GCS bucket name to store extracted compilations
* `_CORPUS`: Kythe corpus label

```shell
# Extract github.com/angular/angular at commit 8accc98
gcloud builds submit --no-source --config kythe/extractors/gcp/bazel/bazel.yaml \
  --substitutions=_REPO=https://github.com/angular/angular.git,\
_COMMIT=8accc98d28249628e84136d7306fdbbe1f4caaef,\
_BUCKET_NAME=$BUCKET_NAME,\
_CORPUS=github.com/angular/angular

# Extract github.com/bazelbuild/bazel at commit 22d375b
gcloud builds submit --no-source --config kythe/extractors/gcp/bazel/bazel.yaml \
  --substitutions=_REPO=https://github.com/bazelbuild/bazel.git,\
_COMMIT=22d375bd532b04bb83f18a7770e5080e23a1d517,\
_BUCKET_NAME=$BUCKET_NAME,\
_CORPUS=github.com/bazelbuild/bazel

# Extract github.com/protocolbuffers/protobuf at commit e728325
gcloud builds submit --no-source --config kythe/extractors/gcp/bazel/bazel.yaml \
  --substitutions=_REPO=https://github.com/protocolbuffers/protobuf.git,\
_COMMIT=e7283254d6eb01ddfdb63cc3c89cd312e2d354d5,\
_BUCKET_NAME=$BUCKET_NAME,\
_CORPUS=github.com/protocolbuffers/protobuf
```
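
In all three examples the corpus label is the repository URL without its scheme and `.git` suffix. That pattern can be captured in a small wrapper. This is a sketch only: `corpus_from_repo` and `print_bazel_extract_cmd` are hypothetical helpers, and the naming convention is inferred from the examples above rather than required by Kythe.

```shell
# Hypothetical helper: derive a corpus label from a repository URL by dropping
# the scheme and any trailing ".git", mirroring the examples above.
corpus_from_repo() {
  corpus="${1#*://}"
  corpus="${corpus%.git}"
  printf '%s\n' "$corpus"
}

# Hypothetical helper: print (rather than run) the gcloud invocation for one
# repository URL ($1) and commit ($2), using BUCKET_NAME from the environment.
print_bazel_extract_cmd() {
  printf 'gcloud builds submit --no-source --config kythe/extractors/gcp/bazel/bazel.yaml --substitutions=_REPO=%s,_COMMIT=%s,_BUCKET_NAME=%s,_CORPUS=%s\n' \
    "$1" "$2" "$BUCKET_NAME" "$(corpus_from_repo "$1")"
}
```

Printing the command first makes it easy to review the substitutions before spending a Cloud Build run on them.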

## Cloud Build REST API

Cloud Build has a REST API described at
https://cloud.google.com/cloud-build/docs/api/reference/rest/.  For Kythe
extraction, we have a test binary that lets you isolate authentication problems
before dealing with real builds.

You will need access to your project's service credentials:

https://cloud.google.com/docs/authentication/production#obtaining_and_providing_service_account_credentials_manually

If your team already has credentials made for this purpose, see if you can
re-use them.

If not, you can use these steps to create new credentials:

1. In your GCP console, click the hamburger icon at the top left.
2. Click APIs & Services.
3. In the dropdown, click Credentials.
4. From here, mostly follow the instructions from the [above link](https://cloud.google.com/docs/authentication/production#obtaining_and_providing_service_account_credentials_manually),
   with the two notes below.
5. When making a service account key, you can select the Cloud Build roles
   instead of "project owner" to limit the account's access to resources.
6. You still download the JSON file and set the environment variable
   `GOOGLE_APPLICATION_CREDENTIALS` as described in the above link.
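
Before running the auth check, it can help to confirm that the key file is actually visible to your shell, since a missing or unreadable file is easy to mistake for a permissions problem. A minimal sketch follows; `check_credentials_file` is a hypothetical helper, not part of Kythe, and it only checks that the file exists and is non-empty, not that the key is valid.

```shell
# Hypothetical helper: succeed only if GOOGLE_APPLICATION_CREDENTIALS names a
# readable, non-empty file.
check_credentials_file() {
  if [ -z "${GOOGLE_APPLICATION_CREDENTIALS:-}" ]; then
    echo "GOOGLE_APPLICATION_CREDENTIALS is not set" >&2
    return 1
  fi
  if [ ! -r "$GOOGLE_APPLICATION_CREDENTIALS" ] || [ ! -s "$GOOGLE_APPLICATION_CREDENTIALS" ]; then
    echo "cannot read key file: $GOOGLE_APPLICATION_CREDENTIALS" >&2
    return 1
  fi
}
```

A typical use is `export GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json` followed by `check_credentials_file`, where the path is a placeholder for wherever you saved the downloaded key.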

To test, run

```
bazel build kythe/extractors/gcp/examples/restcheck:rest_auth_check
./bazel-bin/kythe/extractors/gcp/examples/restcheck/rest_auth_check -project_id=some-project
```

If that returns a 403 error, the authentication steps above were most likely
not completed correctly.

## Associated extractor images

The Kythe team maintains a few images useful for extracting Kythe data on Google
Cloud Build.  Many of these are used in example scripts and other generated GCB
executions in Kythe.

### gcr.io/kythe-public/kythe-javac-extractor-artifacts

Created from
[kythe/java/com/google/devtools/kythe/extractors/java/artifacts](https://github.com/kythe/kythe/blob/master/kythe/java/com/google/devtools/kythe/extractors/java/artifacts),
this image contains:

* `javac-wrapper.sh` script which calls Kythe extraction and then an actual java
  compiler
* `javac_extractor.jar` which is the Kythe java extractor
* `javac9_tools.jar` which contains javac langtools for JDK 9, but targets JRE 8

### gcr.io/kythe-public/bazel-extractor

Created from
[kythe/extractors/bazel](https://github.com/kythe/kythe/blob/master/kythe/extractors/bazel),
this image contains all of the pieces of Kythe necessary to extract supported
languages: bazel itself, all of the Kythe extractors, and the `.bazelrc`.
Additionally, it contains necessary tools (including a copy of `kzip-tools`,
described below) and some required scripts.

When running this docker image, you must set the environment variable
`$KYTHE_OUTPUT_DIRECTORY`.

### gcr.io/kythe-public/build-preprocessor

This is a simple wrapper around
[kythe/go/extractors/config/preprocessor](https://github.com/kythe/kythe/blob/master/kythe/go/extractors/config/preprocessor/preprocessor.go),
which we use to preprocess build configurations so that they can
specify all of the above custom javac extraction logic.  It supports maven
`pom.xml` files and gradle `build.gradle` files.  Ironically, bazel extraction
doesn't need its `BUILD` files modified, because you can pass extractors
directly as `extra_action`, so `build-preprocessor` doesn't support `BUILD`
files.

### gcr.io/kythe-public/kzip-tools

This image exposes the binary
[kythe/go/platform/tools/kzip](https://github.com/kythe/kythe/blob/master/kythe/go/platform/tools/kzip/kzip.go),
which currently supports merging multiple kzips together and creating trivial
kzips from the command line.

## Troubleshooting

### Generic failure to use gcloud

Make sure you've followed the setup steps above in [Cloud Build](#cloud-build),
especially `gcloud auth login`.

### Step #N: fatal: could not read Username for 'https://github.com': No such device or address

Confusingly, this message can come from two completely separate problems.
First, and simpler to check, you may have misspelled the repo.  If there is a
typo in the repo name, instead of telling you that the repo doesn't exist,
the build fails with the above error about "could not read Username".

If you have verified that the repo name is spelled correctly, then you may be
trying to access a private git repo.  It is possible to clone a private
git repo, but you need to follow some extra steps.  These involve
Cloud KMS and are described in this
[Cloud Build Help
Doc](https://cloud.google.com/cloud-build/docs/access-private-github-repos).
You will need to add extra steps to your `.yaml` file that decrypt a
provided key and use it to authenticate with git.  Finally, your existing git
clone step will need to be modified to use the same root volume as your two new
steps.
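
Either cause can be reproduced locally before spending a Cloud Build run on it, by asking git to list the repository without any stored credentials. This is only a sketch: `repo_reachable` is a hypothetical helper, and a failure here still does not say whether the URL is misspelled or the repo is private, only that anonymous access does not work.

```shell
# Hypothetical helper: succeed if the repository can be listed anonymously.
# GIT_TERMINAL_PROMPT=0 stops git from prompting for a username, and the empty
# credential.helper setting ignores any credentials stored on this machine.
repo_reachable() {
  GIT_TERMINAL_PROMPT=0 git -c credential.helper= ls-remote "$1" >/dev/null 2>&1
}
```

For example, `repo_reachable https://github.com/project-name/repo-name || echo "check the URL, or set up private repo access"`.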