# Prow Playbook

This is the playbook for Prow. See also [the playbook index][playbooks].

TL;DR: Prow is a set of CI services.

The [prow OWNERS][prow OWNERS] are a potential point of contact for more info.

For in-depth details about the project, see the [prow README][prow README].

## Prow deployment

Prow is composed of a service cluster and one or more build clusters:
- **service cluster**: runs prow components, responsible for handling GitHub
  events and scheduling ProwJob CRDs
- **build cluster**: runs Pods that implement ProwJob CRDs

Each build cluster may have additional components deployed:
- **boskos**: responsible for managing pools of GCP projects
- **greenhouse**: implements a remote bazel cache
- **ghproxy**: reverse proxy HTTP cache optimized for use with the GitHub API
- **kubernetes-external-secrets**: updates Kubernetes Secrets with values from
  external secret stores such as Google Secret Manager

Each cluster is a GKE cluster living in its own GCP project. These projects may
live in separate GCP organizations:
- **google.com**: the Google-owned GCP organization
- **kubernetes.io**: the community-owned GCP organization

### kubernetes prow service cluster aka prow.k8s.io

- The kubernetes prow service cluster, exposed as https://prow.k8s.io
- Lives in google.com GCP project k8s-prow
- Infra manually managed
- Kubernetes manifests live in /config/prow/cluster
- Owner access given to Google employees in test-infra-oncall
- Viewer access given to Google employees
- Logs available via Google Cloud Logging

### default

- The default prow build cluster for prow.k8s.io
- Lives in google.com GCP project k8s-prow-builds
- Infra manually managed
- Kubernetes manifests live in /config/prow/cluster
- Owner access given to Google employees in test-infra-oncall
- Viewer access given to Google employees
- Additional components: boskos, greenhouse
- Logs available via Google Cloud Logging

### test-infra-trusted

- The google.com-owned build cluster for trusted jobs that need access to sensitive secrets
- Is the kubernetes prow service cluster, under a different name

### k8s-infra-prow-build

- The community-owned prow build cluster
- Lives in kubernetes.io GCP project k8s-infra-prow-build
- Infra managed via terraform in k8s.io/infra/gcp/terraform/k8s-infra-prow-build/prow-build
- Kubernetes manifests live in k8s.io/infra/gcp/terraform/k8s-infra-prow-build/prow-build/resources
- Owner access given to k8s-infra-prow-oncall@kubernetes.io
- Viewer access given to k8s-infra-prow-viewers@kubernetes.io
- Kubernetes API access restricted to internal networks; must use Google Cloud Shell
- Additional components: boskos, greenhouse, kubernetes-external-secrets
- [k8s-infra-prow-build dashboard](https://console.cloud.google.com/monitoring/dashboards/custom/10925237040785467832?project=k8s-infra-prow-build&timeDomain=1d)
- [k8s-infra-prow-build logs](https://console.cloud.google.com/logs/query?project=k8s-infra-prow-build)

### k8s-infra-prow-build-trusted

- The community-owned prow build cluster for trusted jobs that need access to sensitive secrets
- Lives in kubernetes.io GCP project k8s-infra-prow-build-trusted
- Infra managed via terraform in k8s.io/infra/gcp/terraform/k8s-infra-prow-build-trusted/prow-build-trusted
- Kubernetes manifests live in k8s.io/infra/gcp/terraform/k8s-infra-prow-build-trusted/prow-build-trusted/resources
- Owner access given to k8s-infra-prow-oncall@kubernetes.io
- Viewer access given to k8s-infra-prow-viewers@kubernetes.io
- Kubernetes API access restricted to internal networks; must use Google Cloud Shell
- [k8s-infra-prow-build-trusted logs](https://console.cloud.google.com/logs/query?project=k8s-infra-prow-build-trusted)

### others

- There are other prow build clusters that prow.k8s.io currently schedules to
  that are not directly related to kubernetes CI or community-owned,
  e.g. scalability

## Logs

All cluster logs are accessible via Google Cloud Logging. Access to logs
requires Viewer access for the cluster's project.

If you are a Googler checking prow.k8s.io, you may open `go/prow-debug` in your
browser. If you are not a Googler but have access to this prow, you can
open [Stackdriver] logs in the `k8s-prow` GCP projects.

Other prow deployments may have their own logging stack.
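
As a rough aid for finding the right log viewer, the sketch below maps each cluster named in this playbook to its GCP project and builds a Cloud Logging console URL. The URL pattern is the one used by the k8s-infra log links in this playbook; applying it to the google.com projects is an assumption, and the `prow.k8s.io` key is just a label chosen here for the service cluster.

```python
from urllib.parse import urlencode

# Cluster -> GCP project, taken from the "Prow deployment" section.
CLUSTER_PROJECTS = {
    "prow.k8s.io": "k8s-prow",  # label chosen here for the service cluster
    "default": "k8s-prow-builds",
    "test-infra-trusted": "k8s-prow",  # same cluster as the service cluster
    "k8s-infra-prow-build": "k8s-infra-prow-build",
    "k8s-infra-prow-build-trusted": "k8s-infra-prow-build-trusted",
}

# Assumption: every project uses the same console URL pattern.
LOGS_BASE = "https://console.cloud.google.com/logs/query"


def logs_url(cluster: str) -> str:
    """Return the Cloud Logging console URL for a cluster's project."""
    project = CLUSTER_PROJECTS[cluster]
    return f"{LOGS_BASE}?{urlencode({'project': project})}"


if __name__ == "__main__":
    for name in CLUSTER_PROJECTS:
        print(f"{name}: {logs_url(name)}")
```

Opening any of these URLs still requires Viewer access for the project in question.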

## Monitoring

### Tide dashboards

Tide merges PRs once their label and review requirements are satisfied. It may
re-run tests, and it may merge a batch of PRs.

- What is tide doing right now: https://prow.k8s.io/tide
- What has tide been doing: https://prow.k8s.io/tide-history
  - e.g. [tide history for kubernetes/kubernetes master](https://prow.k8s.io/tide-history?repo=kubernetes%2Fkubernetes&branch=master)
  - lots of "TRIGGER_BATCH" with no "MERGE_BATCH" may mean tests are failing/flaking
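
The batch heuristic above can be written down as a small check. This is only a sketch: it assumes you have copied the action names from the tide history page into a list by hand, and the `min_triggers` threshold of 3 is an arbitrary choice, not something tide defines. The URL helper reproduces the filtered history link shown above.

```python
from collections import Counter
from urllib.parse import urlencode


def tide_history_url(repo: str, branch: str) -> str:
    """Build a filtered tide-history URL for one repo and branch."""
    query = urlencode({"repo": repo, "branch": branch})
    return f"https://prow.k8s.io/tide-history?{query}"


def batches_look_stuck(actions: list[str], min_triggers: int = 3) -> bool:
    """Heuristic: several TRIGGER_BATCH entries and no MERGE_BATCH entry.

    `actions` is a hand-collected list of action names from the history page.
    `min_triggers` is a hypothetical threshold picked for illustration.
    """
    counts = Counter(actions)
    return counts["TRIGGER_BATCH"] >= min_triggers and counts["MERGE_BATCH"] == 0


if __name__ == "__main__":
    print(tide_history_url("kubernetes/kubernetes", "master"))
    print(batches_look_stuck(["TRIGGER_BATCH"] * 4))
```

A `True` result is only a hint to go look at the failing or flaking tests; it does not say which job is responsible.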

### ProwJob dashboards

ProwJobs are CRDs, updated based on the status of whatever is responsible for
implementing them (usually Kubernetes Pods scheduled to a prow build cluster).
Due to the volume of traffic prow.k8s.io handles, and in order to keep things
responsive, ProwJob CRDs are only retained for 48h. This affects the amount of
history available on the following dashboards:

- How many ProwJob CRDs exist right now: https://monitoring.prow.k8s.io/d/e1778910572e3552a935c2035ce80369/plank-dashboard
  - e.g. [all kubernetes/kubernetes ProwJob CRDs over the last 7d](https://monitoring.prow.k8s.io/d/e1778910572e3552a935c2035ce80369/plank-dashboard?orgId=1&from=now-7d&to=now&var-cluster=All&var-org=kubernetes&var-repo=kubernetes&var-state=$__all&var-type=$__all&var-group_by_1=type&var-group_by_2=state&var-group_by_3=cluster)
  - plots count of ProwJob CRDs in prow service cluster's registry, filtered/grouped by relevant fields

- What ProwJobs are scheduled right now: https://prow.k8s.io/
  - e.g. [all prowjobs running as pods in k8s-infra-prow-build](https://prow.k8s.io/?cluster=k8s-infra-prow-build)
  - e.g. [all kubernetes/kubernetes presubmits](https://prow.k8s.io/?repo=kubernetes%2Fkubernetes&type=presubmit)
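
The filtered prow.k8s.io links above are plain query strings, so they are easy to generate. A minimal sketch follows; only the `cluster`, `repo`, and `type` parameters appear in this playbook, so treat any other filter name as unverified. The `deck_url` helper is a name made up for this example.

```python
from urllib.parse import urlencode


def deck_url(**filters: str) -> str:
    """Build a prow.k8s.io job-list URL from filters such as repo, type, cluster."""
    # Drop empty values so unused filters do not appear in the query string.
    query = urlencode({key: value for key, value in filters.items() if value})
    return f"https://prow.k8s.io/?{query}" if query else "https://prow.k8s.io/"


if __name__ == "__main__":
    print(deck_url(cluster="k8s-infra-prow-build"))
    print(deck_url(repo="kubernetes/kubernetes", type="presubmit"))
```

Because ProwJob CRDs are only retained for 48h, these views will not show anything older than that regardless of the filters used.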

### Build cluster dashboards

Access to these dashboards requires Viewer access for the cluster's project.
This is available to members of k8s-infra-prow-oncall@kubernetes.io and
k8s-infra-prow-viewers@kubernetes.io.

- What resources is k8s-infra-prow-build using right now: https://console.cloud.google.com/monitoring/dashboards/builder/10925237040785467832?project=k8s-infra-prow-build&timeDomain=6h
- What resources are used by jobs on k8s-infra-prow-build: https://console.cloud.google.com/monitoring/dashboards/builder/10510319052103514664?project=k8s-infra-prow-build&timeDomain=1h

## Options

The following well-known options are available for dealing with prow
service issues.

### Rolling Back

For prow.k8s.io you can simply use `experiment/revert-bump.sh` to roll back
to the last checked-in deployment version.

If prow is at least somewhat healthy, filing and merging a PR from this will
result in the rolled-back version being deployed.

If not, you may need to manually run `make -C config/prow deploy-all`.
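
The decision above can be summarized as a short checklist. The sketch below is one reading of these steps, not an official procedure: it assumes the revert script is run in both cases and that the manual deploy is run from the reverted checkout. `rollback_steps` is a hypothetical helper, and it only returns text; it does not execute anything.

```python
def rollback_steps(prow_healthy: bool) -> list[str]:
    """Return the rollback steps from this playbook as an ordered checklist."""
    # Assumption: the revert happens first in both the healthy and unhealthy case.
    steps = ["run experiment/revert-bump.sh"]
    if prow_healthy:
        # Prow deploys the rolled-back version itself once the PR merges.
        steps.append("file and merge a PR with the resulting change")
    else:
        # Prow cannot deploy itself, so deploy by hand.
        steps.append("run make -C config/prow deploy-all")
    return steps


if __name__ == "__main__":
    for healthy in (True, False):
        print(f"prow healthy={healthy}: {rollback_steps(healthy)}")
```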

## Known Issues

<!--URLS-->
[prow OWNERS]: /prow/OWNERS
[prow README]: /prow/README.md
[playbooks]: /docs/playbooks/README.md
<!--Additional URLS-->
[cluster]: /config/cluster
[prow-k8s-io]: https://prow.k8s.io
[Stackdriver]: https://cloud.google.com/stackdriver/

[k8s-infra/prowjob-resource-usage]: https://console.cloud.google.com/monitoring/dashboards/custom/10510319052103514664?authuser=1&project=k8s-infra-prow-build&timeDomain=1d
[k8s-infra/prow-build]: https://console.cloud.google.com/monitoring/dashboards/custom/10925237040785467832?authuser=1&project=k8s-infra-prow-build&timeDomain=1d