# Prow Playbook

This is the playbook for Prow. See also [the playbook index][playbooks].

TL;DR: Prow is a set of CI services.

The [prow OWNERS][prow OWNERS] are a potential point of contact for more info.

For in-depth details about the project, see the [prow README][prow README].

## Prow deployment

Prow is composed of a service cluster and one or more build clusters:
- **service cluster**: runs prow components, responsible for handling GitHub
  events and scheduling ProwJob CRDs
- **build cluster**: runs Pods that implement ProwJob CRDs

Each build cluster may have additional components deployed:
- **boskos**: responsible for managing pools of GCP projects
- **greenhouse**: implements a remote bazel cache
- **ghproxy**: reverse proxy HTTP cache optimized for use with the GitHub API
- **kubernetes-external-secrets**: updates Kubernetes Secrets with values from
  external secret stores such as Google Secret Manager

Each cluster is a GKE cluster living in its own GCP project; the projects may
live in separate GCP organizations:
- **google.com**: the Google-owned GCP projects
- **kubernetes.io**: the community-owned GCP projects

### kubernetes prow service cluster aka prow.k8s.io

- The kubernetes prow service cluster, exposed as https://prow.k8s.io
- Lives in google.com GCP project k8s-prow
- Infra manually managed
- Kubernetes manifests live in /config/prow/cluster
- Owner access given to Google employees in test-infra-oncall
- Viewer access given to Google employees
- Logs available via Google Cloud Logging

### default

- The default prow build cluster for prow.k8s.io
- Lives in google.com GCP project k8s-prow-builds
- Infra manually managed
- Kubernetes manifests live in /config/prow/cluster
- Owner access given to Google employees in test-infra-oncall
- Viewer access given to Google employees
- Additional components: boskos, greenhouse
- Logs available via Google Cloud Logging

### test-infra-trusted

- The google.com-owned build cluster for trusted jobs that need access to
  sensitive secrets
- Is the kubernetes prow service cluster, under a different name

### k8s-infra-prow-build

- The community-owned prow build cluster
- Lives in kubernetes.io GCP project k8s-infra-prow-build
- Infra managed via terraform in k8s.io/infra/gcp/terraform/k8s-infra-prow-build/prow-build
- Kubernetes manifests live in k8s.io/infra/gcp/terraform/k8s-infra-prow-build/prow-build/resources
- Owner access given to k8s-infra-prow-oncall@kubernetes.io
- Viewer access given to k8s-infra-prow-viewers@kubernetes.io
- Kubernetes API access restricted to internal networks; must use Google Cloud Shell
- Additional components: boskos, greenhouse, kubernetes-external-secrets
- [k8s-infra-prow-build dashboard](https://console.cloud.google.com/monitoring/dashboards/custom/10925237040785467832?project=k8s-infra-prow-build&timeDomain=1d)
- [k8s-infra-prow-build logs](https://console.cloud.google.com/logs/query?project=k8s-infra-prow-build)

### k8s-infra-prow-build-trusted

- The community-owned prow build cluster for trusted jobs that need access to
  sensitive secrets
- Lives in kubernetes.io GCP project k8s-infra-prow-build-trusted
- Infra managed via terraform in k8s.io/infra/gcp/terraform/k8s-infra-prow-build-trusted/prow-build-trusted
- Kubernetes manifests live in k8s.io/infra/gcp/terraform/k8s-infra-prow-build-trusted/prow-build-trusted/resources
- Owner access given to k8s-infra-prow-oncall@kubernetes.io
- Viewer access given to k8s-infra-prow-viewers@kubernetes.io
- Kubernetes API access restricted to internal networks; must use Google Cloud Shell
- [k8s-infra-prow-build-trusted logs](https://console.cloud.google.com/logs/query?project=k8s-infra-prow-build-trusted)

### others

- There are other prow build clusters that prow.k8s.io currently schedules to
  that are not directly related to kubernetes CI or community-owned,
  e.g. scalability

## Logs

All cluster logs are accessible via Google Cloud Logging. Access to logs
requires Viewer access for the cluster's project.

If you are a googler checking prow.k8s.io, you may open `go/prow-debug` in your
browser. If you are not a googler but have access to this prow, you can
open [Stackdriver] logs in the `k8s-prow` GCP projects.

Other prow deployments may have their own logging stack.

## Monitoring

### Tide dashboards

Tide merges PRs once their label/review requirements are satisfied; it may
re-run tests, and may merge a batch of PRs at once.

- What is tide doing right now: https://prow.k8s.io/tide
- What has tide been doing: https://prow.k8s.io/tide-history
  - e.g. [tide history for kubernetes/kubernetes master](https://prow.k8s.io/tide-history?repo=kubernetes%2Fkubernetes&branch=master)
  - lots of "TRIGGER_BATCH" with no "MERGE_BATCH" may mean tests are failing/flaking

### ProwJob dashboards

ProwJobs are CRDs, updated based on the status of whatever is responsible for
implementing them (usually Kubernetes Pods scheduled to a prow build cluster).
Due to the volume of traffic prow.k8s.io handles, and in order to keep things
responsive, ProwJob CRDs are only retained for 48h. This affects the amount of
history available on the following dashboards.

- How many ProwJob CRDs exist right now: https://monitoring.prow.k8s.io/d/e1778910572e3552a935c2035ce80369/plank-dashboard
  - e.g. [all kubernetes/kubernetes ProwJob CRDs over the last 7d](https://monitoring.prow.k8s.io/d/e1778910572e3552a935c2035ce80369/plank-dashboard?orgId=1&from=now-7d&to=now&var-cluster=All&var-org=kubernetes&var-repo=kubernetes&var-state=$__all&var-type=$__all&var-group_by_1=type&var-group_by_2=state&var-group_by_3=cluster)
  - plots the count of ProwJob CRDs in the prow service cluster's registry, filtered/grouped by relevant fields

- What ProwJobs are scheduled right now: https://prow.k8s.io/
  - e.g. [all prowjobs running as pods in k8s-infra-prow-build](https://prow.k8s.io/?cluster=k8s-infra-prow-build)
  - e.g. [all kubernetes/kubernetes presubmits](https://prow.k8s.io/?repo=kubernetes%2Fkubernetes&type=presubmit)

### Build cluster dashboards

Access to these dashboards requires Viewer access for the cluster's project.
This is available to members of k8s-infra-prow-oncall@kubernetes.io and
k8s-infra-prow-viewers@kubernetes.io.

- What resources is k8s-infra-prow-build using right now: https://console.cloud.google.com/monitoring/dashboards/builder/10925237040785467832?project=k8s-infra-prow-build&timeDomain=6h
- What resources are used by jobs on k8s-infra-prow-build: https://console.cloud.google.com/monitoring/dashboards/builder/10510319052103514664?project=k8s-infra-prow-build&timeDomain=1h

## Options

The following well-known options are available for dealing with prow
service issues.

### Rolling Back

For prow.k8s.io you can simply use `experiment/revert-bump.sh` to roll back
to the last checked-in deployment version.

If prow is at least somewhat healthy, filing and merging a PR from this will
result in the rolled-back version being deployed.

If not, you may need to manually run `make -C config/prow deploy-all`.
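The rollback flow above can be sketched as a short shell session. This is an illustrative sketch, not a verified procedure: the branch name and PR workflow shown are assumptions, and you should check the script and Makefile in the repo before running anything.

```shell
#!/usr/bin/env sh
# Sketch of the prow.k8s.io rollback flow described above (assumptions:
# you are at the root of a test-infra checkout; exact script behavior
# may differ -- read experiment/revert-bump.sh first).

# 1. Generate a revert of the most recent checked-in version bump.
./experiment/revert-bump.sh

# 2. If prow is healthy enough to merge PRs, file the resulting change
#    as a PR through the normal flow (branch name is hypothetical).
git checkout -b revert-prow-bump
git commit -am "Revert prow version bump"
# ...open a PR and merge it; prow's normal deploy path takes over.

# 3. If prow is NOT healthy enough to merge the PR itself, deploy
#    manually. This requires Owner access to the service cluster's
#    GCP project (test-infra-oncall).
make -C config/prow deploy-all
```

Prefer the PR path whenever possible so the checked-in manifests stay the source of truth; the manual `make` deploy is a last resort when prow cannot merge its own fix.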
## Known Issues

<!--URLS-->
[prow OWNERS]: /prow/OWNERS
[prow README]: /prow/README.md
[playbooks]: /docs/playbooks/README.md
<!--Additional URLS-->
[cluster]: /config/cluster
[prow-k8s-io]: https://prow.k8s.io
[Stackdriver]: https://cloud.google.com/stackdriver/

[k8s-infra/prowjob-resource-usage]: https://console.cloud.google.com/monitoring/dashboards/custom/10510319052103514664?authuser=1&project=k8s-infra-prow-build&timeDomain=1d
[k8s-infra/prow-build]: https://console.cloud.google.com/monitoring/dashboards/custom/10925237040785467832?authuser=1&project=k8s-infra-prow-build&timeDomain=1d