// go.chromium.org/luci@v0.0.0-20240309015107-7cdc2e660f33/luciexe/doc.go

// Copyright 2019 The LUCI Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//      http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

// Package luciexe documents the "LUCI Executable" protocol, and contains
// constants which are part of this protocol.
//
// # Summary
//
// A LUCI Executable ("luciexe") is a binary which implements a protocol to:
//
//   - Pass the initial state of the 'build' from a parent process to the luciexe.
//   - Understand the build's local system contracts (like the location of cached data).
//   - Asynchronously update the state of the build as it runs.
//
// This protocol is recursive; a luciexe can run another luciexe such that the
// child's updates are reflected on the parent's output.
//
// The protocol has 3 parts:
//
//   - Host Application - This sits at the top of the luciexe process
//     hierarchy and sets up singleton environmental requirements for the whole
//     tree.
//   - Invocation of a luciexe binary - This invocation process occurs both
//     for the topmost luciexe, as well as all internal invocations of other
//     luciexes within the process hierarchy.
//   - The luciexe binary - The binary has a couple of responsibilities to be
//     compatible with this protocol. Once the binary has fulfilled its
//     responsibilities it's free to do what it wants (i.e. actually do its
//     task).
//
// In general, we strive where possible to minimize the complexity of the
// luciexe binary. This is because we expect to have a small number of 'host
// application' implementations, and a relatively large number of 'luciexe'
// implementations.
//
// # The Host Application
//
// At the root of every tree of luciexe invocations there is a 'host'
// application which sets up and manages all environmental singletons (like the
// Logdog 'butler' service, the LUCI ambient authentication service, etc.). This
// Host Application is responsible for intercepting and merging all
// 'build.proto' streams emitted within this tree of luciexes (see "Recursive
// Invocation"). The Host Application may choose what happens to these
// intercepted build.proto messages.
//
// The Host Application MUST:
//   - Run a logdog butler service and expose all relevant LOGDOG_* environment
//     variables such that the following client libraries can stream log data:
//       - Golang: go.chromium.org/luci/logdog/client/butlerlib/bootstrap
//       - Python: infra_libs.logdog.bootstrap
//   - Hook the butler to intercept and merge build.proto streams into a single
//     build.proto (zlib-compressed) stream.
//   - Set up a local LUCI ambient authentication service which luciexes can
//     use to mint auth tokens.
//   - Prepare an empty directory which will house tempdirs and workdirs for
//     all luciexe invocations. The Host Application MAY clean this directory
//     up, but it may be useful to leak it for debugging. It's permissible for
//     the Host Application to defer this cleanup to an external process (e.g.
//     buildbucket's agent may defer this to swarming).
//
// The Host Application MAY hook additional streams for debugging/logging; it is
// frequently convenient to hook the stderr/stdout streams from the top level
// luciexe and tee them to the Host Application's stdout/stderr.
//
// For example: the `go.chromium.org/luci/buildbucket/cmd/agent` binary forwards
// these merged build.proto messages to the Buildbucket service, and also
// uploads all streams to the Logdog cloud service. Other host
// implementations may instead choose to write all streams to disk, send them to
// /dev/null or render them as html. However, from the point of view of the
// luciexe that they run, this is transparent.
//
// Host Applications MAY implement 'backpressure' on the luciexe binaries by
// throttling the rate at which the Logdog butler accepts data on its various
// streams. However, doing this could introduce timing issues in the luciexe
// binaries as they try to run, so this should be done thoughtfully.
//
// If a Host Application detects a protocol violation from a luciexe within its
// purview, it SHOULD report the violation (in a manner of its choosing) and
// MUST consider the entire Build status to be INFRA_FAILURE. In addition, the
// Host Application SHOULD attempt to kill (via process group SIGTERM/SIGKILL on
// *nix, and CTRL+BREAK/Terminate on windows) the luciexe hierarchy. The host
// application MAY provide a window of time between the initial "please stop"
// signal and the "you die now" signal, but this comes with the usual caveats of
// cleanups and deadlines (notably: best-effort clean up is just that:
// best-effort. It cannot be relied on to run to completion (or to run
// completely)).
//
// # Invocation
//
// When invoking a luciexe, the parent process has a couple of responsibilities.
// It must:
//
//   - Set $TEMPDIR, $TMPDIR, $TEMP, $TMP and $MAC_CHROMIUM_TMPDIR to all point
//     to the same, empty directory.
//     This directory MUST be located on the same file system as CWD.
//     This directory MUST NOT be the same as CWD.
//
//   - Set $LUCI_CONTEXT["luciexe"]["cache_dir"] to a cache dir which makes sense
//     for the luciexe.
//     The cache dir MAY persist/be shared between luciexe invocations.
//     The cache dir MAY NOT be on the same filesystem as CWD.
//
//   - Set the $LOGDOG_NAMESPACE to a prefix which namespaces all logdog streams
//     generated from the luciexe.
//
//   - Set the Status of the initial buildbucket.v2.Build message to `STARTED`.
//
//   - Set the CreateTime and StartTime of the initial buildbucket.v2.Build
//     message.
//
//   - Clear the following fields in the initial buildbucket.v2.Build message:
//       - EndTime
//       - Output
//       - StatusDetails
//       - Steps
//       - SummaryMarkdown
//       - UpdateTime
//
// The CWD is up to your application. Some contexts (like Buildbucket) will
// guarantee an empty CWD, but others (like recursive invocation) may explicitly
// share CWD between multiple luciexes.
//
// The tempdir and workdir paths SHOULD NOT be cleaned up by the invoking
// process. Instead, the invoking process should defer to the Host Application
// to provide this cleanup, since the Host Application may be configured to leak
// these for debugging purposes.
//
// The invoker MUST attach the stdout/stderr to the logdog butler as text
// streams. These MUST be located at `$LOGDOG_NAMESPACE/std{out,err}`.
// Typical luciexe implementations will use these for debug logging and output,
// but are not required to do so.
//
// The invoker MUST write a binary-encoded buildbucket.v2.Build to the stdin of
// the luciexe which contains all the input parameters that the luciexe needs to
// know to run successfully.
//
// # The luciexe binary
//
// Once running, the luciexe MUST read a binary-encoded buildbucket.v2.Build
// message from stdin until EOF.
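//
// As a minimal sketch of this first step (not the helper library's actual
// implementation), a luciexe could read and decode its initial Build roughly
// as shown below. The import paths are assumed to be the usual Go bindings
// for protobuf and buildbucket.v2.Build, and the log messages are purely
// illustrative:
//
//	package main
//
//	import (
//		"io"
//		"log"
//		"os"
//
//		"google.golang.org/protobuf/proto"
//
//		bbpb "go.chromium.org/luci/buildbucket/proto"
//	)
//
//	func main() {
//		// The invoker writes one binary-encoded Build; read until EOF.
//		raw, err := io.ReadAll(os.Stdin)
//		if err != nil {
//			log.Fatalf("reading initial Build from stdin: %s", err)
//		}
//		build := &bbpb.Build{}
//		if err := proto.Unmarshal(raw, build); err != nil {
//			log.Fatalf("decoding initial Build: %s", err)
//		}
//		// The log package writes to stderr, which the invoker attaches
//		// to Logdog as a text stream.
//		log.Printf("initial Build status: %s", build.GetStatus())
//	}
//
// Exiting early like this, without sending a terminal status, is reported by
// the invoker as INFRA_FAILURE (see "Reporting final status").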
//
// As per the invoker's responsibility, the luciexe binary MAY expect the
// status of the initial Build message to be `STARTED` and CreateTime and
// StartTime to be populated. It MAY also expect the following fields to be
// empty. It MUST NOT assume other fields in the Build message are set. However,
// the Host Application or invoker MAY fill in other fields they think are
// useful.
//
//	EndTime
//	Output
//	StatusDetails
//	Steps
//	SummaryMarkdown
//	Tags
//	UpdateTime
//
// As per the Host Application's responsibilities, the luciexe binary MAY expect
// the "luciexe" and "local_auth" sections of LUCI_CONTEXT to be filled. Other
// sections of LUCI_CONTEXT MAY also be filled. See the LUCI_CONTEXT docs:
// https://chromium.googlesource.com/infra/luci/luci-py/+/HEAD/client/LUCI_CONTEXT.md
//
// !!NOTE!! The paths supplied to the luciexe MUST NOT be considered stable
// across invocations. Do not hard-code these, and try not to rely on their
// consistency (e.g. for build reproducibility).
//
// # The luciexe binary - Updating the Build state
//
// A luciexe MAY update the Build state by writing to a "build.proto" Logdog
// stream named "$LOGDOG_NAMESPACE/build.proto". A "build.proto" Logdog stream
// is defined as:
//
//	Content-Type: "application/luci+proto; message=buildbucket.v2.Build"
//	Type: Datagram
//
// Additionally, a build.proto stream MAY append "; encoding=zlib" to the
// Content-Type (and compress each message accordingly). This is useful for when
// you potentially have very large builds.
//
// Each datagram MUST be a valid binary-encoded buildbucket.v2.Build message.
// The state of the build is defined as the last Build message sent on this
// stream. There's no implicit accumulation between sent Build messages.
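//
// The stream definition above can be sketched with the standard library alone.
// The names contentType and encodeDatagram are hypothetical helpers invented
// for this illustration, and the input is assumed to be an already serialized
// Build message:
//
//	package main
//
//	import (
//		"bytes"
//		"compress/zlib"
//		"fmt"
//	)
//
//	// baseContentType is the Content-Type quoted in the text above.
//	const baseContentType = "application/luci+proto; message=buildbucket.v2.Build"
//
//	// contentType returns the stream Content-Type, with the optional
//	// zlib marker appended when useZlib is set.
//	func contentType(useZlib bool) string {
//		if useZlib {
//			return baseContentType + "; encoding=zlib"
//		}
//		return baseContentType
//	}
//
//	// encodeDatagram returns the bytes to send as a single datagram. Each
//	// message is compressed on its own, so every datagram stays
//	// independently decodable.
//	func encodeDatagram(serialized []byte, useZlib bool) ([]byte, error) {
//		if !useZlib {
//			return serialized, nil
//		}
//		var buf bytes.Buffer
//		w := zlib.NewWriter(&buf)
//		if _, err := w.Write(serialized); err != nil {
//			return nil, err
//		}
//		if err := w.Close(); err != nil {
//			return nil, err
//		}
//		return buf.Bytes(), nil
//	}
//
//	func main() {
//		fmt.Println(contentType(true))
//		// application/luci+proto; message=buildbucket.v2.Build; encoding=zlib
//	}
//
// Because there is no accumulation between messages, every datagram has to
// carry the complete Build state, not a delta against the previous one.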
//
// The Step.Log.Url field in the emitted Build messages can be either absolute
// (has a valid url scheme) or relative. If an absolute `Url` is provided, the
// luciexe is also responsible for supplying the corresponding `ViewUrl`, and
// the host application won't validate or make any adjustments to those urls.
// This allows a luciexe to include logs from other sources into this build. If
// a relative `Url` is provided, the host application will namespace the `Url`
// with the $LOGDOG_NAMESPACE of the build.proto stream. For example, if the
// host application is parsing a build.proto in a stream named
// "logdog://host/project/prefix/+/something/build.proto", then a Log with a Url
// of "hello/world/stdout" will be transformed into:
//
//	Url: logdog://host/project/prefix/+/something/hello/world/stdout
//	ViewUrl: <implementation defined>
//
// The `ViewUrl` field in this case SHOULD be left empty, and will be filled in
// by the host application running the luciexe (if supplied it will be
// overwritten).
//
// The following Build fields will be read from the luciexe-controlled
// build.proto stream:
//
//	EndTime
//	Output
//	Status
//	StatusDetails
//	Steps
//	SummaryMarkdown
//	Tags
//	UpdateTime
//
// # The luciexe binary - Reporting final status
//
// A luciexe MUST report its success/failure by sending a Build message with
// a terminal `status` value before exiting. If the luciexe exits before sending
// a Build message with a terminal Status, the invoking application MUST
// interpret this as an INFRA_FAILURE status. The exit code of the luciexe
// SHOULD be ignored, except for advisory (logging/reporting) purposes. The host
// application MUST detect this case and fill in a final status of
// INFRA_FAILURE, but MUST NOT terminate the process hierarchy in this case.
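//
// The invoker-side rule can be sketched as follows. The Status type here is
// a hypothetical local stand-in for the buildbucket.v2.Status enum, and
// finalStatus is an illustrative name, not part of any LUCI library:
//
//	package main
//
//	import "fmt"
//
//	// Status is a stand-in for buildbucket.v2.Status (an assumption made
//	// to keep the sketch self-contained).
//	type Status string
//
//	const (
//		Started      Status = "STARTED"
//		Success      Status = "SUCCESS"
//		Failure      Status = "FAILURE"
//		InfraFailure Status = "INFRA_FAILURE"
//		Canceled     Status = "CANCELED"
//	)
//
//	// isTerminal reports whether s ends a build.
//	func isTerminal(s Status) bool {
//		switch s {
//		case Success, Failure, InfraFailure, Canceled:
//			return true
//		}
//		return false
//	}
//
//	// finalStatus maps the last status seen on the build.proto stream
//	// ("" if no Build message was ever sent) to the status the invoker
//	// reports. The process exit code is deliberately not an input.
//	func finalStatus(last Status) Status {
//		if isTerminal(last) {
//			return last
//		}
//		return InfraFailure
//	}
//
//	func main() {
//		fmt.Println(finalStatus(Failure)) // FAILURE
//		fmt.Println(finalStatus(Started)) // INFRA_FAILURE
//		fmt.Println(finalStatus(""))      // INFRA_FAILURE
//	}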
//
// # Recursive Invocation
//
// To support recursive invocation, a luciexe MUST accept the flag:
//
//	--output=path/to/file.{pb,json,textpb}
//
// The value of this flag MUST be an absolute path to a non-existent file in
// an existing directory. The extension of the file dictates the data format
// (binary, json or text protobuf). The luciexe MUST write its final Build
// message to this file in the correct format. If `--output` is specified,
// but no Build message (or an invalid/improperly formatted Build message)
// is written, the caller MUST interpret this as an INFRA_FAILURE status.
//
// NOTE: JSON outputs SHOULD be written with the original proto field names,
// not the lowerCamelCase names; downstream users may not be using jsonpb
// unmarshallers to interpret the JSON data.
//
// This may need to be revised in a subsequent version of this API
// specification.
//
// LUCI Executables MAY invoke other LUCI Executables as sub-steps and have the
// Steps from the child luciexe show in the parent's Build updates. This is one
// of the responsibilities of the Host Application.
//
// The parent can achieve this by recording a Step S (with no children), and
// a Step.MergeBuild.from_logdog_stream which points to a "build.proto" stream
// (see "Updating the Build State"). This is called a "Merge Step", and is
// a directive for the host to merge Build messages from the
// from_logdog_stream log here. Note that this stream name should follow the
// same provisions specified for "Step.Log.Url". It's the invoker's
// responsibility to populate $LOGDOG_NAMESPACE with the new, full, namespace.
// Failure to do this will result in missing logs/step data.
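//
// The namespacing provisions shared by "Step.Log.Url" and from_logdog_stream
// can be sketched as below. The function name resolve is hypothetical, and
// the sketch assumes the build.proto stream name ends in "build.proto"; a
// real host may resolve names differently:
//
//	package main
//
//	import (
//		"fmt"
//		"net/url"
//		"strings"
//	)
//
//	// resolve namespaces a relative stream name under the namespace of
//	// the build.proto stream that mentioned it. Names which parse with
//	// a URL scheme are treated as absolute and returned untouched.
//	func resolve(buildProtoStream, ref string) string {
//		if u, err := url.Parse(ref); err == nil && u.Scheme != "" {
//			return ref
//		}
//		ns := strings.TrimSuffix(buildProtoStream, "build.proto")
//		return ns + ref
//	}
//
//	func main() {
//		parent := "logdog://host/project/prefix/+/something/build.proto"
//		fmt.Println(resolve(parent, "hello/world/stdout"))
//		// logdog://host/project/prefix/+/something/hello/world/stdout
//	}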
//
// (SIDENOTE: There's an internal proposal to improve the Logdog "butler" API so
// that application code only needs to handle the relative namespaces as well,
// which would make this much less confusing).
//
// The Host Application MUST append all steps from the child build.proto
// stream to the parent build as substeps of step S and copy the following
// fields of the child Build to the equivalent fields of step S *only if*
// step S has *non-final* status. It is the caller's responsibility to populate
// the rest of the fields of step S if the caller explicitly marks the step
// status as final.
//
//	SummaryMarkdown
//	Status
//	EndTime
//	Output.Logs (appended)
//
// This rule applies recursively, i.e. the child build MAY have Merge Step(s).
//
// Each luciexe's step names should be emitted as relative names. e.g. say
// a build runs a sub-luciexe with the name "a|b". This sub-luciexe then runs
// a step "x|y|z". The top level build.proto stream will show the step
// "a|b|x|y|z".
//
// The graph of datagram streams MUST be a tree. This follows from the
// namespacing rules of the Log.Url fields; since Log.Url fields are relative to
// their build's namespace, it's only possible to have a merge step point
// further 'down' the tree, making it impossible to create a cycle.
//
// # Recursive Invocation - Legacy ChromeOS style
//
// Note that there's an option in MergeBuild called 'legacy_global_namespace'.
// This mode is NOT RECOMMENDED, and was only added to support legacy ChromeOS
// builders which relied on this functionality prior to luciexe.
//
// When this option is turned on, output properties on the build stream will be
// merged into the top-level build containing the Merge Step, and it will also
// de-namespace the step names (i.e.
// a MergeStep called "Parent" whose build.proto stream contains a name "Step",
// which would normally show on the build as "Parent|Step", will instead simply
// show as "Step" in this mode. This also means that these merge step processes
// must make sure not to emit a step which duplicates the name of one emitted in
// the parent build (otherwise the overall build will be invalid)).
//
// Using this functionality has the downside that properties emitted by the
// parent build WILL BE OVERWRITTEN by the merged build stream, if written to
// the same top-level property keys. In ChromeOS's case this is "OK" because the
// legacy recipes do a short bootstrap followed by the main payload which does
// all the actual stuff.
//
// There's a version of this feature which would be more-supportable, which
// would be to introduce namespaced outputs so that the top-level build could
// reflect the outputs from a given step (i.e. allow Step to actually have its
// own Output message). If you feel like you need this functionality, please
// contact the ChOps Foundation team to discuss, rather than trying to use the
// legacy_global_namespace option.
//
// # Related Libraries
//
// For implementation-level details, please refer to the following:
//
//   - "go.chromium.org/luci/luciexe" - low-level protocol details (this module).
//   - "go.chromium.org/luci/luciexe/host" - the Host Application library.
//   - "go.chromium.org/luci/luciexe/invoke" - luciexe invocation.
//   - "go.chromium.org/luci/luciexe/exe" - luciexe binary helper library.
//
// # Other Client Implementations
//
// Python Recipes (https://chromium.googlesource.com/infra/luci/recipes-py)
// implement the LUCI Executable protocol using the "luciexe" subcommand.
//
// TODO(iannucci): Implement a luciexe binary helper in `infra_libs` analogous
// to go.chromium.org/luci/luciexe/exe and implement Recipes' support in terms
// of this.
//
// # LUCI Executables on Buildbucket
//
// Buildbucket accepts LUCI Executables as CIPD packages containing the
// luciexe to run with the fixed name of "luciexe".
//
// On Windows, this may be named "luciexe.exe" or "luciexe.bat". Buildbucket
// will prefer the first of these which it finds.
//
// Note that the `recipes.py bundle` command generates a "luciexe" wrapper
// script for compatibility with Buildbucket.
package luciexe