go.chromium.org/luci@v0.0.0-20240309015107-7cdc2e660f33/luciexe/doc.go (about)

     1  // Copyright 2019 The LUCI Authors.
     2  //
     3  // Licensed under the Apache License, Version 2.0 (the "License");
     4  // you may not use this file except in compliance with the License.
     5  // You may obtain a copy of the License at
     6  //
     7  //      http://www.apache.org/licenses/LICENSE-2.0
     8  //
     9  // Unless required by applicable law or agreed to in writing, software
    10  // distributed under the License is distributed on an "AS IS" BASIS,
    11  // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    12  // See the License for the specific language governing permissions and
    13  // limitations under the License.
    14  
    15  // Package luciexe documents the "LUCI Executable" protocol, and contains
    16  // constants which are part of this protocol.
    17  //
    18  // # Summary
    19  //
    20  // A LUCI Executable ("luciexe") is a binary which implements a protocol to:
    21  //
    22  //   - Pass the initial state of the 'build' from a parent process to the luciexe.
    23  //   - Understand the build's local system contracts (like the location of cached data).
    24  //   - Asynchronously update the state of the build as it runs.
    25  //
    26  // This protocol is recursive; A luciexe can run another luciexe such that the
    27  // child's updates are reflected on the parent's output.
    28  //
    29  // The protocol has 3 parts:
    30  //
    31  //   - Host Application - This sits at the top of the luciexe process
    32  //     hierarchy and sets up singleton environmental requirements for the whole
    33  //     tree.
    34  //   - Invocation of a luciexe binary - This invocation process occurs both
    35  //     for the topmost luciexe, as well as all internal invocations of other
    36  //     luciexe's within the process hierarchy.
    37  //   - The luciexe binary - The binary has a couple of responsibilities to be
    38  //     compatible with this protocol. Once the binary has fulfilled it's
    39  //     responsibilities it's free to do what it wants (i.e. actually do its
    40  //     task).
    41  //
    42  // In general, we strive where possible to minimize the complexity of the
    43  // luciexe binary. This is because we expect to have a small number of 'host
    44  // application' implementations, and a relatively large number of 'luciexe'
    45  // implementations.
    46  //
    47  // # The Host Application
    48  //
    49  // At the root of every tree of luciexe invocations there is a 'host'
    50  // application which sets up an manages all environmental singletons (like the
    51  // Logdog 'butler' service, the LUCI ambient authentication service, etc.). This
    52  // Host Application is responsible for intercepting and merging all
    53  // 'build.proto' streams emitted within this tree of luciexes (see "Recursive
    54  // Invocation"). The Host Application may choose what happens to these
    55  // intercepted build.proto messages.
    56  //
    57  // The Host Application MUST:
    58  //   - Run a logdog butler service and expose all relevant LOGDOG_* environment
    59  //     variables such that the following client libraries can stream log data:
    60  //   - Golang: go.chromium.org/luci/logdog/client/butlerlib/bootstrap
    61  //   - Python: infra_libs.logdog.bootstrap
    62  //   - Hook the butler to intercept and merge build.proto streams into a single
    63  //     build.proto (zlib-compressed) stream.
    64  //   - Set up a local LUCI ambient authentication service which luciexe's can
    65  //     use to mint auth tokens.
    66  //   - Prepare an empty directory which will house tempdirs and workdirs for
    67  //     all luciexe invocations. The Host Application MAY clean this directory
    68  //     up, but it may be useful to leak it for debugging. It's permissible for
    69  //     the Host Application to defer this cleanup to an external process (e.g.
    70  //     buildbucket's agent may defer this to swarming).
    71  //
    72  // The Host Application MAY hook additional streams for debugging/logging; it is
    73  // frequently convenient to hook the stderr/stdout streams from the top level
    74  // luciexe and tee them to the Host Application's stdout/stderr.
    75  //
    76  // For example: the `go.chromium.org/luci/buildbucket/cmd/agent` binary forwards
    77  // these merged build.proto messages to the Buildbucket service, and also
    78  // uploads all streams to them to the Logdog cloud service. Other host
    79  // implementations may instead choose to write all streams to disk, send them to
    80  // /dev/null or render them as html. However, from the point of view of the
    81  // luciexe that they run, this is transparent.
    82  //
    83  // Host Applications MAY implement 'backpressure' on the luciexe binaries by
    84  // throttling the rate at which the Logdog butler accepts data on its various
    85  // streams. However, doing this could introduce timing issues in the luciexe
    86  // binaries as they try to run, so this should be done thoughtfully.
    87  //
    88  // If a Host Application detects a protocol violation from a luciexe within its
    89  // purview, it SHOULD report the violation (in a manner of its choosing) and
    90  // MUST consider the entire Build status to be INFRA_FAILURE. In addition the
    91  // Host Application SHOULD attempt to kill (via process group SIGTERM/SIGKILL on
    92  // *nix, and CTRL+BREAK/Terminate on windows) the luciexe hierarchy. The host
    93  // application MAY provide a window of time between the initial "please stop"
    94  // signal and the "you die now" signal, but this comes with the usual caveats of
    95  // cleanups and deadlines (notably: best-effort clean up is just that:
    96  // best-effort. It cannot be relied on to run to completion (or to run
    97  // completely)).
    98  //
    99  // # Invocation
   100  //
   101  // When invoking a luciexe, the parent process has a couple responsibilities. It
   102  // must:
   103  //
   104  //   - Set $TEMPDIR, $TMPDIR, $TEMP, $TMP and $MAC_CHROMIUM_TMPDIR to all point
   105  //     to the same, empty directory.
   106  //     This directory MUST be located on the same file system as CWD.
   107  //     This directory MUST NOT be the same as CWD.
   108  //
   109  //   - Set $LUCI_CONTEXT["luciexe"]["cache_dir"] to a cache dir which makes sense
   110  //     for the luciexe.
   111  //     The cache dir MAY persist/be shared between luciexe invocations.
   112  //     The cache dir MAY NOT be on the same filesystem as CWD.
   113  //
   114  //   - Set the $LOGDOG_NAMESPACE to a prefix which namespaces all logdog streams
   115  //     generated from the luciexe.
   116  //
   117  //   - Set the Status of initial buildbucket.v2.Build message to `STARTED`.
   118  //
   119  //   - Set the CreateTime and StartTime of initial buildbucket.v2.Build message.
   120  //
   121  //   - Clear following fields in the initial buildbucket.v2.Build message.
   122  //
   123  //   - EndTime
   124  //
   125  //   - Output
   126  //
   127  //   - StatusDetails
   128  //
   129  //   - Steps
   130  //
   131  //   - SummaryMarkdown
   132  //
   133  //   - UpdateTime
   134  //
   135  // The CWD is up to your application. Some contexts (like Buildbucket) will
   136  // guarantee an empty CWD, but others (like recursive invocation) may explicitly
   137  // share CWD between multiple luciexe's.
   138  //
   139  // The tempdir and workdir paths SHOULD NOT be cleaned up by the invoking
   140  // process. Instead, the invoking process should defer to the Host Application
   141  // to provide this cleanup, since the Host Application may be configured to leak
   142  // these for debugging purposes.
   143  //
   144  // The invoker MUST attach the stdout/stderr to the logdog butler as text
   145  // streams. These MUST be located at `$LOGDOG_NAMESPACE/std{out,err}`.
   146  // Typical luciexe implementations will use these for debug logging and output,
   147  // but are not required to do so.
   148  //
   149  // The invoker MUST write a binary-encoded buildbucket.v2.Build to the stdin of
   150  // the luciexe which contains all the input parameters that the luciexe needs to
   151  // know to run successfully.
   152  //
   153  // # The luciexe binary
   154  //
   155  // Once running, the luciexe MUST read a binary-encoded buildbucket.v2.Build
   156  // message from stdin until EOF.
   157  //
   158  // As per the invoker's responsibility, the luciexe binary MAY expect the
   159  // status of the initial Build message to be `STARTED` and CreateTime and
   160  // StartTime are populated. It MAY also expect following fields to be empty.
   161  // It MUST NOT assume other fields in the Build message are set. However, the
   162  // Host Application or invoker MAY fill in other fields they think are useful.
   163  //
   164  //	EndTime
   165  //	Output
   166  //	StatusDetails
   167  //	Steps
   168  //	SummaryMarkdown
   169  //	Tags
   170  //	UpdateTime
   171  //
   172  // As per the Host Application's responsibilities, the luciexe binary MAY expect
   173  // the "luciexe" and "local_auth" sections of LUCI_CONTEXT to be filled. Other
   174  // sections of LUCI_CONTEXT MAY also be filled. See the LUCI_CONTEXT docs:
   175  // https://chromium.googlesource.com/infra/luci/luci-py/+/HEAD/client/LUCI_CONTEXT.md
   176  //
   177  //	!!NOTE!! The paths supplied to the luciexe MUST NOT be considered stable
   178  //	across invocations. Do not hard-code these, and try not to rely on their
   179  //	consistency (e.g. for build reproducibility).
   180  //
   181  // # The luciexe binary - Updating the Build state
   182  //
   183  // A luciexe MAY update the Build state by writing to a "build.proto" Logdog
   184  // stream named "$LOGDOG_NAMESPACE/build.proto". A "build.proto" Logdog stream
   185  // is defined as:
   186  //
   187  //	Content-Type: "application/luci+proto; message=buildbucket.v2.Build"
   188  //	Type: Datagram
   189  //
   190  // Additionally, a build.proto stream MAY append "; encoding=zlib" to the
   191  // Content-Type (and compress each message accordingly). This is useful for when
   192  // you potentially have very large builds.
   193  //
   194  // Each datagram MUST be a valid binary-encoded buildbucket.v2.Build message.
   195  // The state of the build is defined as the last Build message sent on this
   196  // stream. There's no implicit accumulation between sent Build messages.
   197  //
   198  // The Step.Log.Url field in the emitted Build messages can be either absolute
   199  // (has a valid url scheme) or relative. If an absolute `Url` is provided, the
   200  // luciexe is also responsible for supplying corresponding `ViewUrl` and the
   201  // host application won't validate or make any adjustments to those urls. This
   202  // allows luciexe to include logs from other sources into this build. If a
   203  // relative `Url` is provided, the host application will namespace the `Url`
   204  // with $LOGDOG_NAMESPACE of the build.proto stream. For example, if the host
   205  // application is parsing a Build.proto in a stream named
   206  // "logdog://host/project/prefix/+/something/build.proto", then a Log with a Url
   207  // of "hello/world/stdout" will be transformed into:
   208  //
   209  //	Url:     logdog://host/project/prefix/+/something/hello/world/stdout
   210  //	ViewUrl: <implementation defined>
   211  //
   212  // The `ViewUrl` field in this case SHOULD be left empty, and will be filled in
   213  // by the host application running the luciexe (if supplied it will be
   214  // overwritten).
   215  //
   216  // The following Build fields will be read from the luciexe-controlled
   217  // build.proto stream:
   218  //
   219  //	EndTime
   220  //	Output
   221  //	Status
   222  //	StatusDetails
   223  //	Steps
   224  //	SummaryMarkdown
   225  //	Tags
   226  //	UpdateTime
   227  //
   228  // # The luciexe binary - Reporting final status
   229  //
   230  // A luciexe MUST report its success/failure by sending a Build message with
   231  // a terminal `status` value before exiting. If the luciexe exits before sending
   232  // a Build message with a terminal Status, the invoking application MUST
   233  // interpret this as an INFRA_FAILURE status. The exit code of the luciexe
   234  // SHOULD be ignored, except for advisory (logging/reporting) purposes. The host
   235  // application MUST detect this case and fill in a final status of
   236  // INFRA_FAILURE, but MUST NOT terminate the process hierarchy in this case.
   237  //
   238  // # Recursive Invocation
   239  //
   240  // To support recursive invocation, a luciexe MUST accept the flag:
   241  //
   242  //	--output=path/to/file.{pb,json,textpb}
   243  //
   244  // The value of this flag MUST be an absolute path to a non-existent file in
   245  // an existing directory. The extension of the file dictates the data format
   246  // (binary, json or text protobuf). The luciexe MUST write it's final Build
   247  // message to this file in the correct format. If `--output` is specified,
   248  // but no Build message (or an invalid/improperly formatted Build message)
   249  // is written, the caller MUST interpret this as an INFRA_FAILURE status.
   250  //
   251  //	NOTE: JSON outputs SHOULD be written with the original proto field names,
   252  //	not the lowerCamelCase names; downstream users may not be using jsonpb
   253  //	unmarshallers to interpret the JSON data.
   254  //
   255  //	This may need to be revised in a subsequent version of this API
   256  //	specification.
   257  //
   258  // LUCI Executables MAY invoke other LUCI Executables as sub-steps and have the
   259  // Steps from the child luciexe show in the parent's Build updates. This is one
   260  // of the responsibilities of the Host Application.
   261  //
   262  // The parent can achieve this by recording a Step S (with no children), and
   263  // a Step.MergeBuild.from_logdog_stream which points to a "build.proto" stream
   264  // (see "Updating the Build State"). This is called a "Merge Step", and is
   265  // a directive for the host to merge Build messages from in the
   266  // from_logdog_stream log here. Note that this stream name should follow the
   267  // same provisions specified for "Step.Log.Url". It's the invoker's
   268  // responsibility to populate $LOGDOG_NAMESPACE with the new, full, namespace.
   269  // Failure to do this will result in missing logs/step data.
   270  //
   271  // (SIDENOTE: There's an internal proposal to improve the Logdog "butler" API so
   272  // that application code only needs to handle the relative namespaces as well,
   273  // which would make this much less confusing).
   274  //
   275  // The Host Application MUST append all steps from the child build.proto
   276  // stream to the parent build as substeps of step S and copy the following
   277  // fields of the child Build to the equivalent fields of step S *only if*
   278  // step S has *non-final* status. It is the caller's responsibility to populate
   279  // rest of the fields of step S if the caller explicitly marks the step
   280  // status as final.
   281  //
   282  //	SummaryMarkdown
   283  //	Status
   284  //	EndTime
   285  //	Output.Logs (appended)
   286  //
   287  // This rule applies recursively, i.e. the child build MAY have Merge Step(s).
   288  //
   289  // Each luciexe's step names should be emitted as relative names. e.g. say
   290  // a build runs a sub-luciexe with the name "a|b". This sub-luciexe then runs
   291  // a step "x|y|z". The top level build.proto stream will show the step
   292  // "a|b|x|y|z".
   293  //
   294  // The graph of datagram streams MUST be a tree. This follows from the
   295  // namespacing rules of the Log.Url fields; Since Log.Url fields are relative to
   296  // their build's namespace, it's only possible to have a merge step point
   297  // further 'down' the tree, making it impossible to create a cycle.
   298  //
   299  // # Recursive Invocation - Legacy ChromeOS style
   300  //
   301  // Note that there's an option in MergeBuild called 'legacy_global_namespace'.
   302  // This mode is NOT RECOMMENDED, and was only added to support legacy ChromeOS
   303  // builders which relied on this functionality prior to luciexe.
   304  //
   305  // When this option is turned on, output properties on the build stream will be
   306  // merged into the top-level build containing the Merge Step, and it will also
   307  // de-namespace the step names (i.e. a MergeStep called "Parent" whose
   308  // build.proto stream contains a name "Step" would normally show on the build as
   309  // "Parent|Step", will instead simply show as "Step" in this mode. This also
   310  // means that these merge step processes must make sure not to emit a step which
   311  // duplicates the name of one emitted in the parent build (otherwise the overall
   312  // build will be invalid)).
   313  //
   314  // Using this functionality has the downside that properties emitted by the
   315  // parent build WILL BE OVERWRITTEN by the merged build stream, if written to
   316  // the same top-level property keys. In ChromeOS's case this is "OK" because the
   317  // legacy recipes do a short bootstrap followed by the main payload which does
   318  // all the actual stuff.
   319  //
   320  // There's a version of this feature which would be more-supportable, which
   321  // would be to introduce namespaced outputs so that the top-level build could
   322  // reflect the outputs from a given step (i.e. allow Step to actually have its
   323  // own Output message). If you feel like you need this functionality, please
   324  // contact ChOps Foundation team to discuss, rather than trying to use the
   325  // legacy_global_namespace option.
   326  //
   327  // # Related Libraries
   328  //
   329  // For implementation-level details, please refer to the following:
   330  //
   331  //   - "go.chromium.org/luci/luciexe" - low-level protocol details (this module).
   332  //   - "go.chromium.org/luci/luciexe/host" - the Host Application library.
   333  //   - "go.chromium.org/luci/luciexe/invoke" - luciexe invocation.
   334  //   - "go.chromium.org/luci/luciexe/exe" - luciexe binary helper library.
   335  //
   336  // # Other Client Implementations
   337  //
   338  // Python Recipes (https://chromium.googlesource.com/infra/luci/recipes-py)
   339  // implement the LUCI Executable protocol using the "luciexe" subcommand.
   340  //
   341  //	TODO(iannucci): Implement a luciexe binary helper in `infra_libs` analogous
   342  //	to go.chromium.org/luci/luciexe/exe and implement Recipes' support in terms
   343  //	of this.
   344  //
   345  // # LUCI Executables on Buildbucket
   346  //
   347  // Buildbucket accepts LUCI Executables as CIPD packages containing the
   348  // luciexe to run with the fixed name of "luciexe".
   349  //
   350  // On Windows, this may be named "luciexe.exe" or "luciexe.bat". Buildbucket
   351  // will prefer the first of these which it finds.
   352  //
   353  // Note that the `recipes.py bundle` command generates a "luciexe" wrapper
   354  // script for compatibility with Buildbucket.
   355  package luciexe