# Tast Architecture Guide (go/tast-architecture-guide)

This document describes the high-level architecture of the Tast framework, and
provides guidance for future enhancements to the framework.

[TOC]

## Introduction

Tast framework feature development is mostly about designing concepts. That is
because everything else in the framework, including APIs and internal
implementations, is designed based on the abstract concepts we define.
Well-designed concepts give us simple APIs that users can easily understand, and
maintainable internal implementations. Bad concepts lead to user confusion and
a maintenance nightmare.

This document was written to help you understand the current architecture of
Tast, and design new framework features.

This document first explains Tast's overall architecture and important existing
concepts. Next, it provides high-level guidance for future enhancements, citing
many examples of good/bad decisions we have made in the past. Finally, it
mentions several best practices we learned from framework development.

## Background

### Remote end-to-end testing

Tast is a remote end-to-end testing framework, primarily targeting ChromeOS.

There are two important aspects of Tast here: **end-to-end** and **remote**.

- **End-to-end**: Tast runs tests against a complete **target product**. Tests
  run in a Linux process independent from the target product, so they interact
  with the target product by simulating user inputs (e.g. generating keyboard
  events), calling into test APIs provided by the target product (e.g. Chrome
  DevTools protocol), etc.
- **Remote**: Tast involves two types of machines: a **host system** and one or
  more **target systems**. Tast tests are initiated from a host system, and
  exercise target products running on target systems remotely. Tast requires
  that target systems be reachable via SSH. Tast tests may use additional
  means to interact with target systems, for example peripherals physically
  attached to a target device.

![Remote testing](images/remote-testing.png)

### Two types of Tast users

Tast has two types of users:

- **Test authors** who use Tast to write test scenarios. Test authors include
  not only authors of individual tests but also authors of support libraries
  used by multiple tests. Tast provides **Go APIs** to test authors, which allow
  them to register their tests with the framework, access resources needed to
  perform test scenarios, etc.
- **Test requesters** who use Tast to run test scenarios. Continuous integration
  systems configured to run Tast tests automatically are the most significant
  test requesters. Test authors are also considered test requesters since they
  need to run work-in-progress tests to ensure they're correct. Tast provides
  a **CLI command** to test requesters, which allows them to run Tast tests and
  consume their results.

Tast stands between these two types of users. It is important to know that they
have different, and sometimes even conflicting, needs.

## Current architecture

This chapter describes the architecture of the Tast framework at the time of
writing.

### High-level structure

At a high level, Tast-related components fall largely into two categories:
**framework** and **user code**.

- **Framework** is the engine that executes tests defined in user code.
  Framework code resides in the [chromiumos/platform/tast] repository.
  Test authors rarely make changes to the framework. This document primarily
  discusses the design of the framework.
- **User code** is the code written by test authors. User code resides
  mainly in the [chromiumos/platform/tast-tests] repository, but there are
  several other repositories such as [chromeos/platform/tast-tests-private].

User code can be further categorized into two subcategories:

- **Support libraries** are a collection of common libraries used by tests.
- **Tests** are actual test cases written by test authors.

The framework provides the following APIs to users:

- Test authors: **Go APIs** to interact with the framework, including:
    - Registering entities (e.g. tests) to the framework
    - Defining a test bundle
    - Some basic libraries shared with the framework (e.g. SSH)
- Test requesters: **CLI command ("Tast CLI")** to work with tests, providing:
    - Command line flags and parameters to specify execution configuration
    - Stable protocols to report test results

The next diagram illustrates the relationship of those layers.

![Tast layers](images/component-stack.png)

[chromiumos/platform/tast]: https://chromium.googlesource.com/chromiumos/platform/tast
[chromiumos/platform/tast-tests]: https://chromium.googlesource.com/chromiumos/platform/tast-tests
[chromeos/platform/tast-tests-private]: https://chrome-internal.googlesource.com/chromeos/platform/tast-tests-private

### Concepts

#### Tests

A **test** is a unit of test scenarios defined by test authors.

A test is defined in a .go file. We call such files **test files**. A test
file must define these two functions:

1. **Test registration**: An init() function that registers a test to the
   framework on initialization.
2. **Test function**: An exported function that implements a test scenario.

Here is a complete example of a test file defining a no-op test:

```go
// File: src/go.chromium.org/tast-tests/cros/local/bundles/cros/example/pass.go

package example

import (
    "context"

    "go.chromium.org/tast/core/testing"
)

// init registers the test with the framework when the package is loaded.
func init() {
    testing.AddTest(&testing.Test{
        Func:     Pass,
        Desc:     "Always passes",
        Contacts: []string{"nya@chromium.org", "tast-owners@google.com"},
        Attr:     []string{"group:mainline"},
    })
}

// Pass is the test function. It reports no errors, so the test passes.
func Pass(ctx context.Context, s *testing.State) {}
```

**Test metadata** is represented by a testing.Test struct passed to
testing.AddTest on registration. Test metadata includes, but is not limited to,
the following fields:

- Func: A test function
- Desc: Human-readable description of a test
- Contacts: Contact emails
- Attr: Attributes assigned to a test
- Data: Data files needed by a test
- SoftwareDeps/HardwareDeps: Dependencies required to run a test
- VarDeps: Runtime variables needed by a test
- ServiceDeps: Services needed by a test

Note that test names are not included in test metadata. A test name is
automatically derived by joining a package name and a test function name with a
period. In the example above, the name of the test is "example.Pass".

When a test requester executes Tast CLI to run a test, the framework calls its
test function, passing a context.Context and a testing.State, with which it can
access resources needed for test scenario execution.

One of the most important things a test function does is to **report test
errors**. A test is considered **failed** if it reports one or more errors.
Otherwise, a test is considered **passed**. Once the framework starts a test,
its result is either passed or failed. If a test cannot be run under certain
conditions, it should describe the constraints as software/hardware dependencies
so that the framework skips it without executing it.
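The pass/fail/skip rule above can be summarized in a few lines of Go. This is
only a sketch: the `result` type and the `outcome` function are hypothetical
names that are not part of the Tast API, and the framework's real bookkeeping is
more involved.

```go
package main

import "fmt"

// result is a hypothetical outcome type used only for this illustration.
type result string

const (
	passed  result = "passed"
	failed  result = "failed"
	skipped result = "skipped"
)

// outcome decides a test's result from whether its declared dependencies are
// satisfied and from the errors it reported while running.
func outcome(depsSatisfied bool, reportedErrors []string) result {
	if !depsSatisfied {
		// The test is never started, so it is neither passed nor failed.
		return skipped
	}
	if len(reportedErrors) > 0 {
		return failed
	}
	return passed
}

func main() {
	fmt.Println(outcome(true, nil))
	fmt.Println(outcome(true, []string{"window did not open"}))
	fmt.Println(outcome(false, nil))
}
```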

A test can save output files for post-run inspection. testing.State.OutDir and
testing.ContextOutDir return a directory where a test should place output
files.

*** note
**Note**: Additional restrictions on test files

For better consistency and readability, we have a lint checker which enforces
various rules on how to define a test. Here are some notable rules:

- A test file must define exactly one test. It is prohibited to define two or
  more tests in a file.
- A test file name must match the base name of a test. For example, a test
  named "pkg.FooBar" must be defined in pkg/foo_bar.go.
- A test file must not define any exported symbols other than the test
  function. It can still optionally define unexported symbols (constants,
  variables, functions...) that are used by the test.

These rules are designed to make it very easy to find a test file from a test name.
***
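As a rough illustration of how these naming rules connect a test name to its
file, the sketch below derives both. The helpers `testName`, `snakeCase` and
`testFile` are hypothetical and are not taken from the Tast lint checker; in
particular, the naive snake-case conversion here is an assumption and may
differ from the real rule for acronyms or digits.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// testName joins a package name and a test function name with a period.
func testName(pkg, funcName string) string {
	return pkg + "." + funcName
}

// snakeCase naively converts "FooBar" to "foo_bar" by inserting an underscore
// before every upper-case letter except the first one. Acronyms and digits
// get no special treatment here.
func snakeCase(s string) string {
	var b strings.Builder
	for i, r := range s {
		if unicode.IsUpper(r) && i > 0 {
			b.WriteByte('_')
		}
		b.WriteRune(unicode.ToLower(r))
	}
	return b.String()
}

// testFile returns the file expected to define the named test, relative to
// the directory that holds the test packages.
func testFile(name string) (string, error) {
	pkg, funcName, ok := strings.Cut(name, ".")
	if !ok || pkg == "" || funcName == "" {
		return "", fmt.Errorf("malformed test name %q", name)
	}
	return pkg + "/" + snakeCase(funcName) + ".go", nil
}

func main() {
	name := testName("pkg", "FooBar")
	file, err := testFile(name)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(name, "->", file) // pkg.FooBar -> pkg/foo_bar.go
}
```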

#### Local/remote

There are two types of tests: **local tests** and **remote tests**. Local tests
are executed in a process running on the target system, while remote tests are
executed in a process running on the host system.

Unless remote test functionality is needed, a test is better written as a local
test. Local tests can interact with the target system much more easily because
they have direct access to the resources on the target system (e.g. local file
system, network sockets, system calls).

There are several cases where remote tests are needed. One of the most common
cases is a test rebooting the target system. Such a test cannot be written as a
local test since a reboot would terminate the testing process itself. Other
possible cases where remote tests are needed are: a test temporarily detaching
the target system from the network, a test controlling the target system via
peripherals attached to it, or a test interacting with multiple target systems.

Fixtures can also be local or remote. See [Fixtures](#fixtures) for details.

#### Services

A **service** is a user-defined gRPC service that can be run on the target
system to be called from remote tests and fixtures.

Remote tests have access to the target device via SSH, so in theory they can
interact with the target device through it. However, SSH is only capable of
running external commands on the target system, which is not enough to perform
complicated test scenarios. Instead, users can implement services and call them
from remote tests via gRPC. This way, remote tests can call into support
libraries built for local tests.

Services are registered to the framework in much the same way as tests.
A service is defined in a .go file called a **service file**, containing the
following two symbols:

1. **Service registration**: An init() function that registers a service to the
   framework on initialization.
2. **Service implementation**: An exported type that implements a gRPC service.

Here is an example of a service file:

```go
// File: src/go.chromium.org/tast-tests/cros/local/bundles/cros/example/foo_service.go
// The package clause and imports are omitted for brevity.

func init() {
    testing.AddService(&testing.Service{
        Register: func(srv *grpc.Server, s *testing.ServiceState) {
            example.RegisterFooServiceServer(srv, &FooService{s: s})
        },
    })
}

// FooService implements the gRPC service registered above.
type FooService struct {
    s *testing.ServiceState
}

func (s *FooService) Bar(ctx context.Context, req *example.BarRequest) (*example.BarResponse, error) {
    ...
}
```

**Service metadata** is represented by a testing.Service struct passed to
testing.AddService on registration.

Service methods have access to several features similar to tests. For example,
they can call testing.ContextLog to emit logs, and testing.ContextOutDir to save
output files. Those functions behave as if they were called in the remote test
calling into the current gRPC method.

Users have to declare in remote test metadata which services a remote test may
call into. This is required to build [entity graphs](#entities) before
execution.

#### Fixtures

A **fixture** sets up and maintains an **environment** to be shared by tests and
other fixtures.

An environment is an abstract term referring to a state of the target/host
system. Some possible environments a fixture may set up are, for example:

- The target ChromeOS device is in the login screen
- The target ChromeOS device is logged into a user session
- The target ChromeOS device is logged into a user session, and Crostini is
  enabled
- The target ChromeOS device is enrolled into an enterprise policy

Fixtures are registered to the framework in much the same way as tests and
services. Fixture registration is done by an init function calling
testing.AddFixture with a testing.Fixture struct, representing fixture metadata.

Here is a minimal example of a fixture definition:

```go
func init() {
    testing.AddFixture(&testing.Fixture{
        Name: "someFixture",
        Impl: &someFixture{},
    })
}

// someFixture implements all five lifecycle methods as no-ops.
type someFixture struct{}

func (*someFixture) SetUp(ctx context.Context, s *testing.FixtState) interface{} { return nil }
func (*someFixture) TearDown(ctx context.Context, s *testing.FixtState) {}
func (*someFixture) PreTest(ctx context.Context, s *testing.FixtTestState) {}
func (*someFixture) PostTest(ctx context.Context, s *testing.FixtTestState) {}
func (*someFixture) Reset(ctx context.Context) error { return nil }
```

A test can optionally **depend on** a fixture by declaring a dependency in its
metadata. If a test depends on a fixture, the fixture is used to set up a
desired environment before the test starts.

To let a fixture provide a consistent environment to tests, the framework calls
into a fixture's various **lifecycle methods**. There are 5 lifecycle methods:

- SetUp
- TearDown
- PreTest
- PostTest
- Reset

SetUp/TearDown are called when a fixture needs to set up / tear down an
environment. PreTest/PostTest are called before/after a test depending on the
fixture runs.

SetUp may return a **fixture value**, an arbitrary value that is made available
to its dependants. A fixture value is typically used to pass in-memory
objects and/or information related to the environment the fixture has set up.

**Reset** is a unique lifecycle method called between tests depending on the
fixture. In Reset, a fixture should perform a light-weight reset of the current
environment to one acceptable by the fixture. If it fails to do so, it should
return an error, which in turn causes the framework to tear down the fixture and
set it up again before the next test. If Reset succeeds, the framework proceeds
to run the next test without tearing down the fixture. This lifecycle event
allows fixtures to efficiently recover from side effects that tests leave on
the environment.

For example, consider a fixture that provides an environment where
"logged into a Chrome user session and all windows are closed". This fixture's
lifecycle methods can be implemented in the following way:

- SetUp: Restart UI and log into a new Chrome user session, and return a Chrome
  connection object as a fixture value
- TearDown: Log out of the session and close the connection object
- PreTest/PostTest: Do nothing
- Reset: Check that the Chrome process is intact, and close all open windows
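The order in which the lifecycle methods are called can be made concrete with
a small simulation. This is a sketch under stated assumptions rather than the
framework's scheduler: the `fixture` interface below is a hypothetical,
simplified version of the real one (no context or state arguments, no fixture
value), and `runTests`, `recorder` and `trace` exist only for this example.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// fixture is a simplified, hypothetical version of the five lifecycle methods.
type fixture interface {
	SetUp()
	TearDown()
	PreTest()
	PostTest()
	Reset() error
}

// runTests shows the order in which a framework could call the lifecycle
// methods of fx around the given tests.
func runTests(fx fixture, tests []func()) {
	fx.SetUp()
	for i, test := range tests {
		if i > 0 {
			// Between tests, try the light-weight reset first and fall back
			// to a full tear down and set up only if it fails.
			if err := fx.Reset(); err != nil {
				fx.TearDown()
				fx.SetUp()
			}
		}
		fx.PreTest()
		test()
		fx.PostTest()
	}
	fx.TearDown()
}

// recorder is a fake fixture that records every call. Its Reset fails on the
// failOn-th call (1-based); zero means Reset never fails.
type recorder struct {
	events []string
	resets int
	failOn int
}

func (r *recorder) log(e string) { r.events = append(r.events, e) }
func (r *recorder) SetUp()       { r.log("SetUp") }
func (r *recorder) TearDown()    { r.log("TearDown") }
func (r *recorder) PreTest()     { r.log("PreTest") }
func (r *recorder) PostTest()    { r.log("PostTest") }
func (r *recorder) Reset() error {
	r.resets++
	r.log("Reset")
	if r.resets == r.failOn {
		return errors.New("environment could not be reset")
	}
	return nil
}

// trace runs n no-op tests against a recorder and returns the call sequence.
func trace(n, failOn int) string {
	r := &recorder{failOn: failOn}
	tests := make([]func(), n)
	for i := range tests {
		tests[i] = func() { r.log("Test") }
	}
	runTests(r, tests)
	return strings.Join(r.events, " ")
}

func main() {
	fmt.Println(trace(2, 0)) // Reset succeeds: the fixture is reused.
	fmt.Println(trace(2, 1)) // Reset fails: the fixture is set up again.
}
```

In the second trace, the failed Reset is followed by TearDown and SetUp before
the next test, which is the recovery path described above.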

A fixture is useful when multiple tests share an environment whose setup is
costly. For example, consider 10 tests that need to run test scenarios in
a Chrome user session. Without fixtures, each test needs to log into a new user
session at its beginning since it doesn't know the current state of the target
system when it starts. This is not only inefficient, but can also elevate the
risk of test flakiness as the tests repeat the same login operations.
This problem can be solved by introducing a fixture that logs into a new user
session, and letting the 10 tests depend on the fixture. Then, when one or more
of the 10 tests are requested to run, the fixture is executed in advance
to log into a new user session, and tests run their test scenarios without
needing to repeat logins.

So far we have explained the most basic use of fixtures. Fixtures are a
powerful mechanism with the following additional features:

- A fixture can be local or remote. Local fixtures run on the target system,
  while remote fixtures run on the host system.
- Fixtures are **composable**: a fixture can also optionally depend on another
  fixture. A fixture cannot depend on itself directly or indirectly.
- Furthermore, **local tests/fixtures can depend on remote fixtures**. This
  allows writing local tests that interact with the target system remotely.

See the design doc of fixtures for more information.

#### Preconditions (deprecated)

Preconditions are a predecessor of fixtures. Preconditions tried to solve the
same problem in a limited way; they are not composable and have leaky boundaries
with tests.

#### Entities

An **entity** is a collective term for items registered to the framework with
metadata on initialization, and called back by the framework as needed. Today,
**tests, fixtures, and services** are entities.

An entity can declare dependencies on other entities in its metadata. The
diagram below indicates which entity can depend on which entity, and which
metadata field declares them.

![Entity dependencies](images/entity-dependencies.png)

When a test/fixture does not depend on a fixture explicitly, the framework
treats it internally as implicitly depending on the **virtual root fixture**.

An **entity graph** is a graph having tests and fixtures as nodes and fixture
dependencies as edges. **An entity graph forms a directed tree** whose root is
the virtual root fixture. The diagram below illustrates an example entity graph.

![Entity graph](images/entity-graph.png)

The most important property of an entity graph is that it can be statically
computed from entity metadata. This property allows the framework to compute all
entities relevant to tests requested to run before actually executing them by
traversing an entity graph from test nodes.

*** note
**Note**: Extended entity graph

Entity graphs do not contain services. We can define an extended entity graph
containing tests, fixtures and services. An extended entity graph is not a tree
but a directed acyclic graph (DAG).
***

#### Test bundles

A **test bundle** is a Go executable file built by linking user-defined entities
and their dependencies.

A test bundle can be local or remote. Local test bundles should link local
entities only, and vice versa. A local test bundle and a remote test bundle with
the same name are grouped; entities in the same group may interact, e.g. a local
test depending on a remote fixture, or a remote test depending on a service.

A test bundle's main.go is typically a small file that anonymously imports
packages where entities are defined, and defines a main function that calls into
a framework entry point function. Below is an example main file of a local test
bundle:

```go
package main

import (
    "os"

    "go.chromium.org/tast/core/bundle"

    // Underscore-imported packages register their tests via init functions.
    _ "go.chromium.org/tast-tests/cros/local/bundles/cros/apps"
    _ "go.chromium.org/tast-tests/cros/local/bundles/cros/arc"
    ...
)

func main() {
    os.Exit(bundle.LocalDefault(bundle.Delegate{}))
}
```

bundle.LocalDefault/RemoteDefault accept a bundle.Delegate struct which
specifies various hooks to be called by the framework. A run hook is called
before/after a test bundle is executed. A test hook is called before/after
a test is executed.

There are a few reasons to create a new test bundle. The first and foremost one
is ACL: if you want to make several tests public while keeping other tests
private, you need to create two test bundles, one for public tests and the other
one for private tests, so that external users who cannot check out private
source code can still build the public test bundle. Also, it would be useful to
create a new test bundle for a new target system (e.g. non-ChromeOS target
systems) since it can install a different set of hooks.

At the time of writing, we have only two test bundles: "cros" for public
ChromeOS tests and "crosint" for private ChromeOS tests. Since the two test
bundles share the same set of bundle.Delegate parameters, their main functions
call into the bundlemain support package, which in turn calls into
bundle.LocalDefault/RemoteDefault, to avoid duplication.

### Executables

Tast test execution involves three types of executables:

- **Tast CLI**, a.k.a. the "tast" command. This executable is installed on the
  host system in advance. Test requesters run this command, and it communicates
  with other executables to run tests. In a local development environment, Tast
  CLI also invokes the Go toolchain to build other executables (a.k.a.
  -build=true mode).
- **Local/remote test runners**. There are exactly two executables:
  "local_test_runner" installed on the target system, and "remote_test_runner"
  installed on the host system. They are built solely from the framework code
  and don't include any user-defined code. Tast CLI calls them to perform
  operations not specific to test bundles, and to run test bundles.
- **Local/remote test bundles**. As described above, they are executables
  containing user-defined entities.

![Executables](images/executables.png)

## Guidance for future enhancements

This chapter gives guidance on framework enhancements in the future. We start
from higher-level principles and then go down to more detailed best practices.

### Key design principle

There are many general principles for software design, and most of them
are useful for Tast framework design. That said, one of the most important
design principles I have found useful specifically for Tast is:

**A good framework provides a small number of orthogonal features that cover
a large number of use cases.**

It is obvious that covering more use cases is better. On the other hand, it is
good to minimize the number of features because the more features the framework
provides, the more complex it becomes due to interactions between the features.

### Considerations on designing a new feature

#### Do you really need the feature in the framework?

When evaluating a feature request, first ask yourself if you really need it in
the framework.

As described in the key design principle, we want to minimize the number of
features the framework provides. It's best if we can support use cases without
adding new features to the framework. Check if the feature can be implemented in
support libraries or with existing framework features. If we really need
a feature in the framework, do your best to design it to cover as many use cases
as possible.

When a proposed feature is useful only for certain use cases, it may mean that
the design is too specific to those use cases. In such cases, it often helps to
postpone the feature until we learn more use cases and can better generalize
the requirements. If the feature request is high priority, consider implementing
the feature in support libraries, even if it looks unclean and/or results in
more boilerplate.

*** aside
**Example**: Faillog

Tast has a mechanism called faillog to capture logs such as screenshots on test
failures. We initially implemented faillog as a support library
([crbug.com/856540](https://crbug.com/856540)) since we were not sure if the
feature would be useful for all tests. Faillog as a support library was not
optimal because tests interested in faillog had to be modified slightly to opt
in. After some experiments, faillog turned out to be useful for most tests, so
we merged the mechanism into the framework
([crbug.com/882729](https://crbug.com/882729)).
***

*** aside
**Example**: Screenshot tests

A proposal to extend the Tast control protocol was made for screenshot tests
([crrev.com/c/2422101](https://crrev.com/c/2422101)). After checking the
requirements, it turned out that they just wanted to run executables available
only on the host, so writing remote tests was sufficient.
***

*** aside
**Example**: -skipsort for MTBF tests

A proposal was made by MTBF test authors to add a new flag -skipsort to Tast CLI
([crrev.com/c/2429242](https://crrev.com/c/2429242)). The flag was meant to
disable Tast's internal test reordering and run tests in the exact order
specified in command line arguments.

Supporting this feature was technically possible. However, there were no other
use cases needing this feature, and the feature was also expected to introduce
a lot of complexity to the framework. After discussion with relevant teams, we
agreed not to implement this feature.
***

*** aside
**Example**: Uploading crash dumps

A proposal was made to upload crash dumps generated during tests to Google
servers automatically ([crrev.com/c/2337754](https://crrev.com/c/2337754)).
The approach had a privacy implication since Tast has many users outside of
Google. In the end, the feature was implemented in the ChromeOS testing
infrastructure.
***

#### Interaction with other features

Think carefully about how a new feature interacts with other existing features.

Enumerating interactions with existing features is a difficult task because you
need to understand all existing features in the framework. If you're unsure, you
may want to try creating a proof-of-concept implementation of the feature, which
can uncover some interactions you couldn't imagine in advance.

*** aside
**Example**: ContextSoftwareDeps

testing.ContextSoftwareDeps is a function that returns a list of software
dependencies declared by the current test. This function was introduced so that
certain support libraries can ensure that a calling test declares correct
software dependencies. For example, chrome.New calls this function to
ensure the current test declares the "chrome" software dependency
([crbug.com/954435](https://crbug.com/954435)).

The introduction of fixtures made the function less useful since there is no
"current test" when executing fixtures. The function is planned to be deleted
([crbug.com/1135996](https://crbug.com/1135996)).

As you can see from this example, you should be careful when a feature works
with "the current test".
***

*** aside
**Example**: Direct test execution with local_test_runner

Usually Tast tests are initiated by Tast CLI installed on the host system.
However, local_test_runner installed on the target system can be executed
directly by test requesters to run local tests. This feature was
implemented in the very early days of Tast.

Direct test execution with local_test_runner is now deprecated because we have
since added several features that cannot be supported without a host system. For
example, local tests directly executed by local_test_runner cannot access secret
runtime variables as they're only installed on the host system. Also,
local_test_runner cannot execute local tests depending on remote fixtures.
***

   605  #### Beware of versioning boundaries
   606  
   607  Many CI systems deploy Tast for end-to-end testing, including ChromeOS, Chrome,
   608  Android, Google3, and several other CI systems outside of Google. This means
   609  that it is very difficult to make changes to the protocol between Tast and CI
   610  systems, e.g. adding/removing/changing Tast CLI flags or changing test result
   611  directory structure, since you cannot make atomic commits to Tast and all those
   612  CI systems.
   613  
   614  In general, we should be extremely careful about designing a new Tast CLI
   615  feature for test requesters since it is difficult to make breaking changes.
   616  As for Go APIs for test authors, we can be less strict as we can make atomic
   617  commits to the framework and user code as of writing. However, once we start
   618  having Tast tests outside of ChromeOS repositories, Go API stability will
   619  become important.
   620  
   621  *** aside
   622  **Example**: Introducing group:mainline
   623  
   624  In the early days of Tast, we had only three classifications of a test:
   625  critical, informational, and disabled. The classification rule was simple:
   626  a test is,
   627  - disabled if it has the "disabled" attribute,
   628  - informational if it has the "informational" attribute,
   629  - critical otherwise.
   630  
   631  After introducing non-functional tests (e.g. performance tests), we introduced
   632  test group attributes. In the new rule, a test needed the "group:mainline"
   633  attribute to be considered critical/informational. To disable a test, one
   634  could simply remove the "group:mainline" attribute.
   635  
   636  Migration from the old rule to the new rule turned out to be very painful
   637  because those rules had been hard-coded into several CI systems (ChromeOS,
   638  Chrome, Android at that time) as attribute expressions. Therefore we needed to
   639  do a step-by-step migration as described in
   640  [go/tast-mainline-attr-transition](https://goto.google.com/tast-mainline-attr-transition).
   641  ***
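
The two classification rules in the example above can be compared with a minimal sketch. The attribute names come from the text; the function names and the "non-mainline" label are hypothetical and do not mirror Tast's actual implementation. The sketch also assumes that the "informational" attribute keeps its meaning under the new rule.

```go
package main

import "fmt"

// has reports whether attrs contains want.
func has(attrs []string, want string) bool {
	for _, a := range attrs {
		if a == want {
			return true
		}
	}
	return false
}

// classifyOld applies the original rule: a test is critical unless an
// attribute says otherwise.
func classifyOld(attrs []string) string {
	switch {
	case has(attrs, "disabled"):
		return "disabled"
	case has(attrs, "informational"):
		return "informational"
	default:
		return "critical"
	}
}

// classifyNew applies the group-based rule: only tests carrying
// "group:mainline" can be critical or informational.
// "non-mainline" is a label made up for this sketch.
func classifyNew(attrs []string) string {
	switch {
	case !has(attrs, "group:mainline"):
		return "non-mainline"
	case has(attrs, "informational"):
		return "informational"
	default:
		return "critical"
	}
}

func main() {
	// A test with no attributes changes classification between the rules.
	var attrs []string
	fmt.Println(classifyOld(attrs), classifyNew(attrs))
}
```

A CI system that still evaluates the old rule treats an attribute-less test as critical, while the new rule no longer does, which is why the rules could not be switched in a single step.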
   642  
   643  *** aside
   644  **Example**: Test selection by software dependencies
   645  
   646  We had a bug where ARC-related tests were run in an unexpected way
   647  ([crbug.com/992303](https://crbug.com/992303)). The root cause was that
   648  ARC-related software dependencies were renamed
   649  (e.g. "android" -> "android_p") while we continued to use the old software
   650  dependency name to select ARC-related tests ("dep:android").
   651  
   652  A lesson learned is that user-defined test metadata should not be directly
   653  referenceable in attribute expressions. This is a problem we have to resolve
   654  in the future.
   655  ***
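
As a rough illustration of how such a mismatch can arise, the sketch below selects tests by an exact attribute match. It assumes that each software dependency is exposed as a `dep:<name>` attribute. The `test` type, the helper functions, and the test name are hypothetical, and the sketch does not reproduce the exact behavior seen in the bug.

```go
package main

import "fmt"

// test is a hypothetical, simplified test record.
type test struct {
	name string
	deps []string // software dependencies declared by the test author
}

// attrs derives one "dep:<name>" attribute per software dependency.
func attrs(t test) []string {
	out := make([]string, 0, len(t.deps))
	for _, d := range t.deps {
		out = append(out, "dep:"+d)
	}
	return out
}

// selectByAttr returns the names of tests that carry exactly the given attribute.
func selectByAttr(tests []test, attr string) []string {
	var names []string
	for _, t := range tests {
		for _, a := range attrs(t) {
			if a == attr {
				names = append(names, t.name)
				break
			}
		}
	}
	return names
}

func main() {
	before := []test{{name: "example.ARCTest", deps: []string{"android"}}}
	after := []test{{name: "example.ARCTest", deps: []string{"android_p"}}}

	// The CI-side selector stays the same across the rename.
	fmt.Println(len(selectByAttr(before, "dep:android"))) // 1
	fmt.Println(len(selectByAttr(after, "dep:android")))  // 0
}
```

The selector lives in a CI system and the dependency name lives in test code, so a rename on one side silently changes which tests the other side matches.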
   656  
   657  *** aside
   658  **Example**: Using new Tast CLI flags
   659  
   660  We had a bug where Tast CLI failed to run on release branches because of
   661  unsupported flags ([b/191779650](https://issuetracker.google.com/issues/191779650)).
   662  This happened because a new flag was added to Tast CLI while ChromeOS CI used an
   663  unbranched config to specify the list of flags to pass to Tast CLI.
   664  
   665  We think that this is a design bug in ChromeOS CI configuration: unbranched CI
   666  configs should not construct Tast CLI flags that can change per branch. We
   667  expect that this problem will be solved in the future.
   668  ***
   669  
   670  #### Beware of ChromeOS specific logic
   671  
   672  Tast framework should focus on being a general remote testing framework, and
   673  should be agnostic to the target/host system type.
   674  
   675  Tast started as a testing framework for ChromeOS, so naturally it has several
   676  pieces of hard-coded logic that assume that the target system is ChromeOS and
   677  the host system is the ChromeOS chroot. But we expect that Tast will be used
   678  outside of ChromeOS in the near future. Therefore it is good to avoid introducing
   679  new ChromeOS specific logic to the framework, and to remove existing ChromeOS
   680  specific logic from the framework.
   681  
   682  If you need ChromeOS specific logic, consider whether you can put it in test
   683  bundles or CI systems. If it's impossible, introduce a proper boundary between
   684  the new ChromeOS specific logic and existing OS agnostic logic.
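
One way to picture such a boundary is a small interface that the OS-agnostic code depends on, with the ChromeOS-specific implementation supplied from outside. The sketch below is illustrative only: `TargetResolver`, `defaultResolver`, `labResolver`, and `describeConnection` are hypothetical names, not Tast APIs, and the default port 22 is an assumption of the sketch.

```go
package main

import (
	"fmt"
	"net"
)

// TargetResolver is a hypothetical boundary: framework code only knows that
// a target name can be turned into a host:port address.
type TargetResolver interface {
	Resolve(target string) (addr string, err error)
}

// defaultResolver is OS agnostic. It accepts host[:port] and falls back to
// port 22 when no port is given.
type defaultResolver struct{}

func (defaultResolver) Resolve(target string) (string, error) {
	if _, _, err := net.SplitHostPort(target); err == nil {
		return target, nil
	}
	return net.JoinHostPort(target, "22"), nil
}

// labResolver stands in for infra-specific lookup logic that would live
// outside the framework. The inventory map replaces a real infra API call.
type labResolver struct {
	inventory map[string]string
}

func (r labResolver) Resolve(target string) (string, error) {
	addr, ok := r.inventory[target]
	if !ok {
		return "", fmt.Errorf("unknown target %q", target)
	}
	return addr, nil
}

// describeConnection stands in for framework code. It never mentions
// ChromeOS; it only uses the interface.
func describeConnection(r TargetResolver, target string) (string, error) {
	addr, err := r.Resolve(target)
	if err != nil {
		return "", err
	}
	return "connecting to " + addr, nil
}

func main() {
	lab := labResolver{inventory: map[string]string{"dut-a": "10.0.0.5:2222"}}
	for _, r := range []TargetResolver{defaultResolver{}, lab} {
		fmt.Println(describeConnection(r, "dut-a"))
	}
}
```

With this shape, removing or replacing the infra-specific resolver does not require touching the code behind `describeConnection`.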
   685  
   686  *** aside
   687  **Example**: Test hooks
   688  
   689  A proposal was made to the framework to run the auditctl command between tests
   690  for debugging certain failures
   691  ([crrev.com/c/2513678](https://crrev.com/c/2513678)). Since this logic was
   692  specific to ChromeOS, we introduced test hooks to test bundles and asked the
   693  author to put the logic there.
   694  ***
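
The idea behind the test hooks example can be sketched as follows. This is a simplified model, not the actual Tast hook API: `TestHook`, `runTests`, and `newAuditHook` are hypothetical names, and the hook only records what it would do instead of invoking any command.

```go
package main

import "fmt"

// TestHook is a hypothetical hook type. It runs before a test and returns a
// function that runs after the test.
type TestHook func(test string) (post func())

// runTests stands in for the OS-agnostic framework. It calls the hook around
// each test without knowing what the hook does.
func runTests(tests []string, hook TestHook) {
	for _, t := range tests {
		post := hook(t)
		// The test body would run here.
		if post != nil {
			post()
		}
	}
}

// newAuditHook stands in for ChromeOS-specific logic supplied by a test
// bundle. A real hook might run a debugging command between tests; this
// sketch only appends to a log.
func newAuditHook(log *[]string) TestHook {
	return func(test string) func() {
		*log = append(*log, "pre "+test)
		return func() {
			*log = append(*log, "post "+test)
		}
	}
}

func main() {
	var log []string
	runTests([]string{"example.A", "example.B"}, newAuditHook(&log))
	fmt.Println(log)
}
```

The framework side (`runTests`) stays free of ChromeOS-specific code, while each bundle decides what, if anything, runs between tests.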
   695  
   696  *** aside
   697  **Example**: ChromeOS infra specific APIs
   698  
   699  For next-gen ChromeOS infra support, we added to the framework the logic to
   700  resolve the target hostname and port with ChromeOS infra specific APIs. This
   701  design turned out to be bad, and we're moving the logic out of the framework.
   702  ***
   703  
   704  *** aside
   705  **Example**: Downloading external data files
   706  
   707  In the ChromeOS lab, when downloading data files from Google Cloud Storage, test
   708  frameworks (not limited to Tast) are supposed to use Devservers, which act as
   709  a sort of caching proxy server to Google Cloud Storage with private credentials.
   710  Tast uses Devservers to download external data files needed by tests.
   711  
   712  Test frameworks and Devservers use non-standard REST APIs to communicate.
   713  Today, many non-ChromeOS infra run Tast tests, but it is only ChromeOS infra
   714  that provides Devservers to Tast to allow downloading ACL'ed external data
   715  files.
   716  
   717  In the near future we should replace Devserver protocol support in Tast.
   718  ***