github.com/MetalBlockchain/metalgo@v1.11.9/tests/fixture/tmpnet/README.md

# tmpnet - temporary network orchestration

This package implements a simple orchestrator for the avalanchego
nodes of a temporary network. Configuration is stored on disk, and
nodes run as independent processes whose process details are also
written to disk. Using the filesystem to store configuration and
process details allows the `tmpnetctl` cli and the e2e test fixture to
orchestrate the same temporary networks without the use of an rpc daemon.

## What's in a name?

The name of this package was originally `testnet` and its cli was
`testnetctl`. This name was chosen in ignorance that `testnet`
commonly refers to a persistent blockchain network used for testing.

To avoid confusion, the name was changed to `tmpnet` and its cli
`tmpnetctl`. `tmpnet` is short for `temporary network` since the
networks it deploys are likely to live for a limited duration in
support of the development and testing of avalanchego and its related
repositories.

## Package details

The functionality in this package is grouped by logical purpose into
the following non-test files:

| Filename          | Types       | Purpose                                        |
|:------------------|:------------|:-----------------------------------------------|
| defaults.go       |             | Defines common default configuration           |
| flags.go          | FlagsMap    | Simplifies configuration of avalanchego flags  |
| genesis.go        |             | Creates test genesis                           |
| network.go        | Network     | Orchestrates and configures temporary networks |
| network_config.go | Network     | Reads and writes network configuration         |
| node.go           | Node        | Orchestrates and configures nodes              |
| node_config.go    | Node        | Reads and writes node configuration            |
| node_process.go   | NodeProcess | Orchestrates node processes                    |
| subnet.go         | Subnet      | Orchestrates subnets                           |
| utils.go          |             | Defines shared utility functions               |

## Usage

### Via tmpnetctl

A temporary network can be managed by the `tmpnetctl` cli tool:

```bash
# From the root of the avalanchego repo

# Build the tmpnetctl binary
$ ./scripts/build_tmpnetctl.sh

# Start a new network. Possible to specify the number of nodes (> 1) with --node-count.
$ ./build/tmpnetctl start-network --avalanchego-path=/path/to/avalanchego
...
Started network /home/me/.tmpnet/networks/20240306-152305.924531 (UUID: abaab590-b375-44f6-9ca5-f8a6dc061725)

Configure tmpnetctl to target this network by default with one of the following statements:
 - source /home/me/.tmpnet/networks/20240306-152305.924531/network.env
 - export TMPNET_NETWORK_DIR=/home/me/.tmpnet/networks/20240306-152305.924531
 - export TMPNET_NETWORK_DIR=/home/me/.tmpnet/networks/latest

# Stop the network
$ ./build/tmpnetctl stop-network --network-dir=/path/to/network
```

Note the export of the path ending in `latest`. This is a symlink that
is set to the last network created by `tmpnetctl start-network`. Setting
the `TMPNET_NETWORK_DIR` env var to this symlink ensures that
`tmpnetctl` commands target the most recently deployed temporary
network.

#### Deprecated usage with e2e suite

`tmpnetctl` was previously used to create temporary networks for use
across multiple e2e test runs. As the usage of temporary networks has
expanded to require subnets, that usage has been supplanted by the
`--reuse-network` flag defined for the e2e suite. It was easier to
support defining subnet configuration in the e2e suite in code than to
extend a cli tool like `tmpnetctl` to support similar capabilities.

### Via code

A temporary network can be managed in code:

```golang
network := &tmpnet.Network{                   // Configure non-default values for the new network
    DefaultFlags: tmpnet.FlagsMap{
        config.LogLevelKey: "INFO",           // Change one of the network's defaults
    },
    Nodes: tmpnet.NewNodesOrPanic(5),         // Number of initial validating nodes
    Subnets: []*tmpnet.Subnet{                // Subnets to create on the new network once it is running
        {
            Name: "xsvm-a",                   // User-defined name used to reference subnet in code and on disk
            Chains: []*tmpnet.Chain{
                {
                    VMName: "xsvm",           // Name of the VM the chain will run, will be used to derive the name of the VM binary
                    Genesis: <genesis bytes>, // Genesis bytes used to initialize the custom chain
                    PreFundedKey: <key>,      // (Optional) A private key that is funded in the genesis bytes
                },
            },
            ValidatorIDs: <node ids>,         // The IDs of nodes that validate the subnet
        },
    },
}

err := tmpnet.BootstrapNewNetwork(        // Bootstrap the network
    ctx,                                  // Context used to limit duration of waiting for network health
    ginkgo.GinkgoWriter,                  // Writer to report progress of initialization
    network,
    "",                                   // Empty string uses the default network path (~/.tmpnet/networks)
    "/path/to/avalanchego",               // The path to the binary that nodes will execute
    "/path/to/plugins",                   // The path nodes will use for plugin binaries (suggested value ~/.avalanchego/plugins)
)
if err != nil {
    // Handle the bootstrap failure
}

uris := network.GetNodeURIs()

// Use URIs to interact with the network

// Stop all nodes in the network
network.Stop(context.Background())
```

## Networking configuration

By default, nodes in a temporary network will be started with staking and
API ports set to `0` to ensure that ports will be dynamically
chosen. The tmpnet fixture discovers the ports used by a given node
by reading the `[base-data-dir]/process.json` file written by
avalanchego on node start. The use of dynamic ports supports testing
with many temporary networks without having to manually select compatible
port ranges.

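The effect of requesting port `0` can be seen with a short sketch
using Go's standard `net` package. It is an illustration of dynamic
port selection in general, not an excerpt from avalanchego or tmpnet,
and the helper name `listenOnDynamicPort` is hypothetical.

```go
package main

import (
	"fmt"
	"net"
)

// listenOnDynamicPort (hypothetical helper) binds to port 0 on localhost,
// which asks the OS to pick any free TCP port, and reports the port chosen.
func listenOnDynamicPort() (net.Listener, int, error) {
	listener, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return nil, 0, err
	}
	port := listener.Addr().(*net.TCPAddr).Port
	return listener, port, nil
}

func main() {
	listener, port, err := listenOnDynamicPort()
	if err != nil {
		fmt.Println("listen failed:", err)
		return
	}
	defer listener.Close()
	fmt.Println("OS assigned port:", port)
}
```

Because the port is only known after the listener is bound, a process
that needs it has to be told after the fact, which is the role
`process.json` plays for tmpnet.
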
## Configuration on disk

A temporary network relies on configuration written to disk in the following structure:

```
HOME
└── .tmpnet                                              // Root path for the temporary network fixture
    ├── prometheus                                       // Working directory for a metrics-scraping prometheus instance
    │   └── file_sd_configs                              // Directory containing file-based service discovery config for prometheus
    ├── promtail                                         // Working directory for a log-collecting promtail instance
    │   └── file_sd_configs                              // Directory containing file-based service discovery config for promtail
    └── networks                                         // Default parent directory for temporary networks
        └── 20240306-152305.924531                       // The timestamp of creation is the name of a network's directory
            ├── NodeID-37E8UK3x2YFsHE3RdALmfWcppcZ1eTuj9 // The ID of a node is the name of its data dir
            │   ├── chainData
            │   │   └── ...
            │   ├── config.json                          // Node runtime configuration
            │   ├── db
            │   │   └── ...
            │   ├── flags.json                           // Node flags
            │   ├── logs
            │   │   └── ...
            │   ├── plugins
            │   │   └── ...
            │   └── process.json                         // Node process details (PID, API URI, staking address)
            ├── chains
            │   ├── C
            │   │   └── config.json                      // C-Chain config for all nodes
            │   └── raZ51bwfepaSaZ1MNSRNYNs3ZPfj...U7pa3
            │       └── config.json                      // Custom chain configuration for all nodes
            ├── config.json                              // Common configuration (including defaults and pre-funded keys)
            ├── genesis.json                             // Genesis for all nodes
            ├── network.env                              // Sets network dir env var to simplify network usage
            └── subnets                                  // Directory containing subnet config for both avalanchego and tmpnet
                ├── subnet-a.json                        // tmpnet configuration for subnet-a and its chain(s)
                ├── subnet-b.json                        // tmpnet configuration for subnet-b and its chain(s)
                └── 2jRbWtaonb2RP8DEM5DBsd7o2o8d...RqNs9 // The ID of a subnet is the name of its configuration dir
                    └── config.json                      // avalanchego configuration for subnet
```

### Common networking configuration

Network configuration such as default flags (e.g. `--log-level=`),
runtime defaults (e.g. avalanchego path) and pre-funded private keys
are stored at `[network-dir]/config.json`. A default is applied to a
new node when it is added to the network only if the node does not
explicitly set that value.

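The rule that a default never overrides an explicit node value can be
sketched as a simple map merge. The `flags` type and `applyDefaults`
function below are hypothetical stand-ins written for this example, not
tmpnet's own `FlagsMap` implementation, and the flag values are
illustrative.

```go
package main

import "fmt"

// flags is a stand-in for a set of node flags keyed by flag name.
type flags map[string]any

// applyDefaults copies each default into f only when f has no value for
// that key, so values set explicitly on the node always win.
func applyDefaults(f flags, defaults flags) {
	for key, value := range defaults {
		if _, ok := f[key]; !ok {
			f[key] = value
		}
	}
}

func main() {
	networkDefaults := flags{"log-level": "INFO", "log-display-level": "ERROR"}
	nodeFlags := flags{"log-level": "DEBUG"} // explicitly set on the node

	applyDefaults(nodeFlags, networkDefaults)

	fmt.Println("log-level:", nodeFlags["log-level"])
	fmt.Println("log-display-level:", nodeFlags["log-display-level"])
}
```
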
### Genesis

The genesis file is stored at `[network-dir]/genesis.json` and
referenced by default by all nodes in the network. The genesis file
content will be generated with reasonable defaults if not
supplied. Each node in the network can override the default by setting
an explicit value for `--genesis-file` or `--genesis-file-content`.

### Chain configuration

The chain configuration for a temporary network is stored at
`[network-dir]/chains/[chain alias or ID]/config.json` and referenced
by all nodes in the network. The C-Chain config will be generated with
reasonable defaults if not supplied. X-Chain and P-Chain will use
implicit defaults. The configuration for custom chains can be provided
with subnet configuration and will be written to the appropriate path.

Each node in the network can override network-level chain
configuration by setting `--chain-config-dir` to an explicit value and
ensuring that configuration files for all chains exist at
`[custom-chain-config-dir]/[chain alias or ID]/config.json`.

### Network env

A shell script that sets the `TMPNET_NETWORK_DIR` env var to the
path of the network is stored at `[network-dir]/network.env`. Sourcing
this file (i.e. `source network.env`) in a shell will configure ginkgo
e2e and the `tmpnetctl` cli to target the network path specified in
the env var.

Set `TMPNET_ROOT_DIR` to specify the root directory in which to create
the configuration directory of new networks
(e.g. `$TMPNET_ROOT_DIR/[network-dir]`). The default root directory is
`~/.tmpnet/networks`. Configuring the root directory is only relevant
when creating new networks as the path of existing networks will
already have been set.

### Node configuration

The data dir for a node is set by default to
`[network-path]/[node-id]`. A node can be configured to use a
non-default path by explicitly setting the `--data-dir`
flag.

#### Runtime config

The details required to configure a node's execution are written to
`[network-path]/[node-id]/config.json`. This file contains the
runtime-specific details like the path of the avalanchego binary to
start the node with.

#### Flags

All flags used to configure a node are written to
`[network-path]/[node-id]/flags.json` so that a node can be
configured with only a single argument:
`--config-file=/path/to/flags.json`. This simplifies node launch and
ensures all parameters used to launch a node can be modified by
editing the config file.

#### Process details

The process details of a node are written by avalanchego to
`[base-data-dir]/process.json`. The file contains the PID of the node
process, the URI of the node's API, and the address other nodes can
use to bootstrap themselves (aka staking address).

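Reading the three values back might look like the sketch below. The
JSON key names (`pid`, `uri`, `stakingAddress`) and the sample content
are assumptions made for illustration; check a real `process.json`
before relying on them. The type and function names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// processDetails holds the three values the file is described as
// containing. The JSON keys are assumed, not confirmed by this README.
type processDetails struct {
	PID            int    `json:"pid"`
	URI            string `json:"uri"`
	StakingAddress string `json:"stakingAddress"`
}

// parseProcessDetails decodes the content of a process.json file.
func parseProcessDetails(data []byte) (processDetails, error) {
	var details processDetails
	err := json.Unmarshal(data, &details)
	return details, err
}

func main() {
	// Made-up sample content standing in for a file read from disk.
	sample := []byte(`{"pid": 12345, "uri": "http://127.0.0.1:54321", "stakingAddress": "127.0.0.1:54322"}`)

	details, err := parseProcessDetails(sample)
	if err != nil {
		fmt.Println("parse failed:", err)
		return
	}
	fmt.Println("API URI:", details.URI)
	fmt.Println("staking address:", details.StakingAddress)
}
```
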
## Monitoring

Monitoring is an essential part of understanding the workings of a
distributed system such as avalanchego. The tmpnet fixture enables
collection of logs and metrics from temporary networks to a monitoring
stack (prometheus+loki+grafana) to enable results to be analyzed and
shared.

### Example usage

```bash
# Start prometheus to collect metrics
PROMETHEUS_ID=<id> PROMETHEUS_PASSWORD=<password> ./scripts/run_prometheus.sh

# Start promtail to collect logs
LOKI_ID=<id> LOKI_PASSWORD=<password> ./scripts/run_promtail.sh

# Network start emits link to grafana displaying collected logs and metrics
./build/tmpnetctl start-network
```

### Metrics collection

When a node is started, configuration enabling collection of metrics
from the node is written to
`~/.tmpnet/prometheus/file_sd_configs/[network uuid]-[node id].json`.

The `scripts/run_prometheus.sh` script starts prometheus in agent mode
configured to scrape metrics from configured nodes and forward the
metrics to a persistent prometheus instance. The script requires that
the `PROMETHEUS_ID` and `PROMETHEUS_PASSWORD` env vars be set. By
default the prometheus instance at
https://prometheus-experimental.avax-dev.network will be targeted and
this can be overridden via the `PROMETHEUS_URL` env var.

### Log collection

Node logs are stored at
`~/.tmpnet/networks/[network id]/[node id]/logs` by default, and can
optionally be forwarded to loki with promtail.

When a node is started, promtail configuration enabling
collection of logs for the node is written to
`~/.tmpnet/promtail/file_sd_configs/[network uuid]-[node id].json`.

The `scripts/run_promtail.sh` script starts promtail configured to
collect logs from configured nodes and forward the results to loki. The
script requires that the `LOKI_ID` and `LOKI_PASSWORD` env vars be
set. By default the loki instance at
https://loki-experimental.avax-dev.network will be targeted and this
can be overridden via the `LOKI_URL` env var.

### Labels

The logs and metrics collected for temporary networks will have the
following labels applied:

 - `network_uuid`
   - uniquely identifies a network across hosts
 - `node_id`
 - `is_ephemeral_node`
   - 'ephemeral' nodes are expected to run for only a fraction of the
     life of a network
 - `network_owner`
   - an arbitrary string that can be used to differentiate results
     when a CI job runs more than one network

When a network runs as part of a GitHub CI job, the following
additional labels will be applied:

 - `gh_repo`
 - `gh_workflow`
 - `gh_run_id`
 - `gh_run_number`
 - `gh_run_attempt`
 - `gh_job_id`

These labels are sourced from GitHub Actions' `github` context as per
https://docs.github.com/en/actions/learn-github-actions/contexts#github-context.

### Viewing

#### Local networks

When a network is started with tmpnet, a link to the [default grafana
instance](https://grafana-experimental.avax-dev.network) will be
emitted. The dashboards will only be populated if prometheus and
promtail are running locally (as per previous sections) to collect
metrics and logs.

#### CI

Collection of logs and metrics is enabled for CI jobs that use
tmpnet. Each job will execute a step titled `Notify of metrics
availability` that emits a link to grafana parameterized to show results
for the job. Additional links to grafana parameterized to show results
for individual networks will appear in the logs displaying the start of
those networks.
   343  availability` that emits a link to grafana parametized to show results
   344  for the job. Additional links to grafana parametized to show results
   345  for individual network will appear in the logs displaying the start of
   346  those networks.