# tmpnet - temporary network orchestration

This package implements a simple orchestrator for the avalanchego
nodes of a temporary network. Configuration is stored on disk, and
nodes run as independent processes whose process details are also
written to disk. Using the filesystem to store configuration and
process details allows for the `tmpnetctl` cli and e2e test fixture to
orchestrate the same temporary networks without the use of an rpc daemon.

## What's in a name?

The name of this package was originally `testnet` and its cli was
`testnetctl`. This name was chosen in ignorance that `testnet`
commonly refers to a persistent blockchain network used for testing.

To avoid confusion, the name was changed to `tmpnet` and its cli
`tmpnetctl`. `tmpnet` is short for `temporary network` since the
networks it deploys are likely to live for a limited duration in
support of the development and testing of avalanchego and its related
repositories.

## Package details

The functionality in this package is grouped by logical purpose into
the following non-test files:

| Filename          | Types       | Purpose                                        |
|:------------------|:------------|:-----------------------------------------------|
| defaults.go       |             | Defines common default configuration           |
| flags.go          | FlagsMap    | Simplifies configuration of avalanchego flags  |
| genesis.go        |             | Creates test genesis                           |
| network.go        | Network     | Orchestrates and configures temporary networks |
| network_config.go | Network     | Reads and writes network configuration         |
| node.go           | Node        | Orchestrates and configures nodes              |
| node_config.go    | Node        | Reads and writes node configuration            |
| node_process.go   | NodeProcess | Orchestrates node processes                    |
| subnet.go         | Subnet      | Orchestrates subnets                           |
| utils.go          |             | Defines shared utility functions               |

## Usage

### Via tmpnetctl

A temporary network can be managed by the `tmpnetctl` cli tool:

```bash
# From the root of the avalanchego repo

# Build the tmpnetctl binary
$ ./scripts/build_tmpnetctl.sh

# Start a new network. Possible to specify the number of nodes (> 1) with --node-count.
$ ./build/tmpnetctl start-network --avalanchego-path=/path/to/avalanchego
...
Started network /home/me/.tmpnet/networks/20240306-152305.924531 (UUID: abaab590-b375-44f6-9ca5-f8a6dc061725)

Configure tmpnetctl to target this network by default with one of the following statements:
 - source /home/me/.tmpnet/networks/20240306-152305.924531/network.env
 - export TMPNET_NETWORK_DIR=/home/me/.tmpnet/networks/20240306-152305.924531
 - export TMPNET_NETWORK_DIR=/home/me/.tmpnet/networks/latest

# Stop the network
$ ./build/tmpnetctl stop-network --network-dir=/path/to/network
```

Note the export of the path ending in `latest`. This is a symlink that
is set to the last network created by `tmpnetctl start-network`.
Setting the `TMPNET_NETWORK_DIR` env var to this symlink ensures that
`tmpnetctl` commands target the most recently deployed temporary
network.

#### Deprecated usage with e2e suite

`tmpnetctl` was previously used to create temporary networks for use
across multiple e2e test runs. As the usage of temporary networks has
expanded to require subnets, that usage has been supplanted by the
`--reuse-network` flag defined for the e2e suite. It was easier to
support defining subnet configuration in the e2e suite in code than to
extend a cli tool like `tmpnetctl` to support similar capabilities.

### Via code

A temporary network can be managed in code:

```golang
network := &tmpnet.Network{                   // Configure non-default values for the new network
    DefaultFlags: tmpnet.FlagsMap{
        config.LogLevelKey: "INFO",           // Change one of the network's defaults
    },
    Nodes: tmpnet.NewNodesOrPanic(5),         // Number of initial validating nodes
    Subnets: []*tmpnet.Subnet{                // Subnets to create on the new network once it is running
        {
            Name: "xsvm-a",                   // User-defined name used to reference subnet in code and on disk
            Chains: []*tmpnet.Chain{
                {
                    VMName: "xsvm",           // Name of the VM the chain will run, will be used to derive the name of the VM binary
                    Genesis: <genesis bytes>, // Genesis bytes used to initialize the custom chain
                    PreFundedKey: <key>,      // (Optional) A private key that is funded in the genesis bytes
                },
            },
            ValidatorIDs: <node ids>,         // The IDs of nodes that validate the subnet
        },
    },
}

_ = tmpnet.BootstrapNewNetwork(          // Bootstrap the network
    ctx,                                 // Context used to limit duration of waiting for network health
    ginkgo.GinkgoWriter,                 // Writer to report progress of initialization
    network,
    "",                                  // Empty string uses the default network path (~/.tmpnet/networks)
    "/path/to/avalanchego",              // The path to the binary that nodes will execute
    "/path/to/plugins",                  // The path nodes
    // will use for plugin binaries (suggested value ~/.avalanchego/plugins)
)

uris := network.GetNodeURIs()

// Use URIs to interact with the network

// Stop all nodes in the network
network.Stop(context.Background())
```

## Networking configuration

By default, nodes in a temporary network will be started with staking and
API ports set to `0` to ensure that ports will be dynamically
chosen. The tmpnet fixture discovers the ports used by a given node
by reading the `[base-data-dir]/process.json` file written by
avalanchego on node start. The use of dynamic ports supports testing
with many temporary networks without having to manually select compatible
port ranges.

## Configuration on disk

A temporary network relies on configuration written to disk in the following structure:

```
HOME
└── .tmpnet                                              // Root path for the temporary network fixture
    ├── prometheus                                       // Working directory for a metrics-scraping prometheus instance
    │   └── file_sd_configs                              // Directory containing file-based service discovery config for prometheus
    ├── promtail                                         // Working directory for a log-collecting promtail instance
    │   └── file_sd_configs                              // Directory containing file-based service discovery config for promtail
    └── networks                                         // Default parent directory for temporary networks
        └── 20240306-152305.924531                       // The timestamp of creation is the name of a network's directory
            ├── NodeID-37E8UK3x2YFsHE3RdALmfWcppcZ1eTuj9 // The ID of a node is the name of its data dir
            │   ├── chainData
            │   │   └── ...
            │   ├── config.json                          // Node runtime configuration
            │   ├── db
            │   │   └── ...
            │   ├── flags.json                           // Node flags
            │   ├── logs
            │   │   └── ...
            │   ├── plugins
            │   │   └── ...
            │   └── process.json                         // Node process details (PID, API URI, staking address)
            ├── chains
            │   ├── C
            │   │   └── config.json                      // C-Chain config for all nodes
            │   └── raZ51bwfepaSaZ1MNSRNYNs3ZPfj...U7pa3
            │       └── config.json                      // Custom chain configuration for all nodes
            ├── config.json                              // Common configuration (including defaults and pre-funded keys)
            ├── genesis.json                             // Genesis for all nodes
            ├── network.env                              // Sets network dir env var to simplify network usage
            └── subnets                                  // Directory containing subnet config for both avalanchego and tmpnet
                ├── subnet-a.json                        // tmpnet configuration for subnet-a and its chain(s)
                ├── subnet-b.json                        // tmpnet configuration for subnet-b and its chain(s)
                └── 2jRbWtaonb2RP8DEM5DBsd7o2o8d...RqNs9 // The ID of a subnet is the name of its configuration dir
                    └── config.json                      // avalanchego configuration for subnet
```

### Common networking configuration

Network configuration such as default flags (e.g. `--log-level=`),
runtime defaults (e.g. avalanchego path) and pre-funded private keys
are stored at `[network-dir]/config.json`. A given default will only
be applied to a new node on its addition to the network if the node
does not explicitly set a given value.

### Genesis

The genesis file is stored at `[network-dir]/genesis.json` and
referenced by default by all nodes in the network. The genesis file
content will be generated with reasonable defaults if not
supplied. Each node in the network can override the default by setting
an explicit value for `--genesis-file` or `--genesis-file-content`.

### Chain configuration

The chain configuration for a temporary network is stored at
`[network-dir]/chains/[chain alias or ID]/config.json` and referenced
by all nodes in the network. The C-Chain config will be generated with
reasonable defaults if not supplied. X-Chain and P-Chain will use
implicit defaults.
The configuration for custom chains can be provided
with subnet configuration and will be written to the appropriate path.

Each node in the network can override network-level chain
configuration by setting `--chain-config-dir` to an explicit value and
ensuring that configuration files for all chains exist at
`[custom-chain-config-dir]/[chain alias or ID]/config.json`.

### Network env

A shell script that sets the `TMPNET_NETWORK_DIR` env var to the
path of the network is stored at `[network-dir]/network.env`. Sourcing
this file (i.e. `source network.env`) in a shell will configure ginkgo
e2e and the `tmpnetctl` cli to target the network path specified in
the env var.

Set `TMPNET_ROOT_DIR` to specify the root directory in which to create
the configuration directory of new networks
(e.g. `$TMPNET_ROOT_DIR/[network-dir]`). The default root directory is
`~/.tmpnet/networks`. Configuring the root directory is only relevant
when creating new networks as the path of existing networks will
already have been set.

### Node configuration

The data dir for a node is set by default to
`[network-path]/[node-id]`. A node can be configured to use a
non-default path by explicitly setting the `--data-dir`
flag.

#### Runtime config

The details required to configure a node's execution are written to
`[network-path]/[node-id]/config.json`. This file contains the
runtime-specific details like the path of the avalanchego binary to
start the node with.

#### Flags

All flags used to configure a node are written to
`[network-path]/[node-id]/flags.json` so that a node can be
configured with only a single argument:
`--config-file=/path/to/flags.json`. This simplifies node launch and
ensures all parameters used to launch a node can be modified by
editing the config file.

#### Process details

The process details of a node are written by avalanchego to
`[base-data-dir]/process.json`. The file contains the PID of the node
process, the URI of the node's API, and the address other nodes can
use to bootstrap themselves (aka staking address).

## Monitoring

Monitoring is an essential part of understanding the workings of a
distributed system such as avalanchego. The tmpnet fixture enables
collection of logs and metrics from temporary networks to a monitoring
stack (prometheus+loki+grafana) to enable results to be analyzed and
shared.

### Example usage

```bash
# Start prometheus to collect metrics
PROMETHEUS_ID=<id> PROMETHEUS_PASSWORD=<password> ./scripts/run_prometheus.sh

# Start promtail to collect logs
LOKI_ID=<id> LOKI_PASSWORD=<password> ./scripts/run_promtail.sh

# Network start emits link to grafana displaying collected logs and metrics
./build/tmpnetctl start-network
```

### Metrics collection

When a node is started, configuration enabling collection of metrics
from the node is written to
`~/.tmpnet/prometheus/file_sd_configs/[network uuid]-[node id].json`.

The `scripts/run_prometheus.sh` script starts prometheus in agent mode
configured to scrape metrics from configured nodes and forward the
metrics to a persistent prometheus instance. The script requires that
the `PROMETHEUS_ID` and `PROMETHEUS_PASSWORD` env vars be set. By
default the prometheus instance at
https://prometheus-experimental.avax-dev.network will be targeted and
this can be overridden via the `PROMETHEUS_URL` env var.

### Log collection

Node logs are stored at `~/.tmpnet/networks/[network id]/[node id]/logs`
by default, and can optionally be forwarded to loki with
promtail.


When a node is started, promtail configuration enabling
collection of logs for the node is written to
`~/.tmpnet/promtail/file_sd_configs/[network uuid]-[node id].json`.

The `scripts/run_promtail.sh` script starts promtail configured to
collect logs from configured nodes and forward the results to loki. The
script requires that the `LOKI_ID` and `LOKI_PASSWORD` env vars be
set. By default the loki instance at
https://loki-experimental.avax-dev.network will be targeted and this
can be overridden via the `LOKI_URL` env var.

### Labels

The logs and metrics collected for temporary networks will have the
following labels applied:

 - `network_uuid`
   - uniquely identifies a network across hosts
 - `node_id`
 - `is_ephemeral_node`
   - 'ephemeral' nodes are expected to run for only a fraction of the
     life of a network
 - `network_owner`
   - an arbitrary string that can be used to differentiate results
     when a CI job runs more than one network

When a network runs as part of a GitHub CI job, the following
additional labels will be applied:

 - `gh_repo`
 - `gh_workflow`
 - `gh_run_id`
 - `gh_run_number`
 - `gh_run_attempt`
 - `gh_job_id`

These labels are sourced from GitHub Actions' `github` context as per
https://docs.github.com/en/actions/learn-github-actions/contexts#github-context.

### Viewing

#### Local networks

When a network is started with tmpnet, a link to the [default grafana
instance](https://grafana-experimental.avax-dev.network) will be
emitted. The dashboards will only be populated if prometheus and
promtail are running locally (as per previous sections) to collect
metrics and logs.

#### CI

Collection of logs and metrics is enabled for CI jobs that use
tmpnet.
Each job will execute a step titled `Notify of metrics availability`
that emits a link to grafana parameterized to show results
for the job. Additional links to grafana parameterized to show results
for individual networks will appear in the logs displaying the start of
those networks.