github.com/cockroachdb/cockroach@v20.2.0-alpha.1+incompatible/pkg/cmd/roachprod/README.md (about)

## roachprod

⚠️ roachprod is an **internal** tool for creating and testing
CockroachDB clusters. Use at your own risk! ⚠️

## Setup

1. Make sure you have [gcloud installed] and configured (`gcloud auth list` to
check, `gcloud auth login` to authenticate). You may want to update old
installations (`gcloud components update`).
1. Build a local binary of `roachprod`: `make bin/roachprod`
1. Add `$PWD/bin` to your `PATH` so you can run `roachprod` from the root directory of `cockroach`.

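
The setup steps can be sanity-checked with a small shell sketch like the one below. The helper name `missing_tools` is made up for illustration and is not part of `roachprod`; the snippet assumes it is run from the root of the `cockroach` checkout after `make bin/roachprod`.

```bash
#!/usr/bin/env bash
# Sketch: report which of the given tools are not on PATH.
# The helper name is illustrative, not part of roachprod.

missing_tools() {
  local tool
  for tool in "$@"; do
    # command -v succeeds only if the tool can be resolved by the shell.
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Step 3 of the list above: assumes the current directory is the cockroach root.
export PATH="$PWD/bin:$PATH"

# Prints the name of each tool that is still missing; prints nothing otherwise.
missing_tools gcloud roachprod
```

If `gcloud` is reported as missing, revisit step 1; if `roachprod` is reported, check that the build in step 2 succeeded.
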
## Summary

* By default, clusters are created in the [cockroach-ephemeral] GCE
  project. Use the `--gce-project` flag or `GCE_PROJECT` environment
  variable to create clusters in a different GCE project. Note that
  the `lifetime` functionality requires
  `roachprod gc --gce-project=<name>` to be run periodically (e.g., via a
  cron job). This is only provided out-of-the-box for the
  [cockroach-ephemeral] project.
* Anyone can connect to any port on VMs in [cockroach-ephemeral].
  **DO NOT STORE SENSITIVE DATA**.
* Cluster names are prefixed with the user creating them. For example,
  `roachprod create test` run by user `marc` creates the `marc-test` cluster.
* VMs have a default lifetime of 12 hours (changeable with the
  `--lifetime` flag).
* Default settings create 4 VMs (`-n 4`) with 4 CPUs, 15 GB of memory
  (`--machine-type=n1-standard-4`), and local SSDs (`--local-ssd`).

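
For a project other than [cockroach-ephemeral], the periodic `gc` run mentioned above has to be scheduled by hand. The sketch below only prints a crontab entry. The project name `my-project`, the hourly schedule, and the helper name `gc_cron_entry` are placeholders chosen for illustration; only the `roachprod gc --gce-project=<name>` invocation comes from this README.

```bash
#!/usr/bin/env bash
# Sketch: build a crontab entry that runs `roachprod gc` for a custom project.
# Assumptions: the project name and the hourly schedule are illustrative only.

gc_cron_entry() {
  local project="$1"
  # Fields: minute hour day-of-month month day-of-week, then the command.
  printf '0 * * * * roachprod gc --gce-project=%s\n' "$project"
}

# Print the entry for a hypothetical project.
gc_cron_entry "my-project"
```

The printed line can be pasted into `crontab -e`. Cron usually runs with a minimal `PATH`, so an absolute path to the `roachprod` binary may be needed in practice.
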
## Cluster quick-start using roachprod

```bash
# Create a cluster with 4 nodes and local SSD. The last node is used as a
# load generator for some tests. Note that the cluster name must always begin
# with your username.
export CLUSTER="${USER}-test"
roachprod create ${CLUSTER} -n 4 --local-ssd

# Add gcloud SSH key. Optional for most commands, but some require it.
ssh-add ~/.ssh/google_compute_engine

# Stage binaries.
roachprod stage ${CLUSTER} workload
roachprod stage ${CLUSTER} release v2.0.5

# ...or using roachprod directly (e.g., for your locally-built binary).
build/builder.sh mkrelease
roachprod put ${CLUSTER} cockroach-linux-2.6.32-gnu-amd64 cockroach

# Start a cluster.
roachprod start ${CLUSTER}

# Check the admin UI.
roachprod admin --open ${CLUSTER}:1

# Run a workload.
roachprod run ${CLUSTER}:4 -- ./workload init kv
roachprod run ${CLUSTER}:4 -- ./workload run kv --read-percent=0 --splits=1000 --concurrency=384 --duration=5m

# Open a SQL connection to the first node.
roachprod sql ${CLUSTER}:1

# Extend lifetime by another 6 hours.
roachprod extend ${CLUSTER} --lifetime=6h

# Destroy the cluster.
roachprod destroy ${CLUSTER}
```

## Command reference

Warning: this reference is incomplete. Be prepared to refer to the CLI help text
and the source code.

### Create a cluster

```
$ roachprod create foo
Creating cluster marc-foo with 3 nodes
OK
marc-foo: 23h59m42s remaining
  marc-foo-0000   [marc-foo-0000.us-east1-b.cockroach-ephemeral]
  marc-foo-0001   [marc-foo-0001.us-east1-b.cockroach-ephemeral]
  marc-foo-0002   [marc-foo-0002.us-east1-b.cockroach-ephemeral]
Syncing...
```

#### Choosing a Provider

Use the `--clouds` flag to set which cloud provider(s) to use. For example:

```
$ roachprod create foo --clouds gce,aws
```

#### Node Distribution Options

Two kinds of flags interact to place nodes either in a single zone or across
geographically distributed zones:

- `--geo`
- the `--[provider]-zones` flags (`--gce-zones`, `--aws-zones`, `--azure-locations`)

Here's what to expect when the options are combined:

- _If neither is set_: all nodes are placed within one of the provider's default zones
- _`--geo` only_: nodes are spread across the provider's default zones
- _`--[provider]-zones` or `--geo --[provider]-zones`_: nodes are spread across
  all the specified zones

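
The combination rules above can be restated as a small decision function. This is only a sketch of the documented behavior, not code taken from `roachprod`; the function name `describe_placement`, its arguments, and its output strings are invented for illustration, and the zone names are example values.

```bash
#!/usr/bin/env bash
# Sketch of the placement rules listed above. This is not roachprod code;
# the function name and output strings are illustrative.

# describe_placement GEO ZONES
#   GEO:   "true" if --geo is passed, anything else otherwise
#   ZONES: comma-separated value of a --[provider]-zones flag, or empty
describe_placement() {
  local geo="$1" zones="$2"
  if [ -n "$zones" ]; then
    # Explicit zones take effect with or without --geo.
    echo "spread across specified zones: $zones"
  elif [ "$geo" = "true" ]; then
    echo "spread across the provider's default zones"
  else
    echo "all nodes in one of the provider's default zones"
  fi
}

describe_placement false ""
describe_placement true ""
describe_placement true "us-east1-b,us-west1-b"
```

The third call corresponds to something like `roachprod create foo --geo --gce-zones us-east1-b,us-west1-b`, where the zone list is a placeholder.
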
### Interact using crl-prod tools

`roachprod` populates hosts files in `~/.roachprod/hosts`. These are used by
`crl-prod` tools to map clusters to node addresses.

```
$ crl-ssh marc-foo all df -h /
1: marc-foo-0000.us-east1-b.cockroach-ephemeral
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        49G  1.2G   48G   3% /

2: marc-foo-0001.us-east1-b.cockroach-ephemeral
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        49G  1.2G   48G   3% /

3: marc-foo-0002.us-east1-b.cockroach-ephemeral
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        49G  1.2G   48G   3% /
```

### Interact using `roachprod` directly

```
# Add ssh-key
$ ssh-add ~/.ssh/google_compute_engine

$ roachprod status marc-foo
marc-foo: status 3/3
   1: not running
   2: not running
   3: not running
```

### SSH into hosts

`roachprod` uses `gcloud` to sync the list of hostnames to `~/.ssh/config` and
set up keys.

```
$ ssh marc-foo-0000.us-east1-b.cockroach-ephemeral
```

### List clusters

```
$ roachprod list
marc-foo: 23h58m27s remaining
  marc-foo-0000
  marc-foo-0001
  marc-foo-0002
Syncing...
```

### Destroy cluster

```
$ roachprod destroy marc-foo
Destroying cluster marc-foo with 3 nodes
OK
```

See `roachprod help <command>` for further details.

## Return Codes

`roachprod` uses return codes to provide information about the exit status.
These are the codes and what they mean:

- 0: everything ran as expected
- 1: an unclassified roachprod error
- 10: a problem with an SSH connection to a server in the cluster
- 20: a problem running a non-cockroach command on a remote cluster server or on a local node
- 30: a problem running a cockroach command on a remote cluster server or on a local node

Each non-zero code has a corresponding easy-to-search-for string that is
emitted to output when an error of that type occurs. The strings are emitted
near the end of output and for each error that happens during an SSH
connection to a remote cluster node. The strings for each error code are:

- 1:  `UNCLASSIFIED_PROBLEM`
- 10: `SSH_PROBLEM`
- 20: `COMMAND_PROBLEM`
- 30: `DEAD_ROACH_PROBLEM`

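
A wrapper script might translate the exit code into the matching marker string, for example to label failures in a log. The sketch below assumes nothing beyond the code/string pairs listed above; the function name `roachprod_exit_label` and the labels for `0` and for undocumented codes are our own.

```bash
#!/usr/bin/env bash
# Sketch: translate a roachprod exit code into the marker string listed above.
# The function name is illustrative; the code/string pairs come from this README.

roachprod_exit_label() {
  case "$1" in
    0)  echo "OK" ;;                     # our own label; no marker is listed for 0
    1)  echo "UNCLASSIFIED_PROBLEM" ;;
    10) echo "SSH_PROBLEM" ;;
    20) echo "COMMAND_PROBLEM" ;;
    30) echo "DEAD_ROACH_PROBLEM" ;;
    *)  echo "UNKNOWN_CODE_$1" ;;        # codes not documented in this README
  esac
}

# Example usage (commented out so the sketch runs without a cluster):
#   roachprod run "${CLUSTER}:1" -- ls
#   echo "roachprod exited with: $(roachprod_exit_label $?)"
roachprod_exit_label 10
```

The same strings can also be searched for directly in captured output, for instance with `grep -E 'SSH_PROBLEM|COMMAND_PROBLEM|DEAD_ROACH_PROBLEM|UNCLASSIFIED_PROBLEM'` on a saved log file.
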
## Future improvements

* Bigger loadgen VM (last instance)

* Ease the creation of test metadata and then running a series of tests
  using `roachprod <cluster> test <dir1> <dir2> ...`. Perhaps something like
  `roachprod prepare <test> <binary>`.

* Automatically detect stalled tests and restart tests upon unexpected
  failures. Detection of stalled tests could be done by noticing zero output
  for a period of time.

* Detect crashed cockroach nodes.

[cockroach-ephemeral]: https://console.cloud.google.com/home/dashboard?project=cockroach-ephemeral
[gcloud installed]: https://cloud.google.com/sdk/downloads