# Cloud Resources

This document attempts to catalog the cloud resources created by or required by
scripts in this repository. At the time of writing, 16 August 2017, this
document was not nearly comprehensive; please do not take it as such.

## Test Fixtures

Our acceptance tests push quite a bit of data around. For a rough sense of
scale, the biggest allocator tests create clusters with hundreds of gigabytes
of data, and the biggest backup/restore tests push around several terabytes of
data. This is far too much data to store in a VM image and far too much data to
generate on demand, so we stash "test fixtures" in cloud blob storage.

At the moment, all our test fixtures live in [Azure Blob
Storage][azure-blob-storage], Azure's equivalent of Amazon S3. The object
hierarchy looks like this:

* **roachfixtures{region}/**
  * **backups/** — the output of `BACKUP... TO 'azure://roachfixtures/backups/FOO'`,
    used to test `RESTORE` without manually running a backup.
    * **2tb/**
    * **tpch{1,5,10,100}/**
  * **store-dumps/** — gzipped tarballs of raw stores (i.e., `cockroach-data`
    directories), used to test allocator rebalancing and backups without
    manually inserting gigabytes of data.
    * **1node-17gb-841ranges/** - source: `RESTORE` of `tpch10`
    * **1node-113gb-9595ranges/** - source: `tpch100 IMPORT`
    * **3nodes-17gb-841ranges/** - source: `RESTORE` of `tpch10`
    * **6nodes-67gb-9588ranges/** - source: `RESTORE` of `tpch100`
    * **10nodes-2tb-50000ranges/**
  * **csvs/** — huge CSVs used to test distributed CSV import (`IMPORT...`).

*PLEA(benesch):* Please keep the above list up to date if you add additional
objects to the storage account. It's very difficult to track down an object's
origin story.

Note that egress bandwidth is *expensive*. Every gigabyte of outbound traffic
costs 8¢, so one 2TB restore costs approximately $160. Data transfer within a
region, like from a storage account in the `eastus` region to a VM in the
`eastus` region, is not considered outbound traffic, however, and so is free.

Ideally, we'd limit ourselves to one region and frolic in free bandwidth
forever. In practice, of course, things are never simple. The `eastus` region
doesn't support newer VMs, and we want to test backup/restore on both old and
new VMs. So we duplicate the `roachfixtures` storage accounts in each region
where we spin up acceptance tests. At the moment, we have the following storage
accounts:

* **`roachfixtureseastus`** — missing new VMs
* **`roachfixtureswestus`** — has new VMs

### Syncing `roachfixtures{region}` Buckets

By far the fastest way to interact with Azure Blob Storage is the
[`azcopy`][azcopy] command. It's a bit of a pain to install, but the Azure CLIs
(`az` and `azure`) don't attempt to parallelize operations and won't come close
to saturating your network bandwidth.
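The `azcopy` invocation below reads the storage account keys from `$source_key`
and `$dest_key`. As a minimal sketch, you could populate them with the
key-listing command shown at the end of this section, assuming the fixtures
accounts live in the `fixtures` resource group as that command does:

```shell
# Sketch: stash the account keys that the azcopy invocation below expects.
# Assumes both fixtures accounts live in the `fixtures` resource group.
source_key=$(az storage account keys list -g fixtures -n roachfixtureseastus -o tsv --query '[0].value')
dest_key=$(az storage account keys list -g fixtures -n roachfixtureswestus -o tsv --query '[0].value')
```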
Here's a sample invocation to sync data from `eastus` to `westus`:

```shell
for container in $(az storage container list --account-name roachfixtureseastus -o tsv --query '[*].name')
do
  azcopy --recursive --exclude-older --sync-copy \
    --source "https://roachfixtureseastus.blob.core.windows.net/$container" \
    --destination "https://roachfixtureswestus.blob.core.windows.net/$container" \
    --source-key "$source_key" --dest-key "$dest_key"
done
```

Since egress is expensive and ingress is free, be sure to run this on an
azworker located in the source region—`eastus` in this case.

You can fetch the source and destination access keys from the Azure Portal or
with the following Azure CLI 2.0 command:

```shell
az storage account keys list -g fixtures -n roachfixtures{region} -o tsv --query '[0].value'
```

TODO(benesch): install `azcopy` on azworkers.

TODO(benesch): set up a TeamCity build to sync fixtures buckets automatically.

## Ephemeral Storage

Backup acceptance tests need a cloud storage account to use as the backup
destination. These backups don't need to last beyond the end of each acceptance
test, and so the files are periodically cleaned up to avoid paying for
unnecessary storage.

TODO(benesch): automate cleaning out ephemeral data. Right now, the process is
entirely manual; a rough sketch of what a cleanup sweep might look like appears
at the end of this document.

To avoid accidentally deleting fixture data—which can take several *days* to
regenerate—the ephemeral data is stored in separate storage accounts from the
fixture data. Currently, we maintain the following storage accounts:

* **`roachephemeraleastus`**
* **`roachephemeralwestus`**

Some acceptance tests follow a backup to ephemeral storage with a restore from
that same ephemeral storage, incurring both ingress and egress traffic. To
avoid paying for this egress bandwidth, we colocate these storage accounts with
the VMs running the acceptance tests, as we do for test fixtures.

[azcopy]: https://docs.microsoft.com/en-us/azure/storage/storage-use-azcopy-linux
[azure-blob-storage]: https://docs.microsoft.com/en-us/azure/storage/storage-introduction#blob-storage
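The cleanup sweep mentioned in the ephemeral storage TODO above is still
manual. Here's a minimal sketch of what it might look like, using only `az`
commands that appear earlier in this document. The resource group holding the
ephemeral accounts isn't recorded here, so `$ephemeral_rg` is a placeholder,
and deleting whole containers (rather than individual blobs) assumes nothing in
these accounts needs to outlive a test run:

```shell
# Hypothetical cleanup sweep over the ephemeral storage accounts.
# $ephemeral_rg must be set to the resource group that holds them.
for account in roachephemeraleastus roachephemeralwestus
do
  key=$(az storage account keys list -g "$ephemeral_rg" -n "$account" -o tsv --query '[0].value')
  for container in $(az storage container list --account-name "$account" --account-key "$key" -o tsv --query '[*].name')
  do
    # Assumption: nothing in the ephemeral accounts needs to survive between runs.
    az storage container delete --name "$container" --account-name "$account" --account-key "$key"
  done
done
```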