SDA-CLI
=======

This is the Sensitive Data Archive (SDA) Command Line Interface (sda-cli). This
tool was created to unify and simplify the tools needed to perform the most
common user actions in the SDA.

This tool can be used to encrypt and upload data when submitting to the archive,
and to download and decrypt data when retrieving it from the archive.

It is recommended to use the precompiled executables for `sda-cli`, which can be found at https://github.com/NBISweden/sda-cli/releases

To get help on the usage of the tool, use the following command:
```bash
./sda-cli help
```

# Usage

The main functionalities implemented in this tool are explained in the following sections.

If any command seems a bit puzzling, you can find guidance on how to interpret it [here](https://ftpdocs.broadcom.com/cadocs/0/CA%20ARCserve%20%20Backup%2015-ENU/Bookshelf_Files/HTML/CMD_Ref/command_line_syntax_characters.htm) and [here](https://medium.com/@jaewei.j.w/how-to-read-man-page-synopsis-3408e7fd0e42).

## Encrypt

The files stored in the SDA/BP archive are encrypted using the [crypt4gh standard](https://www.ga4gh.org/news/crypt4gh-a-secure-method-for-sharing-human-genetic-data/). The following sections explain how to encrypt and upload files to the archive.

### Download the crypt4gh public key

The files that are uploaded to the SDA/BP services need to be encrypted with the correct public key.
Depending on the service you want to use, this key can be downloaded using this command **if you are uploading to the SDA**:
```bash
wget https://raw.githubusercontent.com/NBISweden/EGA-SE-user-docs/main/crypt4gh_key.pub
```
or this command **if you are uploading to Big Picture**:
```bash
wget https://raw.githubusercontent.com/NBISweden/EGA-SE-user-docs/main/crypt4gh_bp_key.pub
```

### Encrypt file(s)

Now that the public key is downloaded, the file(s) can be encrypted using the `sda-cli` binary obtained at the start of this guide. To encrypt a specific file, use the following command:
```bash
./sda-cli encrypt [-key <public_key>] <file_to_encrypt>
```
where `<public_key>` is the key downloaded in the previous step. The tool also allows for encrypting multiple files at once, by listing them separated by spaces:
```bash
./sda-cli encrypt -key <public_key> <file_1_to_encrypt> <file_2_to_encrypt> <file_3_to_encrypt>
```
This command comes with the `-continue` option, which makes the tool continue encrypting the remaining files even if one of them fails. To enable this feature, execute the command with the `-continue=true` option.
If no public key is provided, the tool will look for it from a previous login session.

### Encrypt file(s) with multiple keys

To encrypt files with more than one public key, use the `-key` flag repeatedly, e.g.
```bash
./sda-cli encrypt -key <public_key1> -key <public_key2> <file_to_encrypt>
```
will encrypt a file using two keys so that it can be decrypted with either of the corresponding private keys. Encryption with more than two keys is possible as well. Another option is to pass to `-key` a file containing concatenated public keys, generated e.g.
with a command like:
```bash
cat <pub_key1> <pub_key2> > <concatenated_public_keys>
```
Passing a combination of the above arguments is allowed as well:
```bash
./sda-cli encrypt -key <concatenated_public_keys> -key <public_key3> <file_to_encrypt>
```

**Note**: The `encrypt` command will create four files containing hashes (both md5 and sha256) for the encrypted and unencrypted files, respectively.

**Developers' Notes:** The tool creates a key pair when encrypting the files. This key pair is temporary for security reasons.


## Upload

### Download the configuration file

Once your files are encrypted, they are ready to be submitted to the SDA/BP archive. The S3 storage requires users to be authenticated, so a configuration file needs to be downloaded before starting to upload the files.

The configuration file can be downloaded by logging in with a Life Science RI account:

* For BigPicture use https://login.bp.nbis.se/

Follow the dialogue to get authenticated and then click on `Download inbox s3cmd credentials` to download the configuration file named `s3cmd.conf`. The configuration file should be placed in the root folder of the repository.

Alternatively, you can download the configuration file using the [login command](#login).

### Upload file(s)

Now that the configuration file is downloaded, the file(s) can be uploaded to the archive using the `sda-cli` binary. To upload a specific file, use the following command:
```bash
./sda-cli upload -config <configuration_file> <encrypted_file_to_upload>
```
where `<configuration_file>` is the file downloaded in the previous step and `<encrypted_file_to_upload>` is a file encrypted in the earlier steps.
The tool also allows for uploading multiple files at once, by listing them separated by spaces:
```bash
./sda-cli upload -config <configuration_file> <encrypted_file_1_to_upload> <encrypted_file_2_to_upload>
```
Note that the files will be uploaded to the base folder of the user.

### Upload folder(s)

One can also upload entire directories recursively, i.e. including all contained files and folders while keeping the local folder structure. This can be achieved with the `-r` flag, e.g. running:
```bash
./sda-cli upload -config <configuration_file> -r <folder_to_upload>
```
will upload `<folder_to_upload>` as is, i.e. with the same inner folder and file structure as the local one, to the archive.

It is also possible to specify multiple directories and files for upload with the same command. For example,
```bash
./sda-cli upload -config <configuration_file> -r <encrypted_file_1_to_upload> <encrypted_file_2_to_upload> <folder_1_to_upload> <folder_2_to_upload>
```
However, if `-r` is omitted in the above, any folders will be skipped during upload.

### Upload to a different path

The user can specify a different path for uploading files/folders with the `-targetDir` flag followed by the name of the folder. For example, the command:
```bash
./sda-cli upload -config <configuration_file> -r <encrypted_file_1_to_upload> <folder_1_to_upload> -targetDir <upload_folder>
```
will create `<upload_folder>` under the user's base folder with contents `<upload_folder>/<encrypted_file_1_to_upload>` and `<upload_folder>/<folder_1_to_upload>`. Note that the given `<upload_folder>` may also be a folder path, e.g. `folder1/folder2`, in which case `<encrypted_file_1_to_upload>` will be uploaded to `folder1/folder2/<encrypted_file_1_to_upload>`.
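
To make the path mapping concrete, the sketch below prints where an uploaded item would end up under a target folder, following the layout described above. The helper `remote_path` and the file and folder names are hypothetical; the function only mimics the described layout and is not the tool's own code.

```bash
#!/usr/bin/env bash

# Hypothetical helper: joins the target folder and the name of an uploaded
# item, mirroring the layout <upload_folder>/<item> described above.
# A trailing slash on either argument is dropped before joining.
remote_path() {
  printf '%s/%s\n' "${1%/}" "${2%/}"
}

remote_path "folder1/folder2" "sample.c4gh"   # prints folder1/folder2/sample.c4gh
remote_path "folder1/folder2/" "run_01/"      # prints folder1/folder2/run_01
```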

As a side note, it is possible to include all the contents of a directory with `/.`, for example,
```bash
./sda-cli upload -config <configuration_file> -r <folder_to_upload>/. -targetDir <new_folder_name>
```
will upload all contents of `<folder_to_upload>` to `<new_folder_name>` recursively, effectively renaming `<folder_to_upload>` upon upload to the archive.

### Encrypt on upload

It is possible to combine the encryption and upload steps into one with the flag `--encrypt-with-key`, followed by the path of the crypt4gh public key to be used for encryption. In this case, the input list of file arguments can only contain *unencrypted* source files. For example, the following
```bash
./sda-cli upload -config <configuration_file> --encrypt-with-key <public_key> <unencrypted_file_to_upload>
```
will encrypt `<unencrypted_file_to_upload>` using `<public_key>` as public key and upload the created `<unencrypted_file_to_upload>.c4gh` to the base folder of the user.

Encrypt on upload can be combined with any of the flags above. For example,
```bash
./sda-cli upload -config <configuration_file> --encrypt-with-key <public_key> -r <folder_to_upload_with_unencrypted_data> -targetDir <new_folder_name>
```
will first encrypt all files in `<folder_to_upload_with_unencrypted_data>` and then upload the folder recursively (selecting only the created `c4gh` files) under `<new_folder_name>`.

**Notes**: The tool calls the [encrypt](#encrypt) module internally, so behavior similar to that command is expected, including the creation of hash files. In addition,

- For encryption with [multiple public keys](#encrypt-files-with-multiple-keys), concatenate all public keys into one file and pass it as the argument to `--encrypt-with-key`.
- If the input includes encrypted files, the tool will exit without performing further tasks.
- The encrypted files will be created next to their unencrypted counterparts.
- The tool will not overwrite existing encrypted files. It will exit early if encrypted counterparts of the source files already exist with the same source path.
- If the flag `--force-overwrite` is used, the tool will overwrite any already existing file.
- The CLI will exit if the input has any unencrypted files. To override that, use the flag `--force-unencrypted`.

## Get dataset size

Before downloading a dataset or a specific file, the `sda-cli` tool allows for requesting the size of each file, as well as of the whole dataset. To use this functionality, the tool expects as an argument a file containing the locations of the files in the dataset. The argument can be one of the following:
1. a URL to the file containing the locations of the dataset files
2. a URL to a folder containing the `urls_list.txt` file with the locations of the dataset files
3. the path to a local file containing the locations of the dataset files.

Given this argument, the dataset size can be retrieved using the following command:
```bash
./sda-cli datasetsize <urls_file>
```
where `<urls_file>` is as described above.

## List files

The uploaded files can be listed using the `list` command. This feature returns all the files in the user's bucket recursively and can be executed using:
```bash
./sda-cli list [-config <configuration_file>]
```
It also allows for requesting files/filepaths with a specified prefix using:
```bash
./sda-cli list [-config <configuration_file>] <prefix>
```
This command will return any file/path starting with the defined `<prefix>`.
If no config is given by the user, the tool will look for a previous login from the user.

## Download

The SDA/BP archive enables downloading files and datasets in a secure manner.
That can be achieved using the `sda-cli` tool, and the process consists of the following two steps.

### Create keys

To make sure that the files are downloaded from the archive in a secure manner, the user needs to create the key pair that the files will be encrypted with. The key pair can be created using the following command:
```bash
./sda-cli createKey <keypair_name>
```
where `<keypair_name>` is the base name of the key files. This command will create two keys named `<keypair_name>.pub.pem` and `<keypair_name>.sec.pem`. The public key (`pub`) will be used for the encryption of the files, while the private one (`sec`) will be used in the decryption step below.

**NOTE:** Make sure to keep these keys safe. Losing the keys could lead to sensitive data leaks.

### Download file

The `sda-cli` tool allows for downloading file(s)/datasets. The URLs of the respective dataset files that are available for downloading are stored in a file named `urls_list.txt`. `sda-cli` can download files only by using such a file or the URL where it is stored. There are three different ways to pass the location of the file to the tool, similar to the [dataset size section](#get-dataset-size):
1. a direct URL to `urls_list.txt` or a file with a different name but containing the locations of the dataset files
2. a URL to a folder containing the `urls_list.txt` file
3. the path to a local file containing the locations of the dataset files.

Given this argument, the whole dataset can be retrieved using the following command:
```bash
./sda-cli download <urls_file>
```
where `<urls_file>` is as described above.
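
As an illustration of the third option (a local file), the sketch below builds a reduced list from a full `urls_list.txt` so that only part of a dataset is requested. It assumes the list holds one URL per line; the URLs, the `.bam.c4gh` pattern and the file name `selected_urls.txt` are made up for the example.

```bash
#!/usr/bin/env bash
set -eu

# Hypothetical list contents; a real urls_list.txt comes from the archive.
workdir="$(mktemp -d)"
cat > "$workdir/urls_list.txt" <<'EOF'
https://example.org/dataset/sample1.bam.c4gh
https://example.org/dataset/sample1.vcf.c4gh
https://example.org/dataset/sample2.bam.c4gh
EOF

# Keep only the entries of interest (here: names ending in .bam.c4gh).
grep -E '\.bam\.c4gh$' "$workdir/urls_list.txt" > "$workdir/selected_urls.txt"

# Count the files that would be requested.
selected_count="$(wc -l < "$workdir/selected_urls.txt" | tr -d ' ')"
echo "selected $selected_count file(s)"

# The reduced list would then be passed like any other local list:
#   ./sda-cli download "$workdir/selected_urls.txt"
```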
The tool also allows for selecting a folder where the files will be downloaded, using the `-outdir` argument:
```bash
./sda-cli download -outdir <outdir> <urls_file>
```
**Note**: If needed, the user can download a selection of files from an available dataset by providing a customized `urls_list.txt` file.

## Decrypt file

Given that the instructions in the [download section](#download) have been followed, the key pair and the data files should be stored in some location. The last step is to decrypt the files in order to access their content. That can be achieved using the following command:
```bash
./sda-cli decrypt -key <keypair_name>.sec.pem <file_to_decrypt>
```
where `<keypair_name>.sec.pem` is the private key created in the [relevant section](#create-keys) and `<file_to_decrypt>` is one of the files downloaded following the instructions of the [download section](#download-file).


## Login

You can log in to download the configuration file needed for some of the tool's operations using the login command:
```bash
./sda-cli login <login_target>
```
where `<login_target>` is the URL to the `sda-auth` service from the [sensitive-data-archive](https://github.com/neicnordic/sensitive-data-archive/) project.

This will open a link for the user where they can go and log in.
After the login is complete, a configuration file named `.sda-cli-session` will be created in the tool's directory.

## Version
You can get the current version of the sda-cli by running:
```bash
./sda-cli version
```


# Developers' section
This section contains the information required to install, modify and run the `sda-cli` tool.

## Requirements
`sda-cli` is written in Go. In order to modify, build and run the tool, Go (>= 1.21) needs to be installed.
Instructions for installing Go can be found [here](https://go.dev/doc/install).

## Build tool
To build the `sda-cli` tool, run the following command from the root folder of the repository:
```bash
go build
```
This command will create an executable file named `sda-cli` in the root folder.

# Create new release

The GitHub Actions include a release workflow that builds binaries for different operating systems. To create a new release, create a tag either using the graphical interface or through the command line. That should trigger the creation of a release with the latest code of the specified branch.

For the automatic release to be triggered, the tag should follow the format `vX.X.X`, e.g. `v1.0.0`.

## Update releaser

Before pushing a change to the releaser, make sure to check the configuration file by running:
```sh
goreleaser check -f .goreleaser.yaml
```
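
For the tag format mentioned under "Create new release", a small check like the following can catch a malformed tag before it is pushed. This is only a sketch: it assumes each `X` stands for one or more digits, and the function name `is_release_tag` is hypothetical. The `git` commands in the closing comment are the usual way to create and push a tag from the command line.

```bash
#!/usr/bin/env bash

# Hypothetical helper: succeeds only for tags shaped like vX.X.X,
# assuming each X is one or more digits (e.g. v1.0.0).
is_release_tag() {
  printf '%s\n' "$1" | grep -Eq '^v[0-9]+\.[0-9]+\.[0-9]+$'
}

for tag in v1.0.0 1.0.0 v1.0; do
  if is_release_tag "$tag"; then
    echo "$tag: ok"
  else
    echo "$tag: does not match vX.X.X"
  fi
done

# A matching tag could then be created and pushed with, for example:
#   git tag v1.0.0
#   git push origin v1.0.0
```

Of the three sample values, only `v1.0.0` passes the check; the other two are reported as not matching.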