github.com/NBISweden/sda-cli@v0.1.2-0.20240506070033-4c8af88918df/README.md (about)

SDA-CLI
=======

This is the Sensitive Data Archive (SDA) Command Line Interface (sda-cli). This
tool was created to unify and simplify the tools needed to perform the most
common user actions in the SDA.

This tool can be used to encrypt and upload data when submitting to the archive,
and to download and decrypt when retrieving data from the archive.

It is recommended to use the precompiled executables for `sda-cli`, which can be found at https://github.com/NBISweden/sda-cli/releases

To get help on the usage of the tool, use the following command:
```bash
./sda-cli help
```

# Usage

The main functionalities implemented in this tool are explained in the following sections.

If any command seems a bit puzzling, you can find guidance on how to interpret it [here](https://ftpdocs.broadcom.com/cadocs/0/CA%20ARCserve%20%20Backup%2015-ENU/Bookshelf_Files/HTML/CMD_Ref/command_line_syntax_characters.htm) and [here](https://medium.com/@jaewei.j.w/how-to-read-man-page-synopsis-3408e7fd0e42).

## Encrypt

The files stored in the SDA/BP archive are encrypted using the [crypt4gh standard](https://www.ga4gh.org/news/crypt4gh-a-secure-method-for-sharing-human-genetic-data/). The following sections explain how to encrypt and upload files to the archive.

### Download the crypt4gh public key

The files that are uploaded to the SDA/BP services need to be encrypted with the correct public key. Depending on the service you want to use, this key can be downloaded using this command **if you are uploading to the SDA**:
```bash
wget https://raw.githubusercontent.com/NBISweden/EGA-SE-user-docs/main/crypt4gh_key.pub
```
or this command **if you are uploading to Big Picture**:
```bash
wget https://raw.githubusercontent.com/NBISweden/EGA-SE-user-docs/main/crypt4gh_bp_key.pub
```

### Encrypt file(s)

Now that the public key is downloaded, the file(s) can be encrypted using the `sda-cli` binary obtained in the first step of this guide. To encrypt a specific file, use the following command:
```bash
./sda-cli encrypt [-key <public_key>] <file_to_encrypt>
```
where `<public_key>` is the key downloaded in the previous step. The tool also allows for encrypting multiple files at once, by listing them separated by spaces:
```bash
./sda-cli encrypt -key <public_key> <file_1_to_encrypt> <file_2_to_encrypt> <file_3_to_encrypt>
```
This command comes with the `-continue` option, which will continue encrypting files even if one of them fails. To enable this feature, execute the command with the `-continue=true` option.
If no public key is provided, the tool will look for one in a previous login session.
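
As a rough sketch of how `-continue=true` might be used for a batch, the helper below collects the regular files directly inside one folder, skips those already ending in `.c4gh`, and passes the rest to a single `encrypt` call. The function name `encrypt_folder`, the `SDA_CLI` variable, and the `.c4gh` skip rule are assumptions made for illustration; they are not part of the tool.

```bash
# Path to the sda-cli executable; override SDA_CLI if it lives elsewhere.
SDA_CLI="${SDA_CLI:-./sda-cli}"

# encrypt_folder <public_key> <folder>
# Encrypts every regular file directly inside <folder> in one call and keeps
# going past individual failures. Illustrative helper, not part of sda-cli.
encrypt_folder() {
    local key="$1" dir="$2" f
    local files=()
    for f in "$dir"/*; do
        [ -f "$f" ] || continue                  # skip sub-folders and empty globs
        case "$f" in *.c4gh) continue ;; esac    # assumed marker of encrypted files
        files+=("$f")
    done
    if [ "${#files[@]}" -eq 0 ]; then
        echo "no files to encrypt in $dir" >&2
        return 0
    fi
    "$SDA_CLI" encrypt -key "$key" -continue=true "${files[@]}"
}
```

It could be called as `encrypt_folder crypt4gh_key.pub ./data`. The flags are placed before the file arguments, mirroring the command forms shown above.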

### Encrypt file(s) with multiple keys

To encrypt files with more than one public key, repeat the `-key` flag, e.g.
```bash
./sda-cli encrypt -key <public_key1> -key <public_key2> <file_to_encrypt>
```
will encrypt a file using two keys so that it can be decrypted with either of the corresponding private keys. Encryption with more than two keys is possible as well. Another option is to pass to `-key` a file with concatenated public keys, generated e.g. with a command like
```bash
cat <pub_key1> <pub_key2> > <concatenated_pub_keys>
```
Passing a combination of the above arguments is allowed as well:
```bash
./sda-cli encrypt -key <concatenated_public_keys> -key <public_key3> <file_to_encrypt>
```
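
When keys are concatenated as described above, it can be useful to confirm how many keys ended up in the combined file before using it. A minimal sketch, assuming each key file carries the usual `-----BEGIN CRYPT4GH PUBLIC KEY-----` header line and ends with a newline; the helper name `concat_pub_keys` is hypothetical:

```bash
# concat_pub_keys <output_file> <pub_key>...
# Concatenates the given public key files into <output_file> and prints the
# number of key headers found in the result. Illustrative helper only.
concat_pub_keys() {
    [ "$#" -ge 2 ] || return 1          # need an output file and at least one key
    local out="$1"
    shift
    cat "$@" > "$out" || return 1
    # Assumption: every crypt4gh public key starts with this header line.
    grep -c -e '-----BEGIN CRYPT4GH PUBLIC KEY-----' "$out"
}
```

For example, `concat_pub_keys all_keys.pub key1.pub key2.pub` writes `all_keys.pub` and prints the number of keys it found; the resulting file can then be passed to `-key`.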

**Note**: The `encrypt` command will create four files containing hashes (both md5 and sha256) for the encrypted and unencrypted files, respectively.

**Developers' Notes:** The tool creates a key pair when encrypting the files. This key pair is temporary for security reasons.
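
The hash files mentioned in the note above can serve as a simple integrity check before upload or after a transfer. A minimal sketch, assuming a hash list with one `<hex digest> <file name>` entry per line and GNU `sha256sum` on the path; the helper name `verify_sha256` is hypothetical, and the hash list to pass is whichever sha256 file `encrypt` wrote to your working directory:

```bash
# verify_sha256 <file> <hash_list>
# Succeeds when the file's current SHA-256 digest is listed in <hash_list>.
# Assumed list format: "<hex digest> <file name>", one entry per line.
verify_sha256() {
    local file="$1" list="$2" digest
    digest=$(sha256sum "$file" 2>/dev/null | awk '{print $1}')
    [ -n "$digest" ] || return 1          # file missing or unreadable
    grep -q "^${digest}[[:space:]]" "$list"
}
```

A call such as `verify_sha256 <file_to_encrypt>.c4gh <sha256_hash_list> && echo OK` prints `OK` only when the digest is present in the list.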


## Upload

### Download the configuration file

Once your files are encrypted, they are ready to be submitted to the SDA/BP archive. The S3 storage requires users to be authenticated, therefore a configuration file needs to be downloaded before starting to upload the files.

The configuration file can be downloaded by logging in with a Life Science RI account:

* For BigPicture use https://login.bp.nbis.se/

Follow the dialogue to get authenticated and then click on `Download inbox s3cmd credentials` to download the configuration file named `s3cmd.conf`. The configuration file should be placed in the root folder of the repository.

Alternatively, you can download the configuration file using the [login command](#login).

### Upload file(s)

Now that the configuration file is downloaded, the file(s) can be uploaded to the archive using the `sda-cli` binary obtained in the first step of this guide. To upload a specific file, use the following command:
```bash
./sda-cli upload -config <configuration_file> <encrypted_file_to_upload>
```
where `<configuration_file>` is the file downloaded in the previous step and `<encrypted_file_to_upload>` is a file encrypted in the earlier steps. The tool also allows for uploading multiple files at once, by listing them separated by spaces:
```bash
./sda-cli upload -config <configuration_file> <encrypted_file_1_to_upload> <encrypted_file_2_to_upload>
```
Note that the files will be uploaded to the base folder of the user.

### Upload folder(s)

One can also upload entire directories recursively, i.e. including all contained files and folders while keeping the local folder structure. This can be achieved with the `-r` flag, e.g. running:
```bash
./sda-cli upload -config <configuration_file> -r <folder_to_upload>
```
will upload `<folder_to_upload>` as is, i.e. with the same inner folder and file structure as the local one, to the archive.

It is also possible to specify multiple directories and files for upload with the same command. For example:
```bash
./sda-cli upload -config <configuration_file> -r <encrypted_file_1_to_upload> <encrypted_file_2_to_upload> <folder_1_to_upload> <folder_2_to_upload>
```
However, if `-r` is omitted in the above, any folders will be skipped during upload.

### Upload to a different path

The user can specify a different path for uploading files/folders with the `-targetDir` flag followed by the name of the folder. For example, the command:
```bash
./sda-cli upload -config <configuration_file> -r <encrypted_file_1_to_upload> <folder_1_to_upload> -targetDir <upload_folder>
```
will create `<upload_folder>` under the user's base folder with contents `<upload_folder>/<encrypted_file_1_to_upload>` and `<upload_folder>/<folder_1_to_upload>`. Note that the given `<upload_folder>` may also be a folder path, e.g. `folder1/folder2`, in which case `<encrypted_file_1_to_upload>` will be uploaded to `folder1/folder2/<encrypted_file_1_to_upload>`.

As a side note, it is possible to include all the contents of a directory with `/.`, for example:
```bash
./sda-cli upload -config <configuration_file> -r <folder_to_upload>/. -targetDir <new_folder_name>
```
will upload all contents of `<folder_to_upload>` to `<new_folder_name>` recursively, effectively renaming `<folder_to_upload>` upon upload to the archive.

### Encrypt on upload

It is possible to combine the encryption and upload steps into one with the flag `--encrypt-with-key`, followed by the path of the crypt4gh public key to be used for encryption. In this case, the input list of file arguments can only contain *unencrypted* source files. For example, the following
```bash
./sda-cli upload -config <configuration_file> --encrypt-with-key <public_key> <unencrypted_file_to_upload>
```
will encrypt `<unencrypted_file_to_upload>` using `<public_key>` as public key and upload the created `<unencrypted_file_to_upload>.c4gh` to the base folder of the user.

Encrypt on upload can be combined with any of the flags above. For example,
```bash
./sda-cli upload -config <configuration_file> --encrypt-with-key <public_key> -r <folder_to_upload_with_unencrypted_data> -targetDir <new_folder_name>
```
will first encrypt all files in `<folder_to_upload_with_unencrypted_data>` and then upload the folder recursively (selecting only the created `c4gh` files) under `<new_folder_name>`.

**Notes**: The tool calls the [encrypt](#encrypt) module internally, therefore similar behavior to that command is expected, including the creation of hash files. In addition:

- For encryption with [multiple public keys](#encrypt-files-with-multiple-keys), concatenate all public keys into one file and pass it as the argument to `--encrypt-with-key`.
- If the input includes encrypted files, the tool will exit without performing further tasks.
- The encrypted files will be created next to their unencrypted counterparts.
- The tool will not overwrite existing encrypted files. It will exit early if encrypted counterparts of the source files already exist with the same source path.
- If the flag `--force-overwrite` is used, the tool will overwrite any already existing file.
- When `--encrypt-with-key` is not used, the CLI will exit if the input contains any unencrypted files. To override that, use the flag `--force-unencrypted`.
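
Since a plain upload stops when it meets unencrypted input, it can help to look for such files before starting a recursive upload. The sketch below lists candidates by file name only; it assumes encrypted files are recognisable by the `.c4gh` suffix, which is a simplification, and the helper name `list_non_c4gh` is hypothetical:

```bash
# list_non_c4gh <folder>
# Prints every regular file under <folder> whose name does not end in .c4gh.
# Assumption: encrypted files are recognisable by the .c4gh suffix.
list_non_c4gh() {
    find "$1" -type f ! -name '*.c4gh' | sort
}
```

Running `list_non_c4gh <folder_to_upload>` and getting no output suggests that the folder only holds files with the `.c4gh` suffix.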

## Get dataset size

Before downloading a dataset or a specific file, the `sda-cli` tool allows for requesting the size of each file, as well as the whole dataset. In order to use this functionality, the tool expects as an argument a file containing the location of the files in the dataset. The argument can be one of the following:
1. a URL to the file containing the locations of the dataset files
2. a URL to a folder containing the `urls_list.txt` file with the locations of the dataset files
3. the path to a local file containing the locations of the dataset files.

Given this argument, the dataset size can be retrieved using the following command:
```bash
./sda-cli datasetsize <urls_file>
```
where `<urls_file>` is as described above.

## List files

The uploaded files can be listed using the `list` command. This feature returns all the files in the user's bucket recursively and can be executed using:
```bash
./sda-cli list [-config <configuration_file>]
```
It also allows for requesting files/filepaths with a specified prefix using:
```bash
./sda-cli list [-config <configuration_file>] <prefix>
```
This command will return any file/path starting with the defined `<prefix>`.
If no config is given by the user, the tool will look for a previous login from the user.

## Download

The SDA/BP archive enables downloading files and datasets in a secure manner. This can be achieved using the `sda-cli` tool, and the process consists of the following two steps.

### Create keys

To make sure that the files are downloaded from the archive in a secure manner, the user should create the key pair that the files will be encrypted with. The key pair can be created using the following command:
```bash
./sda-cli createKey <keypair_name>
```
where `<keypair_name>` is the base name of the key files. This command will create two keys named `<keypair_name>.pub.pem` and `<keypair_name>.sec.pem`. The public key (`pub`) will be used for the encryption of the files, while the private one (`sec`) will be used in the decryption step below.

**NOTE:** Make sure to keep these keys safe. Losing the keys could lead to sensitive data leaks.

### Download file

The `sda-cli` tool allows for downloading file(s)/datasets. The URLs of the respective dataset files that are available for downloading are stored in a file named `urls_list.txt`. `sda-cli` allows downloading files only by using such a file or the URL where it is stored. There are three different ways to pass the location of the file to the tool, similar to the [dataset size section](#get-dataset-size):
1. a direct URL to `urls_list.txt` or a file with a different name but containing the locations of the dataset files
2. a URL to a folder containing the `urls_list.txt` file
3. the path to a local file containing the locations of the dataset files.

Given this argument, the whole dataset can be retrieved using the following command:
```bash
./sda-cli download <urls_file>
```
where `<urls_file>` is as described above.
The tool also allows for selecting a folder where the files will be downloaded, using the `-outdir` argument:
```bash
./sda-cli download -outdir <outdir> <urls_file>
```
**Note**: If needed, the user can download a selection of files from an available dataset by providing a customized `urls_list.txt` file.
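
One way to prepare such a customized list is to filter a local copy of `urls_list.txt` with standard tools. A minimal sketch, assuming the list holds one URL per line; the helper name `select_urls`, the output file name, and the example pattern are illustrative:

```bash
# select_urls <pattern> <urls_file> <output_file>
# Copies the lines of <urls_file> matching the extended regex <pattern>
# into <output_file>. Returns non-zero if nothing matches.
select_urls() {
    local pattern="$1" src="$2" dest="$3"
    grep -E -e "$pattern" "$src" > "$dest"
}
```

For example, `select_urls '\.vcf\.c4gh$' urls_list.txt subset.txt` keeps only the URLs ending in `.vcf.c4gh`, and `subset.txt` can then be passed to `download` as a local file.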

## Decrypt file

Given that the instructions in the [download section](#download) have been followed, the key pair and the data files should be stored in some location. The last step is to decrypt the files in order to access their content. That can be achieved using the following command:
```bash
./sda-cli decrypt -key <keypair_name>.sec.pem <file_to_decrypt>
```
where `<keypair_name>.sec.pem` is the private key created in the [relevant section](#create-keys) and `<file_to_decrypt>` is one of the files downloaded following the instructions of the [download section](#download-file).
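
When a whole dataset has been downloaded into one folder, the command above can be repeated per file. The loop below is a sketch under the assumption that the downloaded files end in `.c4gh`; the function name `decrypt_all` and the `SDA_CLI` variable are hypothetical and only wrap the documented `decrypt` invocation:

```bash
# Path to the sda-cli executable; override SDA_CLI if it lives elsewhere.
SDA_CLI="${SDA_CLI:-./sda-cli}"

# decrypt_all <private_key> <folder>
# Runs one `decrypt` call per .c4gh file directly inside <folder> and stops
# at the first failure. Illustrative helper, not part of sda-cli.
decrypt_all() {
    local key="$1" dir="$2" f
    for f in "$dir"/*.c4gh; do
        [ -f "$f" ] || continue       # nothing matched the glob
        "$SDA_CLI" decrypt -key "$key" "$f" || return 1
    done
}
```

It could be called as `decrypt_all <keypair_name>.sec.pem <outdir>`. A plain `for` loop is used instead of a pipe so that the terminal stays available to the tool if it needs to prompt for input.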


## Login

You can log in to download the configuration file needed for some of the tool's operations using the login command:
```bash
./sda-cli login <login_target>
```
where `<login_target>` is the URL to the `sda-auth` service from the [sensitive-data-archive](https://github.com/neicnordic/sensitive-data-archive/) project.

This will open a link where the user can log in.
After the login is complete, a configuration file named `.sda-cli-session` will be created in the tool's directory.

## Version
You can get the current version of the sda-cli by running:
```bash
./sda-cli version
```


# Developers' section
This section contains the information required to install, modify and run the `sda-cli` tool.

## Requirements
The `sda-cli` is written in Go. In order to modify, build and run the tool, Go (>= 1.21) needs to be installed. The instructions for installing `go` can be found [here](https://go.dev/doc/install).

## Build tool
To build the `sda-cli` tool, run the following command from the root folder of the repository:
```bash
go build
```
This command will create an executable file in the root folder, named `sda-cli`.

# Create new release

The GitHub Actions include a release workflow that builds binaries for different operating systems. In order to create a new release, create a tag either using the graphical interface or through the command line. That should trigger the creation of a release with the latest code of the specified branch.

For the automatic release to be triggered, the tag should have the format `vX.X.X`, e.g. `v1.0.0`.
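
A tag name can be checked against this format before it is pushed. The sketch below assumes that each `X` stands for one or more decimal digits; the function name `is_release_tag` is hypothetical, and the authoritative trigger pattern is the one defined in the release workflow itself:

```bash
# is_release_tag <tag>
# Succeeds when <tag> looks like vX.X.X with numeric components.
# Assumption: each X is one or more decimal digits, with no suffix allowed.
is_release_tag() {
    [[ "$1" =~ ^v[0-9]+\.[0-9]+\.[0-9]+$ ]]
}
```

For instance, `is_release_tag v1.0.0 && git tag v1.0.0` creates the tag only when the name matches.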

## Update releaser

Before pushing a change to the releaser, make sure to check the configuration file by running:
```sh
goreleaser check -f .goreleaser.yaml
```