# Message Encryption

This section describes the message encryption and signing features of the runner.  Message payloads are described in the docs/interface.md file.  Encryption and signing are only supported within Kubernetes deployments, because standalone runners cannot be secured and cannot hold shared secrets without the isolation provided by Kubernetes.

Encrypted payloads use a hybrid cryptosystem; a detailed description can be found at [https://en.wikipedia.org/wiki/Hybrid_cryptosystem](https://en.wikipedia.org/wiki/Hybrid_cryptosystem).

Message signing uses Ed25519 signing as defined by RFC 8032.  More information can be found at [https://ed25519.cr.yp.to/](https://ed25519.cr.yp.to/).

Ed25519 certificate SHA256 fingerprints, not intended to be cryptographically secure, will be used by clients to assert identity, confirmed by successful verification.  Verification still relies on a full public key.

<!--ts-->

Table of Contents
=================

* [Message Encryption](#message-encryption)
* [Table of Contents](#table-of-contents)
* [Introduction](#introduction)
* [Encryption](#encryption)
  * [Key creation by the cluster owner](#key-creation-by-the-cluster-owner)
* [Mount secrets into runner deployment](#mount-secrets-into-runner-deployment)
  * [Message format](#message-format)
* [Signing](#signing)
  * [Signing deployment](#signing-deployment)
    * [First time creation](#first-time-creation)
    * [Manual insertion](#manual-insertion)
    * [Automatted insertion](#automatted-insertion)
* [Python StudioML configuration](#python-studioml-configuration)
<!--te-->

# Introduction

This document describes encryption of Request messages sent by StudioML clients to the runner.

Encryption of messages has two tiers.  The first tier is a public-key scheme in which the runner holds a private key and the matching public key is given to experimenters using the Python or other client software.  The second tier is a per-message symmetric secret that protects the message itself.

Users of the system need to obtain the public key, and only the public key, from the compute cluster owner.  The public key can then be made accessible to the client for securing the messages exchanged with the runner compute instances.

The compute cluster owner is responsible for generating the public-private key pair and managing the integrity of the private key.  They are also responsible for distributing the public key to any experimenters, or users of the system.

The client generates a per-message secret, encrypts it using the public key, and prepends it to a payload that contains the request message encrypted using that secret.

# Encryption

## Key creation by the cluster owner

The owner of the compute cluster is responsible for generating the key pair used for message encryption.  The following commands show the creation of the key pair.

```
echo -n "PassPhrase" > secret_phrase
ssh-keygen -t rsa -b 4096 -f studioml_message -C "Message Encryption Key" -N "PassPhrase"
ssh-keygen -f studioml_message.pub -e -m PEM > studioml_message.pub.pem
cp studioml_message studioml_message.pem
ssh-keygen -f studioml_message.pem -e -m PEM -p -P "PassPhrase" -N "PassPhrase"
```

The private key file and the passphrase should be considered valuable secrets for your organization that MUST be protected and cared for appropriately.

Once the key pair has been created it can be loaded into the Kubernetes runner cluster using the following commands:

```
kubectl create secret generic studioml-runner-key-secret --from-file=ssh-privatekey=studioml_message.pem --from-file=ssh-publickey=studioml_message.pub.pem
kubectl create secret generic studioml-runner-passphrase-secret --from-file=ssh-passphrase=secret_phrase
```

The passphrase is kept in a separate secret so that RBAC can be used to isolate the two pieces of knowledge, should your secrets management procedures call for this.

The public key, in PEM key file format, MUST be the only file delivered to client-side users of StudioML, for example:

```
-----BEGIN RSA PUBLIC KEY-----
MIICCgKCAgEAtZurOEVuT9bhjiUWX7U8EFxL8oMGWSLXf4M6QBsJ5TljtSqyIxvI
kXiQDLIpJXY8KRmiR9RghGopvB5NfAMLZtfwozuju2NtnSn0UPI+6O4ED6TfDP5F
eta/6tUKAuvxVwF5Yvr7en1qnbv4L86vqeukrn/gIPTb7LlsFjt6uHlxA6xTAun/
HfRKlBiWR5rIi/fwuUMmTGpAcCa8s5Gqfla28FfsknGOipy4Vw4Mt7f93ke1dHN+
dY/J2TpCm/GNJuFaHc4EgHE8uw+jU6uBgpZAJSIzK5dxYniEjZS93CWxs2HN8dmV
wEqleT02agWW4cfa13X3Lz1YoQkCjYtSqB8Y2KjT1q7sSll0HExWV58kFPk9FmIy
JniMLcLFzAxGDM5UgtmsdSYmqN49vlqOejxfYxy6GrKXrkRGCDuQKyb2m/WQLXGU
8cGqwuVpN/JNWjiG4+NaxWRzfE2Yk4gbhcYqXRocNMlidG0Sx/xrFTFln86lmGJ1
RCse6jv3beENf5lfrz4ddAzAssjTivmlZgJCTK2oROT3WPI/G6CaBQadt13XkQLW
hAZDbnsZMhOVH3/UiQJ6DwgV0yK5FND4jkbHM3GWGNLRIrnL9F0I8c1p9X2oCx6T
plgCug3iz5cE9+G2455Y1vaVMBEKSm1REhsdTYzPBV/yXPpPR4lUCmkCAwEAAQ==
-----END RSA PUBLIC KEY-----
```

A single key pair is used to encrypt all requests on the cluster at this time.  A future feature is envisioned to allow multiple key pairs.

When the runner starts, the secrets are mounted into the container that Kubernetes is managing.  This is done using the deployment yaml.  When performing deployments, review the yaml for the runner pod and its runner container to ensure that the secrets are available and that they are mounted.  If these secrets are not loaded into the cluster the runner pod should remain in a pending state.

# Mount secrets into runner deployment

Secrets used by the runner are mounted into the runner pod using the Kubernetes deployment pod resource definition.  An example of this is provided within the sample AWS CPU runner that can be found in the [../examples/aws/cpu/deployment.yaml](../examples/aws/cpu/deployment.yaml) file.

Two mounts are created, the first for the key files and the second for the passphrase.  These two are split to allow for RBAC to be employed in the cluster should you want it.  The motivation is that you might want to divide ownership of the private key and the passphrase between two parties and avoid revealing one of these to the other.

If you wish to use encrypted traffic exclusively be sure to remove the ```CLEAR_TEXT_MESSAGES: "true"``` entry from your ConfigMap entries in the yaml.

In any event, the yaml needed to mount these secrets appears as follows:

```
apiVersion: apps/v1
kind: Deployment
metadata:
 name: studioml-go-runner-deployment
 labels:
   app: studioml-go-runner
spec:
 ...
 template:
   ...
   spec:
      ...
      containers:
      - name: studioml-go-runner
        ...
        volumeMounts:
        - name: message-encryption
          mountPath: "/runner/certs/message/encryption"
          readOnly: true
        - name: encryption-passphrase
          mountPath: "/runner/certs/message/passphrase"
          readOnly: true
        - name: queue-signing
          mountPath: "/runner/certs/queues/signing"
          readOnly: true
        ...
      volumes:
        ...
        - name: message-encryption
          secret:
            optional: false
            secretName: studioml-runner-key-secret
            items:
            - key: ssh-privatekey
              path: ssh-privatekey
            - key: ssh-publickey
              path: ssh-publickey
        - name: encryption-passphrase
          secret:
            optional: false
            secretName: studioml-runner-passphrase-secret
            items:
            - key: ssh-passphrase
              path: ssh-passphrase
        - name: queue-signing
          secret:
            optional: false
            secretName: studioml-signing
```

## Message format

The encrypted\_data block contains two comma-separated Base64 strings.  The first string contains a symmetric key that is encrypted using RSA-OAEP with a key length of 4096 bits and the SHA-256 hashing algorithm.  The second string contains the JSON string for the Request message, which is first encrypted using NaCl SecretBox encryption and then encoded as Base64.

The encryption works in two steps.  First, the source generating the message creates a new SecretBox symmetric key for every message, and the data within the message is encrypted with that symmetric key.  Second, the symmetric key is encrypted using the asymmetric public key and placed at the front of the message.  This has the following effects:

* The sender can decrypt the payload if they retain their original symmetric key.
* The sender cannot decrypt the symmetric key once it is placed, encrypted, into the payload.
* The legitimate runner, if able to access the RSA PEM private key, can decrypt the symmetric key, and only then can it decrypt the Request in the payload.
* Eavesdropping software cannot decrypt the asymmetrically encrypted SecretBox key and so cannot decrypt the rest of the payload.

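The layout of the encrypted\_data block can be sketched in Python.  This is a minimal illustration under stated assumptions, not the StudioML client code: the function names `join_payload` and `split_payload` are hypothetical, the standard Base64 alphabet with padding is assumed, and the byte strings passed in stand for ciphertext already produced by RSA-OAEP and SecretBox.  The 512-byte check follows from RSA-OAEP output being the same size as the 4096-bit key modulus.

```python
import base64
from typing import Tuple

# RSA-OAEP ciphertext is as long as the key modulus: 4096 bits = 512 bytes.
RSA_4096_CIPHERTEXT_LEN = 512


def join_payload(encrypted_key: bytes, encrypted_request: bytes) -> str:
    """Build the two-field, comma-separated Base64 payload string."""
    fields = (encrypted_key, encrypted_request)
    return ",".join(base64.b64encode(f).decode("ascii") for f in fields)


def split_payload(payload: str) -> Tuple[bytes, bytes]:
    """Split a payload into (encrypted symmetric key, encrypted request)."""
    # A comma is not part of the standard Base64 alphabet, so the split is unambiguous.
    parts = payload.split(",")
    if len(parts) != 2:
        raise ValueError(f"expected 2 comma-separated fields, got {len(parts)}")
    encrypted_key, encrypted_request = (
        base64.b64decode(p, validate=True) for p in parts
    )
    if len(encrypted_key) != RSA_4096_CIPHERTEXT_LEN:
        raise ValueError("encrypted key is not the size of a 4096-bit RSA block")
    return encrypted_key, encrypted_request
```

A receiver would pass the first returned field to RSA-OAEP decryption and use the recovered key to open the second field with SecretBox.
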
# Signing

Message signing protects the runner from processing spoofed requests.  To achieve this the runner can be configured to read public key information from Kubernetes secrets and use it to validate the messages it receives.  The configuration information for the runner signing keys is detailed in the next section.

Signing is only supported in Kubernetes deployments.

The portion of the message that is signed is the Base64 representation of the entire payload field.  The payload field includes the Base64 string of the key, a comma, and the Base64 string of the encoded payload proper.

The signature is transmitted in the StudioML message signature field as the Base64 encoding of the binary 64-byte signature blob.

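The handling of the signed bytes and the signature field can be sketched as follows.  This is a hedged illustration only: the helper names are hypothetical, the payload string is assumed to be signed exactly as transmitted, and the Ed25519 signing call itself is left out because it needs a library outside the Python standard library.

```python
import base64

# An Ed25519 signature is always 64 bytes long.
ED25519_SIGNATURE_LEN = 64


def bytes_to_sign(payload: str) -> bytes:
    """Return the bytes covered by the signature: the payload string as sent."""
    return payload.encode("ascii")


def encode_signature(raw_signature: bytes) -> str:
    """Encode a binary signature for the message signature field."""
    if len(raw_signature) != ED25519_SIGNATURE_LEN:
        raise ValueError("an Ed25519 signature must be 64 bytes")
    return base64.b64encode(raw_signature).decode("ascii")


def decode_signature(signature_field: str) -> bytes:
    """Decode the signature field and check it has the expected size."""
    raw = base64.b64decode(signature_field, validate=True)
    if len(raw) != ED25519_SIGNATURE_LEN:
        raise ValueError("signature field does not hold a 64-byte signature")
    return raw
```
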
Message signing uses Ed25519 signing as defined by RFC 8032.  More information can be found at [https://ed25519.cr.yp.to/](https://ed25519.cr.yp.to/).

Ed25519 certificate SHA256 fingerprints, not intended to be cryptographically secure, will be used by clients to assert identity, confirmed by successful verification.  Verification of messages sent to the runner relies on a public key supplied by the experimenter.  The following example shows how an experimenter would create a private-public key pair suitable for signing:

```
ssh-keygen -t ed25519 -f studioml_signing -P ""
ssh-keygen -l -E sha256 -f studioml_signing.pub
256 SHA256:BB+StMfwvv/8Dutb0i1QpdBL171Fg/Fd3ODebi+NX74 kmutch@awsdev (ED25519)
```

The fingerprint can be extracted from the last line of the above output and sent to the cluster administrator.

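An administrator who receives a public key line may want to recompute its fingerprint rather than rely on the one quoted by the sender.  The sketch below follows the way OpenSSH derives SHA256 fingerprints, which is the SHA-256 digest of the decoded key blob printed as Base64 without trailing padding.  The function name is hypothetical and the helper is not part of the runner.

```python
import base64
import hashlib


def ssh_sha256_fingerprint(public_key_line: str) -> str:
    """Compute the OpenSSH-style SHA256 fingerprint of a one-line public key."""
    fields = public_key_line.split()
    if len(fields) < 2:
        raise ValueError("expected '<key type> <base64 blob> [comment]'")
    key_type, blob_b64 = fields[0], fields[1]
    blob = base64.b64decode(blob_b64, validate=True)

    # The blob starts with a 4-byte big-endian length followed by the key type
    # name, which should agree with the first field of the line.
    name_len = int.from_bytes(blob[:4], "big")
    if blob[4:4 + name_len] != key_type.encode("ascii"):
        raise ValueError("key type does not match the encoded key blob")

    digest = hashlib.sha256(blob).digest()
    # OpenSSH prints the digest as Base64 with the '=' padding removed.
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")
```

The returned string can be compared with the `SHA256:...` value reported by `ssh-keygen -l -E sha256`.
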
Having generated a key pair, the experimenter should transmit the PUBLIC key file to the administrators of any runner compute clusters that will be used.  Along with sending the key, the experimenter should decide, in conjunction with their community, the queue name prefixes they will be assigned to use exclusively.  The queue name prefixes should be passed to the administrators with the public key file.

Queue name prefixes should be a minimum of four characters so that they include the queue technology being used and the underscore, for example 'rmq_' or 'sqs_', which applies the public key to all queues of that technology.

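To illustrate how a prefix ties a key to a set of queues, the sketch below looks up a signing key for a queue name.  The longest-match rule and the function name are assumptions made for illustration; this document does not state how the runner resolves overlapping prefixes.

```python
from typing import Dict, Optional


def select_signing_key(queue_name: str, keys_by_prefix: Dict[str, str]) -> Optional[str]:
    """Return the key whose prefix matches queue_name, preferring the longest prefix.

    The longest-match preference is an assumption for this sketch, not
    documented runner behaviour.
    """
    matches = [prefix for prefix in keys_by_prefix if queue_name.startswith(prefix)]
    if not matches:
        return None
    return keys_by_prefix[max(matches, key=len)]
```
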
If you send the request via email you might compose something like the following:

```
Hi,

I would like to add/replace a signing verification key for any queues on the 54.123.10.5 Rabbit MQ Server for our cluster with the prefix of 'rmq_cpu_andrei_'.

The public key I wish to use is:

ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIFITo06Pk8sqCMoMHPaQiQ7BY3pjf7OE8BDcsnYozmIG kmutch@awsdev

Our fingerprint is:

SHA256:BB+StMfwvv/8Dutb0i1QpdBL171Fg/Fd3ODebi+NX74

Thanks,
Andrei
```

The above should provide enough information for the administrator to apply your key to the system and reply by email confirming the key has been added.

Once a message signing public key has been assigned, every message on the related queues MUST have a valid signature attached, otherwise it will be rejected.

## Signing deployment

Before adding any message signing keys, the cluster administrator must check that the request originated from a pre-nominated sender.

Signing keys can be injected into the compute cluster using Kubernetes secrets.  The runners in a cluster will use a secret in the same namespace called 'studioml-signing' for extracting signing keys.  New keys are added as data items within the secrets resource via the kubectl apply command.  Changes or additions to signing keys are propagated via the mounted resource within the runner pods, see [Mounted Secrets are updated automatically](https://kubernetes.io/docs/concepts/configuration/secret/#mounted-secrets-are-updated-automatically).

Using the example above, a secret data item can be added to the studio signing secrets as the following example workflow shows:

```
$ export KUBECONFIG=~/.kube/my_cluster.config
$ kubectl get secrets
NAME                                TYPE                                  DATA   AGE
default-token-qps8p                 kubernetes.io/service-account-token   3      11s
docker-registry-config              Opaque                                1      11s
release-github-token                Opaque                                1      11s
studioml-runner-key-secret          Opaque                                2      11s
studioml-runner-passphrase-secret   Opaque                                1      11s
studioml-signing                    Opaque                                1      11s
```
```
$ kubectl get secrets studioml-signing -o=yaml
apiVersion: v1
data:
  info: RHVtbXkgU2VjcmV0IHNvIHJlc291cmNlIHJlbWFpbnMgcHJlc2VudA==
kind: Secret
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","data":{"info":"RHVtbXkgU2VjcmV0IHNvIHJlc291cmNlIHJlbWFpbnMgcHJlc2VudA=="},"kind":"Secret","metadata":{"annotations":{},"name":"studioml-signing","namespace":"default"},"type":"Opaque"}
  creationTimestamp: "2020-05-15T22:05:26Z"
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:data:
        .: {}
        f:info: {}
      f:metadata:
        f:annotations:
          .: {}
          f:kubectl.kubernetes.io/last-applied-configuration: {}
      f:type: {}
    manager: kubectl
    operation: Update
    time: "2020-05-15T22:05:26Z"
  name: studioml-signing
  resourceVersion: "790034"
  selfLink: /api/v1/namespaces/ci-go-runner-kmutch/secrets/studioml-signing
  uid: bc13f78d-199b-4afb-8b3a-31b6ea486c8e
type: Opaque
```

This next command takes the public key that was emailed to you and converts it into Base64 format, ready to be inserted into the Kubernetes secret.

```
$ item=`cat << EOF | base64 -w 0
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIFITo06Pk8sqCMoMHPaQiQ7BY3pjf7OE8BDcsnYozmIG kmutch@awsdev
EOF
`
```

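For administrators scripting this step, the same encoding can be sketched in Python.  The function name is hypothetical.  The only subtlety is that the shell here-document ends the key line with a newline, so the sketch appends one to produce the same Base64 text that `base64 -w 0` would.

```python
import base64


def encode_secret_item(public_key_line: str) -> str:
    """Base64-encode a public key line the way the here-document pipeline does."""
    # The here-document terminates the line with a newline before it reaches base64.
    data = (public_key_line.rstrip("\n") + "\n").encode("ascii")
    return base64.b64encode(data).decode("ascii")


item = encode_secret_item(
    "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIFITo06Pk8sqCMoMHPaQiQ7BY3pjf7OE8BDcsnYozmIG kmutch@awsdev"
)

# Decoding the item returns the original line, which is a quick sanity check
# before the value is placed into the secret.
assert base64.b64decode(item).decode("ascii").startswith("ssh-ed25519 ")
```
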
### First time creation

The first time the queue secrets are used you must create the Kubernetes resource as the following example shows.  Also note that when a secret is loaded directly from a file, the data in the input file is not Base64 encoded prior to being read by kubectl.

```
tmp_name=`mktemp`
echo -n "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIFITo06Pk8sqCMoMHPaQiQ7BY3pjf7OE8BDcsnYozmIG kmutch@awsdev" > $tmp_name
kubectl create secret generic studioml-signing --from-file=rmq_cpu_andrei_=$tmp_name
rm $tmp_name
```

### Manual insertion

If you do not have the jq tool installed you will now have to manually edit the secret using the following command:

```
$ kubectl edit secrets studioml-signing
```

Now manually insert a yaml line after the info: item so that things appear as follows:

```
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: v1
data:
  info: RHVtbXkgU2VjcmV0IHNvIHJlc291cmNlIHJlbWFpbnMgcHJlc2VudA==
  rmq_cpu_andrei_: c3NoLWVkMjU1MTkgQUFBQUMzTnphQzFsWkRJMU5URTVBQUFBSUZJVG8wNlBrOHNxQ01vTUhQYVFpUTdCWTNwamY3T0U4QkRjc25Zb3ptSUcga211dGNoQGF3c2Rldgo=
kind: Secret
metadata:
  annotations:
... [redacted] ...
```

Now use the ':wq' command to exit the editor and have the secret updated inside the cluster.

### Automatted insertion

Using the jq command, the new data item can be inserted into the secret using the following:

```
kubectl get secret studioml-signing -o json | jq --arg item "${item}" '.data["rmq_cpu_andrei_"]=$item' | kubectl apply -f -
```

# Python StudioML configuration

In order to use experiment payload encryption with the Python-based StudioML client,
the StudioML section of experiment configuration must specify
a path to the public key file in PEM format. If a path is not specified,
the experiment payload will be submitted unencrypted, in plain text form.

If a StudioML configuration is provided as part of the enclosing completion service configuration, in .hocon format, it would include the following (example):

```
{
   ...
   "studio_ml_config": {
         ...
         "public_key_path": "/home/user/keys/my-key.pub.pem",
         ...
   }
   ...
}
```

Another possibility is:

```
{
   ...
   "studio_ml_config": {
         ...
         "public_key_path": ${PUBLIC_KEY_PATH},
         ...
   }
   ...
}
```

For the base StudioML configuration, in .yaml format, specifying the public key for encryption would look like:

```
public_key_path: /home/user/keys/my-key.pub.pem
```

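Because a missing `public_key_path` results in plain-text submission, a submission script may want to check the setting before sending anything.  The helper below is hypothetical and not part of StudioML.  It assumes the configuration has already been loaded into a Python dict and that the key file starts with the `-----BEGIN RSA PUBLIC KEY-----` header produced by the key creation commands in this document.

```python
import os

# Header written by `ssh-keygen -e -m PEM` for an RSA public key.
PEM_HEADER = "-----BEGIN RSA PUBLIC KEY-----"


def check_public_key_path(config: dict) -> str:
    """Return the configured public key path, or raise if encryption would be skipped."""
    path = config.get("public_key_path")
    if not path:
        raise ValueError("public_key_path is not set; payloads would be sent unencrypted")
    with open(os.path.expanduser(path), "r", encoding="ascii") as key_file:
        first_line = key_file.readline().strip()
    if first_line != PEM_HEADER:
        raise ValueError(f"{path} does not look like a PEM RSA public key")
    return path
```
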
If you wish to use message signing to prove that queue messages you send to the cluster are from a genuine sender then an additional option can be specified, for example:

```
{
   ...
   "studio_ml_config": {
         ...
         "public_key_path": "/home/user/keys/my-key.pub.pem",
         "signing_key_path": "/home/user/keys/studioml_signing",
         ...
   }
   ...
}
```

Copyright © 2019-2020 Cognizant Digital Business, Evolutionary AI. All rights reserved. Issued under the Apache 2.0 license.