
# Quick introduction to using AWS volumes during manual testing and exercises

This section gives the briefest of overviews for standing up a single CPU runner cluster, with optional encryption support.

<!--ts-->
<!--te-->

## Prerequisites

### Configuration

Have the environment variables from the aws\_k8s.md instructions available, including:

```
AWS_ACCESS_KEY
AWS_SECRET_ACCESS_KEY
```

You will also need the following additional environment variables with their values set appropriately:

```
export AWS_ACCOUNT=`aws sts get-caller-identity | jq ".Account" -r`
export AWS_REGION=us-west-2
export EMAIL=karl.mutch@cognizant.com
export AWS_IMAGE=docker.io/leafai/studio-go-runner:0.9.26-master-aaaagninkqg
```
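Later steps fail in confusing ways if any of these variables are empty, so it can help to check them up front. A minimal sketch, assuming the exports above; `check_env` is a hypothetical helper, not part of the runner tooling:

```shell
# check_env prints the names of any listed variables that are unset or empty.
check_env() {
    missing=""
    for v in "$@"; do
        eval "val=\${$v:-}"
        [ -n "$val" ] || missing="$missing $v"
    done
    echo "${missing:-none missing}"
}

check_env AWS_ACCESS_KEY AWS_SECRET_ACCESS_KEY AWS_ACCOUNT AWS_REGION EMAIL AWS_IMAGE
```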

### Software

Install a template processor based on the Go template engine used by Kubernetes.

```
wget -O stencil https://github.com/karlmutch/duat/releases/download/0.13.0/stencil-linux-amd64
chmod +x stencil
```

## Steps

1. Start the cluster

The cluster is started with an EC2 volume that will be mounted by the runner pod.  This works around issues caused by the size of the Docker image.

```
export CLUSTER_NAME=test-eks
eksctl create cluster --name $CLUSTER_NAME --region $AWS_REGION --nodegroup-name $CLUSTER_NAME-workers --node-type t3a.2xlarge --nodes 1 --nodes-min 1 --nodes-max 3 --ssh-access --ssh-public-key ~/.ssh/id_rsa.pub --managed

export ZONE=`kubectl get nodes -o jsonpath="{.items[0].metadata.labels['failure-domain\.beta\.kubernetes\.io/zone']}"`
export AWS_VOLUME_ID=`aws ec2 create-volume --availability-zone $ZONE --size 60 --volume-type gp2 --output json | jq '.VolumeId' -r`
```
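A freshly created EBS volume starts in the `creating` state and cannot be attached until it reports `available`. A sketch of a wait helper (`wait_for_volume` is hypothetical; once `$AWS_VOLUME_ID` is set, invoke it as `wait_for_volume $AWS_VOLUME_ID`):

```shell
# wait_for_volume polls the EC2 API until the given volume reports 'available'.
wait_for_volume() {
    while true; do
        state=$(aws ec2 describe-volumes --volume-ids "$1" \
                    --query 'Volumes[0].State' --output text)
        echo "volume $1 is $state"
        [ "$state" = "available" ] && break
        sleep 5
    done
}
```

The AWS CLI also ships an equivalent built-in waiter, `aws ec2 wait volume-available --volume-ids $AWS_VOLUME_ID`.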

2. Ensure that the AWS secrets are loaded for SQS queues

```
aws_sqs_cred=`cat ~/.aws/credentials | base64 -w 0`
aws_sqs_config=`cat ~/.aws/config | base64 -w 0`
kubectl apply -f <(cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: studioml-runner-aws-sqs
type: Opaque
data:
  credentials: $aws_sqs_cred
  config: $aws_sqs_config
EOF
)
```
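The `-w 0` flag passed to `base64` matters here: it disables line wrapping, so each encoded file fits on the single line its Secret `data:` field requires. A quick local round trip, using hypothetical sample content, shows the encoding is lossless and wrap-free:

```shell
# Encode a sample credentials file on one line, then decode it back.
sample='[default]
aws_access_key_id = EXAMPLEKEY'
encoded=$(printf '%s' "$sample" | base64 -w 0)
decoded=$(printf '%s' "$encoded" | base64 -d)
printf '%s\n' "$decoded"
```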

3. Generate secrets used to encrypt messages

Further information can be found in the [../../docs/message_privacy.md](../../docs/message_privacy.md) documentation.

```
echo -n "PassPhrase" > secret_phrase
ssh-keygen -t rsa -b 4096 -f studioml_message -C "Message Encryption Key" -N "PassPhrase"
ssh-keygen -f studioml_message.pub -e -m PEM > studioml_message.pub.pem
cp studioml_message studioml_message.pem
ssh-keygen -f studioml_message.pem -e -m PEM -p -P "PassPhrase" -N "PassPhrase"
kubectl create secret generic studioml-runner-key-secret --from-file=ssh-privatekey=studioml_message.pem --from-file=ssh-publickey=studioml_message.pub.pem
kubectl create secret generic studioml-runner-passphrase-secret --from-file=ssh-passphrase=secret_phrase
```
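A passphrase mismatch between the PEM key and the passphrase secret only surfaces later in the runner logs, so it is worth confirming locally that the key opens with the chosen passphrase. A self-contained sketch using a throwaway key (the same `openssl rsa ... -noout -check` invocation can be pointed at `studioml_message.pem`):

```shell
# Generate a throwaway PEM-format key, then verify that openssl can
# read it with the passphrase before cleaning up.
ssh-keygen -q -t rsa -b 2048 -m PEM -f demo_message -N "PassPhrase" -C "demo"
openssl rsa -in demo_message -passin pass:PassPhrase -noout -check && echo "private key ok"
rm -f demo_message demo_message.pub
```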

4. Deploy the runner

```
stencil < deployment.yaml | kubectl apply -f -
```

5. Run a studioml experiment using the Python StudioML client

```
aws s3api create-bucket --bucket $USER-cpu-example-metadata --region $AWS_REGION --create-bucket-configuration LocationConstraint=$AWS_REGION
aws s3api create-bucket --bucket $USER-cpu-example-data --region $AWS_REGION --create-bucket-configuration LocationConstraint=$AWS_REGION

SECRET_CONFIG=`mktemp -p .`
stencil < studioml.config > $SECRET_CONFIG
virtualenv --python=python3.6 ./experiment
source ./experiment/bin/activate
pip install tensorflow==1.15.2
pip install studioml
SUBMIT_LOG=`mktemp -p .`
OUTPUT_LOG=`mktemp -p .`
studio run --config=$SECRET_CONFIG --lifetime=30m --max-duration=20m --gpus 0 --queue=sqs_${USER}_cpu_example --force-git app.py >$SUBMIT_LOG 2>/dev/null
export EXPERIMENT_ID=`awk 'END {print $NF}' $SUBMIT_LOG`
rm $SUBMIT_LOG
EXIT_STRING="+ exit "
OUTPUT_DIR=`mktemp -d -p .`
for (( ; ; ))
    do
    sleep 5
    aws s3 cp s3://$USER-cpu-example-data/experiments/$EXPERIMENT_ID/output.tar $OUTPUT_DIR/$OUTPUT_LOG.tar 2>/dev/null || continue
    tar xvf $OUTPUT_DIR/$OUTPUT_LOG.tar -C $OUTPUT_DIR
    LAST_LINE=`tail -n 1 $OUTPUT_DIR/output`
    echo $LAST_LINE
    [[ $LAST_LINE == ${EXIT_STRING}* ]] && break
    rm $OUTPUT_DIR/output || true
    rm $OUTPUT_DIR/$OUTPUT_LOG.tar || true
done
rm $OUTPUT_DIR/output || true
rm $OUTPUT_DIR/$OUTPUT_LOG.tar || true
rmdir $OUTPUT_DIR
rm $OUTPUT_LOG
deactivate
rm -rf experiment
rm $SECRET_CONFIG

aws s3 rb s3://$USER-cpu-example-data --force
aws s3 rb s3://$USER-cpu-example-metadata --force

```
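The submission above references an `app.py` that is not included in this example. Any script that runs to completion works, because its stdout ends up in the `output.tar` the loop above polls for. A hypothetical minimal placeholder (the real payload presumably exercises TensorFlow, which is why `tensorflow==1.15.2` is installed):

```python
# app.py -- hypothetical stand-in for the experiment payload; everything it
# prints is collected into the 'output' file retrieved by the polling loop.
import json


def train(epochs=3):
    """Pretend training loop that records a shrinking loss per epoch."""
    return [{"epoch": e, "loss": round(1.0 / (e + 1), 4)} for e in range(epochs)]


if __name__ == "__main__":
    for record in train():
        print(json.dumps(record))
```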

6. Clean up

```
kubectl delete -f examples/aws/cpu/deployment.yaml
aws ec2 delete-volume --volume-id=$AWS_VOLUME_ID
eksctl delete cluster --region=$AWS_REGION --name=$CLUSTER_NAME --wait
```

Copyright © 2019-2020 Cognizant Digital Business, Evolutionary AI. All rights reserved. Issued under the Apache 2.0 license.