# Quick introduction to using AWS volumes during manual testing and exercises

This section gives the briefest of overviews for standing up a single CPU runner cluster, with optional encryption support.

<!--ts-->
<!--te-->

## Prerequisites

### Configuration

Have the environment variables from the aws\_k8s.md instructions available, including:

```
AWS_ACCESS_KEY
AWS_SECRET_ACCESS_KEY
```

You will also need the following additional environment variables with their values set appropriately:

```
export AWS_ACCOUNT=`aws sts get-caller-identity | jq ".Account" -r`
export AWS_REGION=us-west-2
export EMAIL=karl.mutch@cognizant.com
export AWS_IMAGE=docker.io/leafai/studio-go-runner:0.9.26-master-aaaagninkqg
```

### Software

Install stencil, a template processor based on the Go language templating engine used by Kubernetes.

```
wget -O stencil https://github.com/karlmutch/duat/releases/download/0.13.0/stencil-linux-amd64
chmod +x stencil
```

## Steps

1. Start the cluster

The cluster is started with an EC2 volume that will be mounted by the runner pod. This works around the issue with the size of the docker image.

```
export CLUSTER_NAME=test-eks
eksctl create cluster --name $CLUSTER_NAME --region $AWS_REGION --nodegroup-name $CLUSTER_NAME-workers --node-type t3a.2xlarge --nodes 1 --nodes-min 1 --nodes-max 3 --ssh-access --ssh-public-key ~/.ssh/id_rsa.pub --managed

export ZONE=`kubectl get nodes -o jsonpath="{.items[0].metadata.labels['failure-domain\.beta\.kubernetes\.io/zone']}"`
export AWS_VOLUME_ID=`aws ec2 create-volume --availability-zone $ZONE --size 60 --volume-type gp2 --output json | jq '.VolumeId' -r`
```

2. Ensure that the AWS secrets are loaded for SQS queues

```
aws_sqs_cred=`cat ~/.aws/credentials | base64 -w 0`
aws_sqs_config=`cat ~/.aws/config | base64 -w 0`
kubectl apply -f <(cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: studioml-runner-aws-sqs
type: Opaque
data:
  credentials: $aws_sqs_cred
  config: $aws_sqs_config
EOF
)
```

3. Generate secrets used to encrypt messages

Further information can be found in the [../../docs/message_privacy.md](../../docs/message_privacy.md) documentation.

```
echo -n "PassPhrase" > secret_phrase
ssh-keygen -t rsa -b 4096 -f studioml_message -C "Message Encryption Key" -N "PassPhrase"
ssh-keygen -f studioml_message.pub -e -m PEM > studioml_message.pub.pem
cp studioml_message studioml_message.pem
ssh-keygen -f studioml_message.pem -e -m PEM -p -P "PassPhrase" -N "PassPhrase"
kubectl create secret generic studioml-runner-key-secret --from-file=ssh-privatekey=studioml_message.pem --from-file=ssh-publickey=studioml_message.pub.pem
kubectl create secret generic studioml-runner-passphrase-secret --from-file=ssh-passphrase=secret_phrase
```

4. Deploy the runner

```
stencil < deployment.yaml | kubectl apply -f -
```
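Before submitting an experiment it can be worth confirming that the secrets and the runner pod deployed cleanly. The commands below are an optional sanity check only, a minimal sketch: the pod name must be substituted with the name reported by `kubectl get pods`, and the default namespace is assumed.

```
# Confirm the SQS credential and message encryption secrets exist
kubectl get secrets studioml-runner-aws-sqs studioml-runner-key-secret studioml-runner-passphrase-secret

# Confirm the runner pod has been scheduled and is Running
kubectl get pods -o wide

# Tail the runner logs, substituting the pod name reported by the previous command
kubectl logs -f <runner-pod-name>
```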
5. Run a studioml experiment using the Python StudioML client

```
aws s3api create-bucket --bucket $USER-cpu-example-metadata --region $AWS_REGION --create-bucket-configuration LocationConstraint=$AWS_REGION
aws s3api create-bucket --bucket $USER-cpu-example-data --region $AWS_REGION --create-bucket-configuration LocationConstraint=$AWS_REGION

SECRET_CONFIG=`mktemp -p .`
stencil < studioml.config > $SECRET_CONFIG
virtualenv --python=python3.6 ./experiment
source ./experiment/bin/activate
pip install tensorflow==1.15.2
pip install studioml
SUBMIT_LOG=`mktemp -p .`
OUTPUT_LOG=`mktemp -p .`
studio run --config=$SECRET_CONFIG --lifetime=30m --max-duration=20m --gpus 0 --queue=sqs_${USER}_cpu_example --force-git app.py >$SUBMIT_LOG 2>/dev/null
export EXPERIMENT_ID=`awk 'END {print $NF}' $SUBMIT_LOG`
rm $SUBMIT_LOG
EXIT_STRING="+ exit "
OUTPUT_DIR=`mktemp -d -p .`
for (( ; ; ))
do
    sleep 5
    # The output tar file only appears once the experiment has started producing output
    aws s3 cp s3://$USER-cpu-example-data/experiments/$EXPERIMENT_ID/output.tar $OUTPUT_DIR/$OUTPUT_LOG.tar 2>/dev/null || continue
    tar xvf $OUTPUT_DIR/$OUTPUT_LOG.tar -C $OUTPUT_DIR
    LAST_LINE=`tail -n 1 $OUTPUT_DIR/output`
    echo $LAST_LINE
    # Stop polling once the experiment reports its exit, otherwise discard and retry
    [[ $LAST_LINE == ${EXIT_STRING}* ]] && break
    rm $OUTPUT_DIR/output || true
    rm $OUTPUT_DIR/$OUTPUT_LOG.tar || true
done
rm $OUTPUT_DIR/output || true
rm $OUTPUT_DIR/$OUTPUT_LOG.tar || true
rmdir $OUTPUT_DIR
rm $OUTPUT_LOG
deactivate
rm -rf experiment
rm $SECRET_CONFIG

aws s3 rb s3://$USER-cpu-example-data --force
aws s3 rb s3://$USER-cpu-example-metadata --force
```

6. Clean up

```
kubectl delete -f examples/aws/cpu/deployment.yaml
aws ec2 delete-volume --volume-id=$AWS_VOLUME_ID
eksctl delete cluster --region=$AWS_REGION --name=$CLUSTER_NAME --wait
```

Copyright © 2019-2020 Cognizant Digital Business, Evolutionary AI. All rights reserved. Issued under the Apache 2.0 license.