go.ligato.io/vpp-agent/v3@v3.5.0/k8s/tests/README.md (about) 1 # VPP Agent startup time analysis 2 3 This is a draft version - tool for time analysis of VPP Agent startup. 4 5 ## Getting Started 6 7 The tool assumes there is installled and running Kubernetes environment 8 Available scripts are to just there to do it. Bellow example explains 9 the meaning of output csv values. 10 11 ### Installing 12 13 Steps to make it run: 14 - Run ./startK8.sh to instal and init the Kubernetes environment 15 to test it run: 16 kubectl get nodes 17 you should get output similar to: 18 kubectl get nodes 19 NAME STATUS AGE VERSION 20 test-virtualbox Ready 1d v1.7.5 21 22 - Run ./download-images.sh to download the docker images with preinstalled VPP Agent 23 to test it run: 24 docker images 25 you should get the docker image named "ligato/vpp-agent" in the list of local repositiory 26 27 REPOSITORY TAG IMAGE ID CREATED SIZE 28 ligato/vpp-agent pantheon-dev 5ae90588c389 4 hours ago 444MB 29 30 31 32 ## Running the tests 33 34 Once the Kubernetes is running and VPP Agent image is availalbe locally, you can run the script to analyse the times on VPP Agent startup. 35 - Run ./timemeasure.sh 36 the script will start two pods, vnf-vpp and vswitch-vpp and loads the topology allowing network traffic. 37 Then it collects the logs from vswitch-vpp pod . 38 Then the pod is killed and after restart the logs are collected again. 39 During test executuion the log files are stored under directory log/. 40 The logs are proccessed and records the logged timestamps into the log/measuring_exp.csv file. This is in format that can be easily 41 imported into the Excel for better visual analylis and statistics. 42 - Alternativelly you can specify number of the kills and restarts by number - as a parameter for ./timemeasure.sh . 43 Default value without parameter specified is 1. 44 Example ./timemeasure.sh 30 - will kill the pod 30 times and collect the logs. 45 Please see that log/out.csv is just an intermediate result from which the times are calculated. 46 The log/measuring_exp.csv is considered the real outputs. 47 Finally all the log files and csv files are zipped into the logresult.zip for further investigation and easy archiving 48 49 50 ## Topology 51 52 There is possibility to load prepared topology into the etcd before the profiling is started 53 There are few scripts topology*.sh that do the loading and are called inside the timemeasure.sh script 54 It is easy to prepare your own topology according these scripts and modify function setup() in timemeasure.sh 55 to load your own topology. 56 57 ## Process to kill 58 59 The mechanism of restart the container is based on this approach: 60 supervisord is the main process, once it is killed, the container 61 is getting restarted. Tehre is a eventlistener in supervisord conf file 62 defined. Its task is to listen on the vpp and vpp-agent processed. 63 If eighter of them is killed supervisod is killed to and container is restarted. 64 To be able to choose which process to kill, modify the timemeasure.sh script 65 variable kill_proc. If kill_proc=0 - VPP process is to be killed 66 if kill_proc=1 VPP-Agent is to be killed 67 68 69 ## Vswitch. 70 There is also possibility to profile VPP acting as vswitch. For that purpose 71 a yaml file vswitch-vpp.yaml is available. To make it used in script find this two lines 72 and use comments to choose proper yaml file . 73 Example for using vswitch yaml file 74 #kubectl apply -f vpp.yaml 75 kubectl apply -f vswitch-vpp.yaml 76 You must provide correct MAC ADDR in yaml file by editing the dev value. 77 dpdk { 78 dev 0000:00:08.0 79 uio-driver igb_uio 80 } 81 Value 0000:00:08.0 is just an example here. 82 83 ### Output example 84 85 This is example of output that is in the measuring_exp.csv file. 86 87 #record,#run,step,timeline,relative time,relative to #record,duration(ms) 88 1,1,Measuring started,11:55:19.92454,00:00:00.0,0 89 2,1,Starting Agent,11:55:24.85929,00:00:4.93475,1 90 3,1,ETCD connected,11:55:24.86087,00:00:0.00158,2,0.5197990000 91 4,1,Kafka connected 11:55:24.85929,00:00:0.03132,2,2, 92 5,1,VPP connected,,-11:-55:-24.85929,2, 93 6,1,Resync done,11:55:26.90698,00:00:2.04769,2,9.9616360000 94 7,1,Agent ready,11:55:27.03256,00:00:2.17327,2 95 8,2,Container Killed,11:55:39.50288,19.57834,2 96 9,2,Starting Agent,11:55:43.61909,00:00:4.11621,8 97 10,2,ETCD connected,11:55:43.62086,00:00:0.00177,9,0.7439120000 98 11,2,Kafka connected,0,-11:-55:-43.61909,9,0 99 12,2,VPP connected,0,-11:-55:-43.61909,9,0 100 13,2,Resync done,11:55:44.39751,00:00:0.77842,9,3.1623150000 101 14,2,Agent ready,11:55:44.51965,00:00:0.90056,9 102 15,3,Container Killed,11:56:2.01177,22.50889,9 103 104 Explanation: 105 #record - the incrementing value 106 run - shows which run the item belongs to 107 In the example above items 1-7 belong to first run 108 step - description of the event 109 timeline - shows the timestamps 110 relative time - calculated value from two timestamps ( actual one and timestamp specified in next column 111 for example 6,1,Resync done,11:55:26.90698,00:00:2.04769,2,9.9616360000 on this line 112 value 00:00:2.04769 is calculated from actual timestamp (11:55:26.90698) minus timestamp on line 2 ( relative time #record is 2) 113 here it is 11:55:24.85929, result is 11:55:26.90698 - 11:55:24.85929 = 00:00:2.04769 114 relative to #record - specifies #record which is relative time calculated to 115 duration(ms) - some logs provide the internal measurement value 116 117 There are situations when the set of logs after the pod was killed is not available due to K8 issues 118 If it is a case, a record with ",Kill failed" is provided. 119