go.ligato.io/vpp-agent/v3@v3.5.0/k8s/tests/README.md

go.ligato.io/vpp-agent/v3@v3.5.0/k8s/tests/README.md (about)

     1  # VPP Agent startup time analysis 
     2  
     3  This is a draft version - tool for time analysis of VPP Agent startup.
     4  
     5  ## Getting Started
     6  
     7  The tool assumes there is installled and running Kubernetes environment
     8  Available scripts are to just there to do it. Bellow example explains
     9  the meaning of output csv values.
    10  
    11  ### Installing
    12  
    13  Steps to make it run:
    14  - Run ./startK8.sh  to instal and init the Kubernetes environment
    15    to test it run: 
    16    kubectl get nodes
    17    you should get  output similar to:
    18    kubectl get nodes
    19    NAME                  STATUS    AGE       VERSION
    20    test-virtualbox       Ready     1d        v1.7.5
    21  
    22  - Run ./download-images.sh  to download the docker images with preinstalled VPP Agent
    23    to test it run:
    24    docker images
    25    you should get the docker image named "ligato/vpp-agent" in the list of local repositiory
    26    
    27    REPOSITORY                                               TAG                 IMAGE ID            CREATED             SIZE
    28    ligato/vpp-agent                                         pantheon-dev        5ae90588c389        4 hours ago         444MB
    29  
    30    
    31  
    32  ## Running the tests
    33  
    34  Once the Kubernetes is running and VPP Agent image is availalbe locally, you can run the script to analyse the times on VPP Agent startup.
    35  - Run ./timemeasure.sh
    36    the script will start two pods, vnf-vpp and vswitch-vpp and loads the topology allowing network traffic.
    37    Then it collects the logs from vswitch-vpp pod .
    38    Then the pod is killed and after restart the logs are collected again.
    39    During test executuion the log files are stored under directory log/.
    40    The logs are proccessed and records the logged timestamps into the log/measuring_exp.csv file. This is in format that can be easily
    41    imported into the Excel for better visual analylis and statistics.
    42  - Alternativelly you can specify number of the kills and restarts by number - as a parameter for ./timemeasure.sh . 
    43    Default value without parameter specified is 1.
    44    Example ./timemeasure.sh 30  - will kill the pod 30 times and collect the logs.
    45    Please see that  log/out.csv is just an intermediate result from which the times are calculated.
    46    The log/measuring_exp.csv is considered the real outputs. 
    47    Finally all the log files and csv files are zipped into the logresult.zip for further investigation and easy archiving
    48    
    49  
    50  ## Topology
    51  
    52  There is possibility to load prepared topology into the etcd before the profiling is started
    53  There are few scripts topology*.sh that do the loading and are called inside the timemeasure.sh script
    54  It is easy to prepare your own topology according these scripts and modify function  setup() in timemeasure.sh
    55  to load your own topology.
    56  
    57  ## Process to kill
    58  
    59  The mechanism of restart the container is based on this approach:
    60  supervisord is the main process, once it is killed, the container 
    61  is getting restarted. Tehre is a eventlistener in supervisord conf file
    62  defined. Its task is to listen on the vpp and vpp-agent processed.
    63  If eighter of them is killed supervisod is killed to and container is restarted.
    64  To be able to choose which process to kill, modify the timemeasure.sh script
    65  variable kill_proc. If kill_proc=0 - VPP process is to be killed
    66  if kill_proc=1  VPP-Agent is to be killed
    67  
    68  
    69  ## Vswitch.
    70  There is also possibility to profile VPP acting as vswitch. For that purpose
    71  a yaml file vswitch-vpp.yaml is available. To make it used in script find this two lines 
    72  and use comments to choose proper yaml file .
    73  Example for using vswitch yaml file
    74  #kubectl apply -f vpp.yaml
    75  kubectl apply -f vswitch-vpp.yaml
    76  You must provide correct MAC ADDR in yaml file by editing the dev value.
    77      dpdk {
    78        dev 0000:00:08.0
    79        uio-driver igb_uio
    80      }
    81  Value 0000:00:08.0 is just an example here.
    82  
    83  ### Output example
    84  
    85  This is example of output that is in the measuring_exp.csv file.
    86  
    87  #record,#run,step,timeline,relative time,relative to #record,duration(ms)
    88  1,1,Measuring started,11:55:19.92454,00:00:00.0,0
    89  2,1,Starting Agent,11:55:24.85929,00:00:4.93475,1
    90  3,1,ETCD connected,11:55:24.86087,00:00:0.00158,2,0.5197990000
    91  4,1,Kafka connected 11:55:24.85929,00:00:0.03132,2,2,
    92  5,1,VPP connected,,-11:-55:-24.85929,2,
    93  6,1,Resync done,11:55:26.90698,00:00:2.04769,2,9.9616360000
    94  7,1,Agent ready,11:55:27.03256,00:00:2.17327,2
    95  8,2,Container Killed,11:55:39.50288,19.57834,2
    96  9,2,Starting Agent,11:55:43.61909,00:00:4.11621,8
    97  10,2,ETCD connected,11:55:43.62086,00:00:0.00177,9,0.7439120000
    98  11,2,Kafka connected,0,-11:-55:-43.61909,9,0
    99  12,2,VPP connected,0,-11:-55:-43.61909,9,0
   100  13,2,Resync done,11:55:44.39751,00:00:0.77842,9,3.1623150000
   101  14,2,Agent ready,11:55:44.51965,00:00:0.90056,9
   102  15,3,Container Killed,11:56:2.01177,22.50889,9
   103  
   104  Explanation:
   105  #record             - the incrementing value 
   106  run                 -  shows which run the item belongs to
   107                      In the example above items 1-7 belong to first run                   
   108  step                -  description of the event 
   109  timeline            - shows the timestamps
   110  relative time       - calculated value from two timestamps ( actual one and timestamp specified in next column
   111                      for example 6,1,Resync done,11:55:26.90698,00:00:2.04769,2,9.9616360000 on this line
   112                      value 00:00:2.04769 is calculated from actual timestamp (11:55:26.90698) minus timestamp on line 2 ( relative time #record is 2)
   113                      here it is 11:55:24.85929, result  is  11:55:26.90698 - 11:55:24.85929 =  00:00:2.04769
   114  relative to #record - specifies #record which is relative time calculated to
   115  duration(ms)        - some logs provide the internal measurement value
   116  
   117  There are situations when the set of logs after the pod was killed is not available due to K8 issues
   118  If it is a case, a record with ",Kill failed" is provided.
   119