github.com/inspektor-gadget/inspektor-gadget@v0.28.1/docs/builtin-gadgets/profile/block-io.md (about)

     1  ---
     2  title: 'Using profile block-io'
     3  weight: 20
     4  description: >
     5    Analyze block I/O performance through a latency distribution.
     6  ---
     7  
     8  The profile block-io gadget gathers information about the usage of
     9  block device I/O (disk I/O) and generates a histogram of the I/O
    10  latency distribution when the gadget is stopped.
    11  
    12  Note that the disk I/O latency is measured from the moment the request is
    13  issued to the device until it completes; it does not include time spent
    14  in the kernel queue. The histogram therefore reflects only the
    15  performance of the device, not the effective latency experienced by
    16  applications.
    17  
    18  The histogram shows the number of I/O operations (`count` column) that fall in
    19  the latency range `interval-start` -> `interval-end` (`usecs` column), which,
    20  as the column name indicates, is given in microseconds.
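
The interval boundaries in the sample outputs in this guide look like power-of-two ranges (`0 -> 1`, `2 -> 3`, `4 -> 7`, and so on). Assuming that layout holds, the following sketch shows which row a given latency would be counted in. `bucket_for` is a hypothetical helper written for illustration; it is not part of the gadget.

```bash
# Hypothetical helper: print the histogram interval that would contain a
# latency given in microseconds, assuming power-of-two interval boundaries.
# The body runs in a subshell so its variables do not leak to the caller.
bucket_for() (
  usecs=$1
  start=0
  end=1
  # Move to the next interval until the latency fits: 0-1, 2-3, 4-7, 8-15, ...
  while [ "$usecs" -gt "$end" ]; do
    start=$(( end + 1 ))
    end=$(( end * 2 + 1 ))
  done
  echo "$start -> $end"
)

bucket_for 300      # prints "256 -> 511"
bucket_for 1500000  # prints "1048576 -> 2097151"
```

An operation that takes 300 us would therefore be counted in the `256 -> 511` row.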
    21  
    22  For this guide, we will use [the `stress` tool](https://man.archlinux.org/man/stress.1), which allows
    23  us to load and stress the system in many different ways. In particular, we will use the `--io` flag,
    24  which spawns a given number of workers that spin on the [sync()
    25  syscall](https://man7.org/linux/man-pages/man2/sync.2.html). This generates disk I/O
    26  that we will analyse using the profile block-io gadget.
    27  
    28  For further details, please refer to [the BCC
    29  documentation](https://github.com/iovisor/bcc/blob/master/tools/biolatency_example.txt).
    30  
    31  ### On Kubernetes
    32  
    33  First, let's use the profile block-io gadget to see the I/O latency on our
    34  test node under its normal workload:
    35  
    36  ```bash
    37  # Run the gadget on the worker-node node
    38  $ kubectl gadget profile block-io --node worker-node
    39  Tracing block device I/O... Hit Ctrl-C to end
    40  
    41  # Wait for around 1 minute and hit Ctrl+C to stop the gadget and see the results
    42  ^C
    43  
    44       usecs               : count     distribution
    45           0 -> 1          : 0        |                                        |
    46           2 -> 3          : 0        |                                        |
    47           4 -> 7          : 0        |                                        |
    48           8 -> 15         : 0        |                                        |
    49          16 -> 31         : 0        |                                        |
    50          32 -> 63         : 17       |*                                       |
    51          64 -> 127        : 261      |*******************                     |
    52         128 -> 255        : 546      |****************************************|
    53         256 -> 511        : 426      |*******************************         |
    54         512 -> 1023       : 227      |****************                        |
    55        1024 -> 2047       : 18       |*                                       |
    56        2048 -> 4095       : 8        |                                        |
    57        4096 -> 8191       : 23       |*                                       |
    58        8192 -> 16383      : 15       |*                                       |
    59       16384 -> 32767      : 2        |                                        |
    60       32768 -> 65535      : 1        |                                        |
    61  ```
    62  
    63  This output shows that the bulk of the I/O took between 64 and 1023 us, and
    64  that there were 1544 I/O operations while the gadget was running.
    65  Note that we waited for only 1 minute; a longer run would produce more
    66  stable results.
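
The total of 1544 is simply the sum of the `count` column. If the histogram text is available on standard input, a small `awk` filter can do the addition. This is a sketch that assumes the row layout shown above, and `sum_counts` is a hypothetical helper name.

```bash
# Hypothetical helper: sum the `count` column of a histogram read from stdin.
# Assumes the row layout shown above: "start -> end : count |bars|".
sum_counts() {
  # Split on ':' and '|' so that field 2 holds the count; the header row
  # is skipped because it contains no "->".
  awk -F'[:|]' '/->/ { total += $2 } END { print total + 0 }'
}

# Example with three rows copied from the histogram above.
sum_counts <<'EOF'
        32 -> 63         : 17       |*                                       |
        64 -> 127        : 261      |*******************                     |
       128 -> 255        : 546      |****************************************|
EOF
# prints 824
```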
    67  
    68  Now, let's increase the I/O operations using the stress tool:
    69  
    70  ```bash
    71  # Start by creating our testing namespace
    72  $ kubectl create ns test-biolatency
    73  
    74  # Run stress with 1 worker that will generate I/O operations
    75  $ kubectl run --restart=Never --image=polinux/stress stress-io -n test-biolatency -- stress --io 1
    76  $ kubectl wait --timeout=-1s -n test-biolatency --for=condition=ready pod/stress-io
    77  pod/stress-io condition met
    78  $ kubectl get pod -n test-biolatency -o wide
    79  NAME        READY   STATUS    RESTARTS   AGE   IP           NODE          NOMINATED NODE   READINESS GATES
    80  stress-io   1/1     Running   0          2s    10.244.1.7   worker-node   <none>           <none>
    81  ```
    82  
    83  Using the profile block-io gadget, we can generate another histogram to analyse the
    84  disk I/O with this load:
    85  
    86  ```bash
    87  # Run the gadget again
    88  $ kubectl gadget profile block-io --node worker-node
    89  Tracing block device I/O... Hit Ctrl-C to end
    90  
    91  # Wait again for 1 minute and hit Ctrl+C to stop the gadget and see the results
    92  ^C
    93  
    94       usecs               : count     distribution
    95           0 -> 1          : 0        |                                        |
    96           2 -> 3          : 0        |                                        |
    97           4 -> 7          : 0        |                                        |
    98           8 -> 15         : 0        |                                        |
    99          16 -> 31         : 411      |                                        |
   100          32 -> 63         : 310822   |****************************************|
   101          64 -> 127        : 293404   |*************************************   |
   102         128 -> 255        : 194881   |*************************               |
   103         256 -> 511        : 96520    |************                            |
   104         512 -> 1023       : 33756    |****                                    |
   105        1024 -> 2047       : 4414     |                                        |
   106        2048 -> 4095       : 1007     |                                        |
   107        4096 -> 8191       : 1025     |                                        |
   108        8192 -> 16383      : 176      |                                        |
   109       16384 -> 32767      : 13       |                                        |
   110       32768 -> 65535      : 7        |                                        |
   111       65536 -> 131071     : 0        |                                        |
   112      131072 -> 262143     : 0        |                                        |
   113      262144 -> 524287     : 1        |                                        |
   114      524288 -> 1048575    : 0        |                                        |
   115     1048576 -> 2097151    : 1        |                                        |
   116  
   117  # Remove load
   118  $ kubectl delete pod/stress-io -n test-biolatency
   119  ```
   120  
   121  The new histogram shows that the number of I/O operations increased
   122  significantly, going from 1544 (normal load) to 936438 (stressing the I/O).
   123  On the other hand, even though this histogram shows that the bulk of the I/O
   124  still completed in under 1024 us, we can observe that several I/O
   125  operations suffered high latency due to the load, and one of them
   126  took more than 1 second.
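
To put a number on that slow tail, one option is to compute the share of operations that fall in intervals starting at or above a chosen threshold. The sketch below assumes the same text layout as the histograms above; `slow_share` is a hypothetical helper, not a feature of the gadget.

```bash
# Hypothetical helper: print the percentage of operations counted in
# intervals whose start is at or above the given threshold (microseconds).
# Reads histogram rows of the form "start -> end : count |bars|" from stdin.
slow_share() {
  awk -F'[:|]' -v min="$1" '
    /->/ {
      split($1, range, "->")            # range[1] is the interval start
      total += $2
      if (range[1] + 0 >= min) slow += $2
    }
    END { if (total > 0) printf "%.2f%%\n", 100 * slow / total }'
}

# Example with three rows copied from the histogram above.
slow_share 1024 <<'EOF'
       512 -> 1023       : 33756    |****                                    |
      1024 -> 2047       : 4414     |                                        |
      2048 -> 4095       : 1007     |                                        |
EOF
# prints 13.84%
```

Feeding the complete histogram to the helper gives the share over all operations rather than over this three-row excerpt.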
   127  
   128  Delete the demo test namespace:
   129  ```bash
   130  $ kubectl delete ns test-biolatency
   131  namespace "test-biolatency" deleted
   132  ```
   133  
   134  ### With `ig`
   135  
   136  * Generate some I/O load:
   137  
   138  ```bash
   139  $ docker run -d --rm --name stresstest polinux/stress stress --io 10
   140  ```
   141  
   142  * Start `ig`:
   143  
   144  ```bash
   145  $ sudo ./ig profile block-io
   146  ```
   147  
   148  * Observe the results:
   149  
   150  ```bash
   151  $ sudo ./ig profile block-io
   152  Tracing block device I/O... Hit Ctrl-C to end.^C
   153       usecs               : count    distribution
   154           0 -> 1          : 0        |                                        |
   155           2 -> 3          : 0        |                                        |
   156           4 -> 7          : 0        |                                        |
   157           8 -> 15         : 0        |                                        |
   158          16 -> 31         : 0        |                                        |
   159          32 -> 63         : 113      |                                        |
   160          64 -> 127        : 7169     |****************************************|
   161         128 -> 255        : 3724     |********************                    |
   162         256 -> 511        : 2198     |************                            |
   163         512 -> 1023       : 712      |***                                     |
   164        1024 -> 2047       : 203      |*                                       |
   165        2048 -> 4095       : 23       |                                        |
   166        4096 -> 8191       : 7        |                                        |
   167        8192 -> 16383      : 3        |                                        |
   168  ```
   169  
   170  * Remove the docker container:
   171  
   172  ```bash
   173  $ docker stop stresstest
   174  ```