
.. only:: not (epub or latex or html)

    WARNING: You are looking at unreleased Cilium documentation.
    Please use the official rendered version released here:
    https://docs.cilium.io

.. _bpf_debug:

Debugging and Testing
=====================

bpftool
-------

bpftool is the main introspection and debugging tool around BPF. It is
developed and shipped along with the Linux kernel tree under
``tools/bpf/bpftool/``.

The tool can dump all BPF programs and maps that are currently loaded in
the system, or list and correlate all BPF maps used by a specific program.
Furthermore, it allows dumping an entire map's key / value pairs, as well as
looking up, updating or deleting individual entries and retrieving a key's
neighbor key in the map. Such operations can be performed based on BPF
program or map IDs, or by specifying the location of a program or map pinned
in the BPF file system. The tool also offers an option to pin maps or
programs into the BPF file system.

For a quick overview of all BPF programs currently loaded on the host,
invoke the following command:

.. code-block:: shell-session

     # bpftool prog
     398: sched_cls  tag 56207908be8ad877
        loaded_at Apr 09/16:24  uid 0
        xlated 8800B  jited 6184B  memlock 12288B  map_ids 18,5,17,14
     399: sched_cls  tag abc95fb4835a6ec9
        loaded_at Apr 09/16:24  uid 0
        xlated 344B  jited 223B  memlock 4096B  map_ids 18
     400: sched_cls  tag afd2e542b30ff3ec
        loaded_at Apr 09/16:24  uid 0
        xlated 1720B  jited 1001B  memlock 4096B  map_ids 17
     401: sched_cls  tag 2dbbd74ee5d51cc8
        loaded_at Apr 09/16:24  uid 0
        xlated 3728B  jited 2099B  memlock 4096B  map_ids 17
     [...]

Similarly, to get an overview of all active maps:

.. code-block:: shell-session

    # bpftool map
    5: hash  flags 0x0
        key 20B  value 112B  max_entries 65535  memlock 13111296B
    6: hash  flags 0x0
        key 20B  value 20B  max_entries 65536  memlock 7344128B
    7: hash  flags 0x0
        key 10B  value 16B  max_entries 8192  memlock 790528B
    8: hash  flags 0x0
        key 22B  value 28B  max_entries 8192  memlock 987136B
    9: hash  flags 0x0
        key 20B  value 8B  max_entries 512000  memlock 49352704B
    [...]

Note that for each command, bpftool also supports JSON-based output by
appending ``--json`` to the command line. An additional ``--pretty`` flag
makes the output more human readable.

.. code-block:: shell-session

     # bpftool prog --json --pretty

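The JSON mode makes it easy to post-process bpftool's output from scripts.
A minimal Python sketch, run against a hand-written sample modeled on the
listing above (not captured output; the exact set of fields can vary
between bpftool versions):

.. code-block:: python

    import json

    # Hand-written sample resembling `bpftool prog --json` output.
    sample = """
    [{"id": 398, "type": "sched_cls", "tag": "56207908be8ad877",
      "map_ids": [18, 5, 17, 14]},
     {"id": 399, "type": "sched_cls", "tag": "abc95fb4835a6ec9",
      "map_ids": [18]}]
    """

    progs = json.loads(sample)
    for prog in progs:
        print(f"prog {prog['id']}: type={prog['type']} maps={prog['map_ids']}")
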
For dumping the post-verifier BPF instruction image of a specific BPF
program, one starting point could be to inspect a specific program, e.g.
one attached to the tc egress hook:

.. code-block:: shell-session

     # tc filter show dev cilium_host egress
     filter protocol all pref 1 bpf chain 0
     filter protocol all pref 1 bpf chain 0 handle 0x1 bpf_host.o:[from-netdev] \
                         direct-action not_in_hw id 406 tag e0362f5bd9163a0a jited

The program from the object file ``bpf_host.o``, section ``from-netdev``, has
a BPF program ID of ``406`` as denoted in ``id 406``. Based on this information,
bpftool can provide some high-level metadata specific to the program:

.. code-block:: shell-session

     # bpftool prog show id 406
     406: sched_cls  tag e0362f5bd9163a0a
          loaded_at Apr 09/16:24  uid 0
          xlated 11144B  jited 7721B  memlock 12288B  map_ids 18,20,8,5,6,14

The program of ID 406 is of type ``sched_cls`` (``BPF_PROG_TYPE_SCHED_CLS``),
has a ``tag`` of ``e0362f5bd9163a0a`` (SHA sum over the instruction sequence),
and was loaded by root (``uid 0``) on ``Apr 09/16:24``. The BPF instruction
sequence is ``11,144 bytes`` long and the JITed image ``7,721 bytes``. The
program itself (excluding maps) consumes ``12,288 bytes``, which are accounted /
charged against user ``uid 0``. The BPF program uses the BPF maps with
IDs ``18``, ``20``, ``8``, ``5``, ``6`` and ``14``. These IDs can further
be used to retrieve information about, or to dump, the maps themselves.

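The ``tag`` can be understood as a content hash: the kernel derives it from
a SHA-1 digest over the program's instruction image, truncated to 8 bytes.
A minimal Python sketch of the idea; note that the real kernel additionally
blinds map references in the instructions before hashing, which is omitted
here:

.. code-block:: python

    import hashlib

    def prog_tag(insns: bytes) -> str:
        # Truncated SHA-1: 8 bytes, i.e. 16 hex characters, like the tags above.
        return hashlib.sha1(insns).hexdigest()[:16]

    # Instruction streams differing in a single byte yield distinct tags.
    tag_a = prog_tag(bytes(16))
    tag_b = prog_tag(bytes(15) + b"\x01")
    print(tag_a, tag_b)
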
Additionally, bpftool can issue a dump request of the BPF instructions the
program runs:

.. code-block:: shell-session

     # bpftool prog dump xlated id 406
      0: (b7) r7 = 0
      1: (63) *(u32 *)(r1 +60) = r7
      2: (63) *(u32 *)(r1 +56) = r7
      3: (63) *(u32 *)(r1 +52) = r7
     [...]
     47: (bf) r4 = r10
     48: (07) r4 += -40
     49: (79) r6 = *(u64 *)(r10 -104)
     50: (bf) r1 = r6
     51: (18) r2 = map[id:18]                    <-- BPF map id 18
     53: (b7) r5 = 32
     54: (85) call bpf_skb_event_output#5656112  <-- BPF helper call
     55: (69) r1 = *(u16 *)(r6 +192)
     [...]

As shown above, bpftool correlates BPF map IDs in the instruction stream,
as well as calls to BPF helpers or other BPF programs.

The instruction dump reuses the same 'pretty-printer' as the kernel's BPF
verifier. Since the program was JITed, the actual JIT image generated from
the above ``xlated`` instructions is what gets executed, and it can be
dumped through bpftool as well:

.. code-block:: shell-session

     # bpftool prog dump jited id 406
      0:        push   %rbp
      1:        mov    %rsp,%rbp
      4:        sub    $0x228,%rsp
      b:        sub    $0x28,%rbp
      f:        mov    %rbx,0x0(%rbp)
     13:        mov    %r13,0x8(%rbp)
     17:        mov    %r14,0x10(%rbp)
     1b:        mov    %r15,0x18(%rbp)
     1f:        xor    %eax,%eax
     21:        mov    %rax,0x20(%rbp)
     25:        mov    0x80(%rdi),%r9d
     [...]

Mainly for BPF JIT developers, the option also exists to interleave the
disassembly with the actual native opcodes:

.. code-block:: shell-session

     # bpftool prog dump jited id 406 opcodes
      0:        push   %rbp
                55
      1:        mov    %rsp,%rbp
                48 89 e5
      4:        sub    $0x228,%rsp
                48 81 ec 28 02 00 00
      b:        sub    $0x28,%rbp
                48 83 ed 28
      f:        mov    %rbx,0x0(%rbp)
                48 89 5d 00
     13:        mov    %r13,0x8(%rbp)
                4c 89 6d 08
     17:        mov    %r14,0x10(%rbp)
                4c 89 75 10
     1b:        mov    %r15,0x18(%rbp)
                4c 89 7d 18
     [...]

The same interleaving can be done for the normal BPF instructions, which
can sometimes be useful for debugging in the kernel:

.. code-block:: shell-session

     # bpftool prog dump xlated id 406 opcodes
      0: (b7) r7 = 0
         b7 07 00 00 00 00 00 00
      1: (63) *(u32 *)(r1 +60) = r7
         63 71 3c 00 00 00 00 00
      2: (63) *(u32 *)(r1 +56) = r7
         63 71 38 00 00 00 00 00
      3: (63) *(u32 *)(r1 +52) = r7
         63 71 34 00 00 00 00 00
      4: (63) *(u32 *)(r1 +48) = r7
         63 71 30 00 00 00 00 00
      5: (63) *(u32 *)(r1 +64) = r7
         63 71 40 00 00 00 00 00
     [...]

The basic blocks of a program can also be visualized with the help of
``graphviz``. For this purpose, bpftool has a ``visual`` dump mode that
generates a dot file instead of the plain BPF ``xlated`` instruction
dump, which can later be converted into a png file:

.. code-block:: shell-session

     # bpftool prog dump xlated id 406 visual &> output.dot
     $ dot -Tpng output.dot -o output.png

Another option would be to pass the dot file to dotty as a viewer, that
is ``dotty output.dot``, where the result for the ``bpf_host.o`` program
looks as follows (small extract):

.. image:: /images/bpf_dot.png
    :align: center

Note that the ``xlated`` instruction dump provides the post-verifier BPF
instruction image, which means that it dumps the instructions as if they
were to be run through the BPF interpreter. In the kernel, the verifier
performs various rewrites of the original instructions provided by the
BPF loader.

One example of such rewrites is the inlining of helper functions in order
to improve runtime performance, here in the case of a map lookup for hash
tables:

.. code-block:: shell-session

     # bpftool prog dump xlated id 3
      0: (b7) r1 = 2
      1: (63) *(u32 *)(r10 -4) = r1
      2: (bf) r2 = r10
      3: (07) r2 += -4
      4: (18) r1 = map[id:2]                      <-- BPF map id 2
      6: (85) call __htab_map_lookup_elem#77408   <-+ BPF helper inlined rewrite
      7: (15) if r0 == 0x0 goto pc+2                |
      8: (07) r0 += 56                              |
      9: (79) r0 = *(u64 *)(r0 +0)                <-+
     10: (15) if r0 == 0x0 goto pc+24
     11: (bf) r2 = r10
     12: (07) r2 += -4
     [...]

bpftool correlates calls to helper functions or BPF to BPF calls through
kallsyms. Therefore, make sure that JITed BPF programs are exposed to
kallsyms (``bpf_jit_kallsyms``) and that kallsyms addresses are not
obfuscated (calls are otherwise shown as ``call bpf_unspec#0``):

.. code-block:: shell-session

     # echo 0 > /proc/sys/kernel/kptr_restrict
     # echo 1 > /proc/sys/net/core/bpf_jit_kallsyms

BPF to BPF calls are correlated as well, for both the interpreter and the
JIT case. In the latter, the tag of the subprogram is shown as the call
target. In each case, ``pc+2`` is the pc-relative offset of the call
target, which denotes the subprogram.

.. code-block:: shell-session

     # bpftool prog dump xlated id 1
     0: (85) call pc+2#__bpf_prog_run_args32
     1: (b7) r0 = 1
     2: (95) exit
     3: (b7) r0 = 2
     4: (95) exit

JITed variant of the dump:

.. code-block:: shell-session

     # bpftool prog dump xlated id 1
     0: (85) call pc+2#bpf_prog_3b185187f1855c4c_F
     1: (b7) r0 = 1
     2: (95) exit
     3: (b7) r0 = 2
     4: (95) exit

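Resolving such a pc-relative target by hand is straightforward: BPF call
offsets count from the instruction following the call. A small Python
sketch consistent with the dumps above, where the ``pc+2`` call at
instruction 0 lands on instruction 3:

.. code-block:: python

    def resolve_call_target(insn_idx: int, pc_offset: int) -> int:
        # Offsets are relative to the next instruction, so the
        # absolute target index is current index + offset + 1.
        return insn_idx + pc_offset + 1

    print(resolve_call_target(0, 2))  # 3, the subprogram entry above
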
In the case of tail calls, the kernel maps them into a single instruction
internally; bpftool will still correlate them as a helper call for ease
of debugging:

.. code-block:: shell-session

     # bpftool prog dump xlated id 2
     [...]
     10: (b7) r2 = 8
     11: (85) call bpf_trace_printk#-41312
     12: (bf) r1 = r6
     13: (18) r2 = map[id:1]
     15: (b7) r3 = 0
     16: (85) call bpf_tail_call#12
     17: (b7) r1 = 42
     18: (6b) *(u16 *)(r6 +46) = r1
     19: (b7) r0 = 0
     20: (95) exit

     # bpftool map show id 1
     1: prog_array  flags 0x0
           key 4B  value 4B  max_entries 1  memlock 4096B

Dumping an entire map is possible through the ``map dump`` subcommand,
which iterates through all present map elements and dumps the key /
value pairs.

If no BTF (BPF Type Format) data is available for a given map, then
the key / value pairs are dumped as hex:

.. code-block:: shell-session

     # bpftool map dump id 5
     key:
     f0 0d 00 00 00 00 00 00  0a 66 00 00 00 00 8a d6
     02 00 00 00
     value:
     00 00 00 00 00 00 00 00  01 00 00 00 00 00 00 00
     00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
     00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
     00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
     00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
     00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
     00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
     key:
     0a 66 1c ee 00 00 00 00  00 00 00 00 00 00 00 00
     01 00 00 00
     value:
     00 00 00 00 00 00 00 00  01 00 00 00 00 00 00 00
     00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
     [...]
     Found 6 elements

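Under the hood, such a dump walks the map key by key using the kernel's
'get next key' operation (the ``BPF_MAP_GET_NEXT_KEY`` command of the
bpf(2) syscall). A Python sketch of that iteration pattern, with a plain
dict standing in for the kernel map:

.. code-block:: python

    def get_next_key(bpf_map, key):
        # None or a non-existing key yields the first key; the last key
        # yields None, terminating the walk (mirrors BPF_MAP_GET_NEXT_KEY).
        keys = list(bpf_map)
        if key is None or key not in keys:
            return keys[0] if keys else None
        idx = keys.index(key)
        return keys[idx + 1] if idx + 1 < len(keys) else None

    def dump_map(bpf_map):
        entries, key = [], None
        while (key := get_next_key(bpf_map, key)) is not None:
            entries.append((key, bpf_map[key]))
        return entries

    demo = {b"\x01": b"\xaa", b"\x02": b"\xbb", b"\x03": b"\xcc"}
    print(f"Found {len(dump_map(demo))} elements")
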
However, with BTF, the map also holds debugging information about
the key and value structures. For example, BTF in combination with
BPF maps and the ``BPF_ANNOTATE_KV_PAIR()`` macro from iproute2 will
result in the following dump (``test_xdp_noinline.o`` from kernel
selftests):

.. code-block:: shell-session

     # cat tools/testing/selftests/bpf/test_xdp_noinline.c
       [...]
        struct ctl_value {
              union {
                      __u64 value;
                      __u32 ifindex;
                      __u8 mac[6];
              };
        };

        struct bpf_map_def __attribute__ ((section("maps"), used)) ctl_array = {
               .type		= BPF_MAP_TYPE_ARRAY,
               .key_size	= sizeof(__u32),
               .value_size	= sizeof(struct ctl_value),
               .max_entries	= 16,
               .map_flags	= 0,
        };
        BPF_ANNOTATE_KV_PAIR(ctl_array, __u32, struct ctl_value);

        [...]

The ``BPF_ANNOTATE_KV_PAIR()`` macro forces a map-specific ELF section
containing an empty key and value. This enables the iproute2 BPF loader
to correlate BTF data with that section and thus allows it to choose the
corresponding types out of the BTF for loading the map.

Compiling through LLVM and generating BTF from the debugging information
via ``pahole``:

.. code-block:: shell-session

     # clang [...] -O2 --target=bpf -g -emit-llvm -c test_xdp_noinline.c -o - |
       llc -march=bpf -mcpu=probe -mattr=dwarfris -filetype=obj -o test_xdp_noinline.o
     # pahole -J test_xdp_noinline.o

Now loading the object file into the kernel and dumping the map via bpftool:

.. code-block:: shell-session

     # ip -force link set dev lo xdp obj test_xdp_noinline.o sec xdp-test
     # ip a
     1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 xdpgeneric/id:227 qdisc noqueue state UNKNOWN group default qlen 1000
         link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
         inet 127.0.0.1/8 scope host lo
            valid_lft forever preferred_lft forever
         inet6 ::1/128 scope host
            valid_lft forever preferred_lft forever
     [...]
     # bpftool prog show id 227
     227: xdp  tag a85e060c275c5616  gpl
         loaded_at 2018-07-17T14:41:29+0000  uid 0
         xlated 8152B  not jited  memlock 12288B  map_ids 381,385,386,382,384,383
     # bpftool map dump id 386
      [{
           "key": 0,
           "value": {
               "": {
                   "value": 0,
                   "ifindex": 0,
                   "mac": []
               }
           }
       },{
           "key": 1,
           "value": {
               "": {
                   "value": 0,
                   "ifindex": 0,
                   "mac": []
               }
           }
       },{
     [...]

Lookup, update, delete, and 'get next key' operations on the map for specific
keys can be performed through bpftool as well.

If the BPF program has been successfully loaded with BTF debugging information,
the BTF ID will be shown in the ``prog show`` output, denoted as ``btf_id``.

.. code-block:: shell-session

     # bpftool prog show id 72
     72: xdp  name balancer_ingres  tag acf44cabb48385ed  gpl
        loaded_at 2020-04-13T23:12:08+0900  uid 0
        xlated 19104B  jited 10732B  memlock 20480B  map_ids 126,130,131,127,129,128
        btf_id 60

This can also be confirmed with the ``btf show`` command, which dumps all
BTF objects loaded on a system.

.. code-block:: shell-session

     # bpftool btf show
     60: size 12243B  prog_ids 72  map_ids 126,130,131,127,129,128

The subcommand ``btf dump`` can be used to check which debugging information
is included in the BTF. The dump can be formatted either as 'raw' data or
as 'c', the format used in C code.

.. code-block:: shell-session

     # bpftool btf dump id 60 format c
       [...]
        struct ctl_value {
              union {
                      __u64 value;
                      __u32 ifindex;
                      __u8 mac[6];
              };
        };

        typedef unsigned int u32;
        [...]

.. admonition:: Video
  :class: attention

  To learn more about bpftool, check out `eCHO episode 11: Exploring
  bpftool <https://www.youtube.com/watch?v=1EOLh3zzWP4&t=650s>`__ with Quentin Monnet, maintainer of bpftool.

Kernel Testing
--------------

The Linux kernel ships a BPF selftest suite, which can be found in the kernel
source tree under ``tools/testing/selftests/bpf/``.

.. code-block:: shell-session

    $ cd tools/testing/selftests/bpf/
    $ make
    # make run_tests

The test suite contains test cases against the BPF verifier, program tags,
and various tests against the BPF map interface and map types. It contains
various runtime tests from C code for checking the LLVM back end, as well
as eBPF and cBPF assembly code that is run in the kernel to test the
interpreter and JITs.

JIT Debugging
-------------

For JIT developers performing audits or writing extensions, each compile run
can output the generated JIT image into the kernel log through:

.. code-block:: shell-session

    # echo 2 > /proc/sys/net/core/bpf_jit_enable

Whenever a new BPF program is loaded, the JIT compiler will dump the output,
which can then be inspected with ``dmesg``, for example:

::

    [ 3389.935842] flen=6 proglen=70 pass=3 image=ffffffffa0069c8f from=tcpdump pid=20583
    [ 3389.935847] JIT code: 00000000: 55 48 89 e5 48 83 ec 60 48 89 5d f8 44 8b 4f 68
    [ 3389.935849] JIT code: 00000010: 44 2b 4f 6c 4c 8b 87 d8 00 00 00 be 0c 00 00 00
    [ 3389.935850] JIT code: 00000020: e8 1d 94 ff e0 3d 00 08 00 00 75 16 be 17 00 00
    [ 3389.935851] JIT code: 00000030: 00 e8 28 94 ff e0 83 f8 01 75 07 b8 ff ff 00 00
    [ 3389.935852] JIT code: 00000040: eb 02 31 c0 c9 c3

``flen`` is the length of the BPF program (here, 6 BPF instructions), and
``proglen`` is the number of bytes generated by the JIT for the opcode image
(here, 70 bytes). ``pass`` means that the image was generated in 3 compiler
passes; for example, ``x86_64`` can have various optimization passes to
further reduce the image size when possible. ``image`` contains the address
of the generated JIT image, while ``from`` and ``pid`` show the user space
application name and PID, respectively, which triggered the compilation
process. The dump output for eBPF and cBPF JITs uses the same format.

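The header line of such a dump can also be parsed programmatically, e.g.
when collecting JIT statistics from the kernel log. A Python sketch whose
regex is written for exactly the format shown above:

.. code-block:: python

    import re

    # The sample header line from the dmesg output above.
    line = ("[ 3389.935842] flen=6 proglen=70 pass=3 "
            "image=ffffffffa0069c8f from=tcpdump pid=20583")

    m = re.search(r"flen=(\d+) proglen=(\d+) pass=(\d+) "
                  r"image=([0-9a-f]+) from=(\S+) pid=(\d+)", line)
    flen, proglen, passes = int(m.group(1)), int(m.group(2)), int(m.group(3))
    image, source, pid = m.group(4), m.group(5), int(m.group(6))
    print(f"{flen} insns -> {proglen} bytes in {passes} passes (from {source})")
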
In the kernel tree under ``tools/bpf/``, there is a tool called ``bpf_jit_disasm``. It
reads out the latest dump and prints the disassembly for further inspection:

.. code-block:: shell-session

    # ./bpf_jit_disasm
    70 bytes emitted from JIT compiler (pass:3, flen:6)
    ffffffffa0069c8f + <x>:
       0:       push   %rbp
       1:       mov    %rsp,%rbp
       4:       sub    $0x60,%rsp
       8:       mov    %rbx,-0x8(%rbp)
       c:       mov    0x68(%rdi),%r9d
      10:       sub    0x6c(%rdi),%r9d
      14:       mov    0xd8(%rdi),%r8
      1b:       mov    $0xc,%esi
      20:       callq  0xffffffffe0ff9442
      25:       cmp    $0x800,%eax
      2a:       jne    0x0000000000000042
      2c:       mov    $0x17,%esi
      31:       callq  0xffffffffe0ff945e
      36:       cmp    $0x1,%eax
      39:       jne    0x0000000000000042
      3b:       mov    $0xffff,%eax
      40:       jmp    0x0000000000000044
      42:       xor    %eax,%eax
      44:       leaveq
      45:       retq

Alternatively, the tool can also dump related opcodes along with the disassembly.

.. code-block:: shell-session

    # ./bpf_jit_disasm -o
    70 bytes emitted from JIT compiler (pass:3, flen:6)
    ffffffffa0069c8f + <x>:
       0:       push   %rbp
        55
       1:       mov    %rsp,%rbp
        48 89 e5
       4:       sub    $0x60,%rsp
        48 83 ec 60
       8:       mov    %rbx,-0x8(%rbp)
        48 89 5d f8
       c:       mov    0x68(%rdi),%r9d
        44 8b 4f 68
      10:       sub    0x6c(%rdi),%r9d
        44 2b 4f 6c
      14:       mov    0xd8(%rdi),%r8
        4c 8b 87 d8 00 00 00
      1b:       mov    $0xc,%esi
        be 0c 00 00 00
      20:       callq  0xffffffffe0ff9442
        e8 1d 94 ff e0
      25:       cmp    $0x800,%eax
        3d 00 08 00 00
      2a:       jne    0x0000000000000042
        75 16
      2c:       mov    $0x17,%esi
        be 17 00 00 00
      31:       callq  0xffffffffe0ff945e
        e8 28 94 ff e0
      36:       cmp    $0x1,%eax
        83 f8 01
      39:       jne    0x0000000000000042
        75 07
      3b:       mov    $0xffff,%eax
        b8 ff ff 00 00
      40:       jmp    0x0000000000000044
        eb 02
      42:       xor    %eax,%eax
        31 c0
      44:       leaveq
        c9
      45:       retq
        c3

More recently, ``bpftool`` gained the same feature of dumping the BPF JIT
image based on a given BPF program ID already loaded in the system (see
the bpftool section).

For performance analysis of JITed BPF programs, ``perf`` can be used as
usual. As a prerequisite, JITed programs need to be exported through the
kallsyms infrastructure.

.. code-block:: shell-session

    # echo 1 > /proc/sys/net/core/bpf_jit_enable
    # echo 1 > /proc/sys/net/core/bpf_jit_kallsyms

Enabling or disabling ``bpf_jit_kallsyms`` does not require a reload of the
related BPF programs. Next, a small workflow example is provided for profiling
BPF programs. A crafted tc BPF program is used for demonstration purposes,
where perf records a failed allocation inside the ``bpf_clone_redirect()``
helper. Due to the use of direct write, ``bpf_try_make_head_writable()`` fails,
which then releases the cloned ``skb`` again and returns with an error message.
``perf`` thus records all ``kfree_skb`` events.

.. code-block:: shell-session

    # tc qdisc add dev em1 clsact
    # tc filter add dev em1 ingress bpf da obj prog.o sec main
    # tc filter show dev em1 ingress
    filter protocol all pref 49152 bpf
    filter protocol all pref 49152 bpf handle 0x1 prog.o:[main] direct-action id 1 tag 8227addf251b7543

    # cat /proc/kallsyms
    [...]
    ffffffffc00349e0 t fjes_hw_init_command_registers    [fjes]
    ffffffffc003e2e0 d __tracepoint_fjes_hw_stop_debug_err    [fjes]
    ffffffffc0036190 t fjes_hw_epbuf_tx_pkt_send    [fjes]
    ffffffffc004b000 t bpf_prog_8227addf251b7543

    # perf record -a -g -e skb:kfree_skb sleep 60
    # perf script --kallsyms=/proc/kallsyms
    [...]
    ksoftirqd/0     6 [000]  1004.578402:    skb:kfree_skb: skbaddr=0xffff9d4161f20a00 protocol=2048 location=0xffffffffc004b52c
       7fffb8745961 bpf_clone_redirect (/lib/modules/4.10.0+/build/vmlinux)
       7fffc004e52c bpf_prog_8227addf251b7543 (/lib/modules/4.10.0+/build/vmlinux)
       7fffc05b6283 cls_bpf_classify (/lib/modules/4.10.0+/build/vmlinux)
       7fffb875957a tc_classify (/lib/modules/4.10.0+/build/vmlinux)
       7fffb8729840 __netif_receive_skb_core (/lib/modules/4.10.0+/build/vmlinux)
       7fffb8729e38 __netif_receive_skb (/lib/modules/4.10.0+/build/vmlinux)
       7fffb872ae05 process_backlog (/lib/modules/4.10.0+/build/vmlinux)
       7fffb872a43e net_rx_action (/lib/modules/4.10.0+/build/vmlinux)
       7fffb886176c __do_softirq (/lib/modules/4.10.0+/build/vmlinux)
       7fffb80ac5b9 run_ksoftirqd (/lib/modules/4.10.0+/build/vmlinux)
       7fffb80ca7fa smpboot_thread_fn (/lib/modules/4.10.0+/build/vmlinux)
       7fffb80c6831 kthread (/lib/modules/4.10.0+/build/vmlinux)
       7fffb885e09c ret_from_fork (/lib/modules/4.10.0+/build/vmlinux)

The stack trace recorded by ``perf`` will then show the ``bpf_prog_8227addf251b7543()``
symbol as part of the call trace, meaning that the BPF program with the
tag ``8227addf251b7543`` was related to the ``kfree_skb`` event, and that
this program was attached to netdevice ``em1`` on the ingress hook as
shown by tc.

Introspection
-------------

The Linux kernel provides various tracepoints around BPF and XDP which
can be used for additional introspection, for example, to trace interactions
of user space programs with the bpf system call.

Tracepoints for BPF:

.. code-block:: shell-session

    # perf list | grep bpf:
    bpf:bpf_map_create                                 [Tracepoint event]
    bpf:bpf_map_delete_elem                            [Tracepoint event]
    bpf:bpf_map_lookup_elem                            [Tracepoint event]
    bpf:bpf_map_next_key                               [Tracepoint event]
    bpf:bpf_map_update_elem                            [Tracepoint event]
    bpf:bpf_obj_get_map                                [Tracepoint event]
    bpf:bpf_obj_get_prog                               [Tracepoint event]
    bpf:bpf_obj_pin_map                                [Tracepoint event]
    bpf:bpf_obj_pin_prog                               [Tracepoint event]
    bpf:bpf_prog_get_type                              [Tracepoint event]
    bpf:bpf_prog_load                                  [Tracepoint event]
    bpf:bpf_prog_put_rcu                               [Tracepoint event]

Example usage with ``perf`` (instead of the ``sleep`` example used here,
a specific application like ``tc`` could of course be traced instead):

.. code-block:: shell-session

    # perf record -a -e bpf:* sleep 10
    # perf script
    sock_example  6197 [005]   283.980322:      bpf:bpf_map_create: map type=ARRAY ufd=4 key=4 val=8 max=256 flags=0
    sock_example  6197 [005]   283.980721:       bpf:bpf_prog_load: prog=a5ea8fa30ea6849c type=SOCKET_FILTER ufd=5
    sock_example  6197 [005]   283.988423:   bpf:bpf_prog_get_type: prog=a5ea8fa30ea6849c type=SOCKET_FILTER
    sock_example  6197 [005]   283.988443: bpf:bpf_map_lookup_elem: map type=ARRAY ufd=4 key=[06 00 00 00] val=[00 00 00 00 00 00 00 00]
    [...]
    sock_example  6197 [005]   288.990868: bpf:bpf_map_lookup_elem: map type=ARRAY ufd=4 key=[01 00 00 00] val=[14 00 00 00 00 00 00 00]
         swapper     0 [005]   289.338243:    bpf:bpf_prog_put_rcu: prog=a5ea8fa30ea6849c type=SOCKET_FILTER

For the BPF programs, their individual program tag is displayed.

For debugging, XDP also has a tracepoint that is triggered when exceptions are raised:

.. code-block:: shell-session

    # perf list | grep xdp:
    xdp:xdp_exception                                  [Tracepoint event]

Exceptions are triggered in the following scenarios:

* The BPF program returned an invalid / unknown XDP action code.
* The BPF program returned with ``XDP_ABORTED`` indicating a non-graceful exit.
* The BPF program returned with ``XDP_TX``, but there was an error on transmit,
  for example, due to the port not being up, due to the transmit ring being full,
  due to allocation failures, etc.

Both tracepoint classes can also be inspected with a BPF program itself
attached to one or more tracepoints, collecting further information
in a map or punting such events to a user space collector through the
``bpf_perf_event_output()`` helper, for example.

Tracing pipe
------------

When a BPF program makes a call to ``bpf_trace_printk()``, the output is sent
to the kernel tracing pipe. Users may read from this file to consume events
that are traced to this buffer:

.. code-block:: shell-session

   # tail -f /sys/kernel/debug/tracing/trace_pipe
   ...


Miscellaneous
-------------

BPF programs and maps are memory accounted against ``RLIMIT_MEMLOCK``, similar
to ``perf``. The currently available size, in units of system pages, which may
be locked into memory can be inspected through ``ulimit -l``. The ``setrlimit()``
system call man page provides further details.

The default limit is usually insufficient to load more complex programs or
larger BPF maps, causing the BPF system call to return with an ``errno``
of ``EPERM``. In such situations, a workaround is to raise the limit with
``ulimit -l unlimited`` or to set a sufficiently large value. ``RLIMIT_MEMLOCK``
mainly enforces limits for unprivileged users. Depending on the setup,
setting a higher limit for privileged users is often acceptable.
   724  mainly enforcing limits for unprivileged users. Depending on the setup,
   725  setting a higher limit for privileged users is often acceptable.