.. only:: not (epub or latex or html)

    WARNING: You are looking at unreleased Cilium documentation.
    Please use the official rendered version released here:
    https://docs.cilium.io

.. _bpf_debug:

Debugging and Testing
=====================

bpftool
-------

bpftool is the main introspection and debugging tool around BPF. It is
developed and shipped along with the Linux kernel tree under
``tools/bpf/bpftool/``.

The tool can dump all BPF programs and maps that are currently loaded in
the system, or list and correlate all BPF maps used by a specific program.
Furthermore, it allows dumping an entire map's key / value pairs, looking
up, updating, or deleting individual entries, as well as retrieving a key's
neighbor key in the map. Such operations can be performed based on BPF
program or map IDs, or by specifying the location of a BPF file system
pinned program or map. The tool additionally also offers an option to pin
maps or programs into the BPF file system.

For a quick overview of all BPF programs currently loaded on the host,
invoke the following command:

.. code-block:: shell-session

    # bpftool prog
    398: sched_cls tag 56207908be8ad877
       loaded_at Apr 09/16:24  uid 0
       xlated 8800B  jited 6184B  memlock 12288B  map_ids 18,5,17,14
    399: sched_cls tag abc95fb4835a6ec9
       loaded_at Apr 09/16:24  uid 0
       xlated 344B  jited 223B  memlock 4096B  map_ids 18
    400: sched_cls tag afd2e542b30ff3ec
       loaded_at Apr 09/16:24  uid 0
       xlated 1720B  jited 1001B  memlock 4096B  map_ids 17
    401: sched_cls tag 2dbbd74ee5d51cc8
       loaded_at Apr 09/16:24  uid 0
       xlated 3728B  jited 2099B  memlock 4096B  map_ids 17
    [...]

Similarly, to get an overview of all active maps:

.. code-block:: shell-session

    # bpftool map
    5: hash  flags 0x0
        key 20B  value 112B  max_entries 65535  memlock 13111296B
    6: hash  flags 0x0
        key 20B  value 20B  max_entries 65536  memlock 7344128B
    7: hash  flags 0x0
        key 10B  value 16B  max_entries 8192  memlock 790528B
    8: hash  flags 0x0
        key 22B  value 28B  max_entries 8192  memlock 987136B
    9: hash  flags 0x0
        key 20B  value 8B  max_entries 512000  memlock 49352704B
    [...]

Note that for each command, bpftool also supports JSON-based output by
appending ``--json`` to the command line. An additional ``--pretty``
flag makes the output more human readable.

.. code-block:: shell-session

    # bpftool prog --json --pretty

For dumping the post-verifier BPF instruction image of a specific BPF
program, one starting point could be to inspect a specific program, e.g.
attached to the tc ingress hook:

.. code-block:: shell-session

    # tc filter show dev cilium_host egress
    filter protocol all pref 1 bpf chain 0
    filter protocol all pref 1 bpf chain 0 handle 0x1 bpf_host.o:[from-netdev] \
                        direct-action not_in_hw id 406 tag e0362f5bd9163a0a jited

The program from the object file ``bpf_host.o``, section ``from-netdev``, has
a BPF program ID of ``406`` as denoted in ``id 406``. Based on this information,
bpftool can provide some high-level metadata specific to the program:

.. code-block:: shell-session

    # bpftool prog show id 406
    406: sched_cls tag e0362f5bd9163a0a
         loaded_at Apr 09/16:24  uid 0
         xlated 11144B  jited 7721B  memlock 12288B  map_ids 18,20,8,5,6,14

The program of ID 406 is of type ``sched_cls`` (``BPF_PROG_TYPE_SCHED_CLS``),
has a ``tag`` of ``e0362f5bd9163a0a`` (SHA sum over the instruction sequence),
and was loaded by root (``uid 0``) on ``Apr 09/16:24``. The BPF instruction
sequence is ``11,144 bytes`` long and the JITed image ``7,721 bytes``.
The program itself (excluding maps) consumes ``12,288 bytes``, which are
accounted / charged against user ``uid 0``. The BPF program uses the BPF
maps with IDs ``18``, ``20``, ``8``, ``5``, ``6`` and ``14``. These IDs can
further be used to get information about or dump the maps themselves.

Additionally, bpftool can issue a dump request of the BPF instructions the
program runs:

.. code-block:: shell-session

    # bpftool prog dump xlated id 406
     0: (b7) r7 = 0
     1: (63) *(u32 *)(r1 +60) = r7
     2: (63) *(u32 *)(r1 +56) = r7
     3: (63) *(u32 *)(r1 +52) = r7
    [...]
    47: (bf) r4 = r10
    48: (07) r4 += -40
    49: (79) r6 = *(u64 *)(r10 -104)
    50: (bf) r1 = r6
    51: (18) r2 = map[id:18]                    <-- BPF map id 18
    53: (b7) r5 = 32
    54: (85) call bpf_skb_event_output#5656112  <-- BPF helper call
    55: (69) r1 = *(u16 *)(r6 +192)
    [...]

bpftool correlates BPF map IDs into the instruction stream as shown above,
as well as calls to BPF helpers or other BPF programs.

The instruction dump reuses the same 'pretty-printer' as the kernel's BPF
verifier. Since the program was JITed, and therefore the actual JIT image
generated out of the above ``xlated`` instructions is what gets executed, it
can be dumped as well through bpftool:

.. code-block:: shell-session

    # bpftool prog dump jited id 406
     0:   push   %rbp
     1:   mov    %rsp,%rbp
     4:   sub    $0x228,%rsp
     b:   sub    $0x28,%rbp
     f:   mov    %rbx,0x0(%rbp)
    13:   mov    %r13,0x8(%rbp)
    17:   mov    %r14,0x10(%rbp)
    1b:   mov    %r15,0x18(%rbp)
    1f:   xor    %eax,%eax
    21:   mov    %rax,0x20(%rbp)
    25:   mov    0x80(%rdi),%r9d
    [...]

Mainly for BPF JIT developers, the option also exists to interleave the
disassembly with the actual native opcodes:

.. code-block:: shell-session

    # bpftool prog dump jited id 406 opcodes
     0:   push   %rbp
          55
     1:   mov    %rsp,%rbp
          48 89 e5
     4:   sub    $0x228,%rsp
          48 81 ec 28 02 00 00
     b:   sub    $0x28,%rbp
          48 83 ed 28
     f:   mov    %rbx,0x0(%rbp)
          48 89 5d 00
    13:   mov    %r13,0x8(%rbp)
          4c 89 6d 08
    17:   mov    %r14,0x10(%rbp)
          4c 89 75 10
    1b:   mov    %r15,0x18(%rbp)
          4c 89 7d 18
    [...]

The same interleaving can be done for the normal BPF instructions, which
can sometimes be useful for debugging in the kernel:

.. code-block:: shell-session

    # bpftool prog dump xlated id 406 opcodes
     0: (b7) r7 = 0
        b7 07 00 00 00 00 00 00
     1: (63) *(u32 *)(r1 +60) = r7
        63 71 3c 00 00 00 00 00
     2: (63) *(u32 *)(r1 +56) = r7
        63 71 38 00 00 00 00 00
     3: (63) *(u32 *)(r1 +52) = r7
        63 71 34 00 00 00 00 00
     4: (63) *(u32 *)(r1 +48) = r7
        63 71 30 00 00 00 00 00
     5: (63) *(u32 *)(r1 +64) = r7
        63 71 40 00 00 00 00 00
    [...]

The basic blocks of a program can also be visualized with the help of
``graphviz``. For this purpose, bpftool has a ``visual`` dump mode that
generates a dot file instead of the plain BPF ``xlated`` instruction
dump. The dot file can later be converted to a png file:

.. code-block:: shell-session

    # bpftool prog dump xlated id 406 visual &> output.dot
    $ dot -Tpng output.dot -o output.png

Another option would be to pass the dot file to dotty as a viewer, that
is ``dotty output.dot``, where the result for the ``bpf_host.o`` program
looks as follows (small extract):

.. image:: /images/bpf_dot.png
   :align: center

Note that the ``xlated`` instruction dump provides the post-verifier BPF
instruction image, which means that it dumps the instructions as if they
were to be run through the BPF interpreter. In the kernel, the verifier
performs various rewrites of the original instructions provided by the
BPF loader.
One example of such rewrites is the inlining of helper functions in order to
improve runtime performance, here in the case of a map lookup for hash
tables:

.. code-block:: shell-session

    # bpftool prog dump xlated id 3
     0: (b7) r1 = 2
     1: (63) *(u32 *)(r10 -4) = r1
     2: (bf) r2 = r10
     3: (07) r2 += -4
     4: (18) r1 = map[id:2]                      <-- BPF map id 2
     6: (85) call __htab_map_lookup_elem#77408   <-+ BPF helper inlined rewrite
     7: (15) if r0 == 0x0 goto pc+2                |
     8: (07) r0 += 56                              |
     9: (79) r0 = *(u64 *)(r0 +0)                <-+
    10: (15) if r0 == 0x0 goto pc+24
    11: (bf) r2 = r10
    12: (07) r2 += -4
    [...]

bpftool correlates calls to helper functions or BPF to BPF calls through
kallsyms. Therefore, make sure that JITed BPF programs are exposed to
kallsyms (``bpf_jit_kallsyms``) and that kallsyms addresses are not
obfuscated (calls are otherwise shown as ``call bpf_unspec#0``):

.. code-block:: shell-session

    # echo 0 > /proc/sys/kernel/kptr_restrict
    # echo 1 > /proc/sys/net/core/bpf_jit_kallsyms

BPF to BPF calls are correlated as well, for both the interpreter and the
JIT case. In the latter, the tag of the subprogram is shown as the call
target. In each case, the ``pc+2`` is the pc-relative offset of the call
target, which denotes the subprogram.

.. code-block:: shell-session

    # bpftool prog dump xlated id 1
    0: (85) call pc+2#__bpf_prog_run_args32
    1: (b7) r0 = 1
    2: (95) exit
    3: (b7) r0 = 2
    4: (95) exit

JITed variant of the dump:

.. code-block:: shell-session

    # bpftool prog dump xlated id 1
    0: (85) call pc+2#bpf_prog_3b185187f1855c4c_F
    1: (b7) r0 = 1
    2: (95) exit
    3: (b7) r0 = 2
    4: (95) exit

In the case of tail calls, the kernel maps them into a single instruction
internally; bpftool will still correlate them as a helper call for ease
of debugging:

.. code-block:: shell-session

    # bpftool prog dump xlated id 2
    [...]
    10: (b7) r2 = 8
    11: (85) call bpf_trace_printk#-41312
    12: (bf) r1 = r6
    13: (18) r2 = map[id:1]
    15: (b7) r3 = 0
    16: (85) call bpf_tail_call#12
    17: (b7) r1 = 42
    18: (6b) *(u16 *)(r6 +46) = r1
    19: (b7) r0 = 0
    20: (95) exit

    # bpftool map show id 1
    1: prog_array  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B

Dumping an entire map is possible through the ``map dump`` subcommand,
which iterates through all present map elements and dumps the key /
value pairs.

If no BTF (BPF Type Format) data is available for a given map, then
the key / value pairs are dumped as hex:

.. code-block:: shell-session

    # bpftool map dump id 5
    key:
    f0 0d 00 00 00 00 00 00 0a 66 00 00 00 00 8a d6
    02 00 00 00
    value:
    00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    key:
    0a 66 1c ee 00 00 00 00 00 00 00 00 00 00 00 00
    01 00 00 00
    value:
    00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    [...]
    Found 6 elements

However, with BTF, the map also holds debugging information about
the key and value structures. For example, BTF in combination with
BPF maps and the ``BPF_ANNOTATE_KV_PAIR()`` macro from iproute2 will
result in the following dump (``test_xdp_noinline.o`` from kernel
selftests):

.. code-block:: shell-session

    # cat tools/testing/selftests/bpf/test_xdp_noinline.c
      [...]
      struct ctl_value {
            union {
                  __u64 value;
                  __u32 ifindex;
                  __u8 mac[6];
            };
      };

      struct bpf_map_def __attribute__ ((section("maps"), used)) ctl_array = {
            .type        = BPF_MAP_TYPE_ARRAY,
            .key_size    = sizeof(__u32),
            .value_size  = sizeof(struct ctl_value),
            .max_entries = 16,
            .map_flags   = 0,
      };
      BPF_ANNOTATE_KV_PAIR(ctl_array, __u32, struct ctl_value);

      [...]

The ``BPF_ANNOTATE_KV_PAIR()`` macro forces a map-specific ELF section
containing an empty key and value. This enables the iproute2 BPF loader
to correlate BTF data with that section and thus allows it to choose the
corresponding types out of the BTF for loading the map.

Compiling through LLVM and generating BTF from debugging information
via ``pahole``:

.. code-block:: shell-session

    # clang [...] -O2 --target=bpf -g -emit-llvm -c test_xdp_noinline.c -o - |
      llc -march=bpf -mcpu=probe -mattr=dwarfris -filetype=obj -o test_xdp_noinline.o
    # pahole -J test_xdp_noinline.o

Now loading into the kernel and dumping the map via bpftool:

.. code-block:: shell-session

    # ip -force link set dev lo xdp obj test_xdp_noinline.o sec xdp-test
    # ip a
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 xdpgeneric/id:227 qdisc noqueue state UNKNOWN group default qlen 1000
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
        inet6 ::1/128 scope host
           valid_lft forever preferred_lft forever
    [...]
    # bpftool prog show id 227
    227: xdp  tag a85e060c275c5616  gpl
         loaded_at 2018-07-17T14:41:29+0000  uid 0
         xlated 8152B  not jited  memlock 12288B  map_ids 381,385,386,382,384,383
    # bpftool map dump id 386
    [{
         "key": 0,
         "value": {
             "": {
                 "value": 0,
                 "ifindex": 0,
                 "mac": []
             }
         }
     },{
         "key": 1,
         "value": {
             "": {
                 "value": 0,
                 "ifindex": 0,
                 "mac": []
             }
         }
     },{
    [...]

Lookup, update, delete, and 'get next key' operations on the map for specific
keys can be performed through bpftool as well.

If the BPF program has been successfully loaded with BTF debugging information,
the BTF ID will be shown in the ``prog show`` command result, denoted by
``btf_id``.

.. code-block:: shell-session

    # bpftool prog show id 72
    72: xdp  name balancer_ingres  tag acf44cabb48385ed  gpl
        loaded_at 2020-04-13T23:12:08+0900  uid 0
        xlated 19104B  jited 10732B  memlock 20480B  map_ids 126,130,131,127,129,128
        btf_id 60

This can also be confirmed with the ``btf show`` command, which dumps all BTF
objects loaded on a system.

.. code-block:: shell-session

    # bpftool btf show
    60: size 12243B  prog_ids 72  map_ids 126,130,131,127,129,128

The subcommand ``btf dump`` can be used to check which debugging information
is included in the BTF. With this command, the BTF dump can be formatted
either as 'raw' or as 'c', the format used in C code.

.. code-block:: shell-session

    # bpftool btf dump id 60 format c
    [...]
    struct ctl_value {
          union {
                __u64 value;
                __u32 ifindex;
                __u8 mac[6];
          };
    };

    typedef unsigned int u32;
    [...]

.. admonition:: Video
   :class: attention

   To learn more about bpftool, check out `eCHO episode 11: Exploring
   bpftool <https://www.youtube.com/watch?v=1EOLh3zzWP4&t=650s>`__ with
   Quentin Monnet, maintainer of bpftool.
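Beyond interactive use, the ``--json`` output shown earlier is convenient for
scripting. Below is a minimal sketch of post-processing it from Python. It
assumes the JSON objects carry ``id``, ``type``, ``tag`` and ``map_ids`` keys
mirroring the plain-text output above; the ``summarize_progs()`` helper is
illustrative, not part of bpftool:

```python
import json

def summarize_progs(progs):
    """One summary line per entry of parsed `bpftool prog --json` output."""
    lines = []
    for p in progs:
        maps = ",".join(str(m) for m in p.get("map_ids", []))
        lines.append(f"id={p['id']} type={p['type']} tag={p['tag']} map_ids={maps}")
    return lines

# Feeding live data would require root and a bpftool binary in $PATH:
#   import subprocess
#   progs = json.loads(subprocess.check_output(["bpftool", "prog", "--json"]))

# Sample object shaped like the `bpftool prog show id 406` output above:
sample = [{"id": 406, "type": "sched_cls", "tag": "e0362f5bd9163a0a",
           "map_ids": [18, 20, 8, 5, 6, 14]}]
print(summarize_progs(sample)[0])
```

Pulling the structured output instead of scraping the plain-text format keeps
such scripts robust against cosmetic changes in bpftool's human-readable output.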
Kernel Testing
--------------

The Linux kernel ships a BPF selftest suite, which can be found in the kernel
source tree under ``tools/testing/selftests/bpf/``.

.. code-block:: shell-session

    $ cd tools/testing/selftests/bpf/
    $ make
    # make run_tests

The test suite contains test cases against the BPF verifier, program tags,
and various tests against the BPF map interface and map types. It contains
various runtime tests written in C for checking the LLVM back end, as well
as eBPF and cBPF asm code that is run in the kernel for testing the
interpreter and JITs.

JIT Debugging
-------------

For JIT developers performing audits or writing extensions, each compile run
can output the generated JIT image into the kernel log through:

.. code-block:: shell-session

    # echo 2 > /proc/sys/net/core/bpf_jit_enable

Whenever a new BPF program is loaded, the JIT compiler will dump the output,
which can then be inspected with ``dmesg``, for example:

::

    [ 3389.935842] flen=6 proglen=70 pass=3 image=ffffffffa0069c8f from=tcpdump pid=20583
    [ 3389.935847] JIT code: 00000000: 55 48 89 e5 48 83 ec 60 48 89 5d f8 44 8b 4f 68
    [ 3389.935849] JIT code: 00000010: 44 2b 4f 6c 4c 8b 87 d8 00 00 00 be 0c 00 00 00
    [ 3389.935850] JIT code: 00000020: e8 1d 94 ff e0 3d 00 08 00 00 75 16 be 17 00 00
    [ 3389.935851] JIT code: 00000030: 00 e8 28 94 ff e0 83 f8 01 75 07 b8 ff ff 00 00
    [ 3389.935852] JIT code: 00000040: eb 02 31 c0 c9 c3

``flen`` is the length of the BPF program (here, 6 BPF instructions), and ``proglen``
tells the number of bytes generated by the JIT for the opcode image (here, 70 bytes
in size). ``pass`` means that the image was generated in 3 compiler passes; for
example, ``x86_64`` can have various optimization passes to further reduce the image
size when possible.
``image`` contains the address of the generated JIT image, while ``from``
and ``pid`` show the user space application name and PID, respectively,
which triggered the compilation process. The dump output for eBPF and cBPF
JITs is in the same format.

In the kernel tree under ``tools/bpf/``, there is a tool called ``bpf_jit_disasm``. It
reads out the latest dump and prints the disassembly for further inspection:

.. code-block:: shell-session

    # ./bpf_jit_disasm
    70 bytes emitted from JIT compiler (pass:3, flen:6)
    ffffffffa0069c8f + <x>:
       0:   push   %rbp
       1:   mov    %rsp,%rbp
       4:   sub    $0x60,%rsp
       8:   mov    %rbx,-0x8(%rbp)
       c:   mov    0x68(%rdi),%r9d
      10:   sub    0x6c(%rdi),%r9d
      14:   mov    0xd8(%rdi),%r8
      1b:   mov    $0xc,%esi
      20:   callq  0xffffffffe0ff9442
      25:   cmp    $0x800,%eax
      2a:   jne    0x0000000000000042
      2c:   mov    $0x17,%esi
      31:   callq  0xffffffffe0ff945e
      36:   cmp    $0x1,%eax
      39:   jne    0x0000000000000042
      3b:   mov    $0xffff,%eax
      40:   jmp    0x0000000000000044
      42:   xor    %eax,%eax
      44:   leaveq
      45:   retq

Alternatively, the tool can also dump related opcodes along with the disassembly:

.. code-block:: shell-session

    # ./bpf_jit_disasm -o
    70 bytes emitted from JIT compiler (pass:3, flen:6)
    ffffffffa0069c8f + <x>:
       0:   push   %rbp
            55
       1:   mov    %rsp,%rbp
            48 89 e5
       4:   sub    $0x60,%rsp
            48 83 ec 60
       8:   mov    %rbx,-0x8(%rbp)
            48 89 5d f8
       c:   mov    0x68(%rdi),%r9d
            44 8b 4f 68
      10:   sub    0x6c(%rdi),%r9d
            44 2b 4f 6c
      14:   mov    0xd8(%rdi),%r8
            4c 8b 87 d8 00 00 00
      1b:   mov    $0xc,%esi
            be 0c 00 00 00
      20:   callq  0xffffffffe0ff9442
            e8 1d 94 ff e0
      25:   cmp    $0x800,%eax
            3d 00 08 00 00
      2a:   jne    0x0000000000000042
            75 16
      2c:   mov    $0x17,%esi
            be 17 00 00 00
      31:   callq  0xffffffffe0ff945e
            e8 28 94 ff e0
      36:   cmp    $0x1,%eax
            83 f8 01
      39:   jne    0x0000000000000042
            75 07
      3b:   mov    $0xffff,%eax
            b8 ff ff 00 00
      40:   jmp    0x0000000000000044
            eb 02
      42:   xor    %eax,%eax
            31 c0
      44:   leaveq
            c9
      45:   retq
            c3

More recently, ``bpftool`` adopted the same feature of dumping the BPF JIT
image based on a given BPF program ID already loaded in the system (see the
bpftool section).

For performance analysis of JITed BPF programs, ``perf`` can be used as
usual. As a prerequisite, JITed programs need to be exported through the
kallsyms infrastructure.

.. code-block:: shell-session

    # echo 1 > /proc/sys/net/core/bpf_jit_enable
    # echo 1 > /proc/sys/net/core/bpf_jit_kallsyms

Enabling or disabling ``bpf_jit_kallsyms`` does not require a reload of the
related BPF programs. Next, a small workflow example is provided for profiling
BPF programs. A crafted tc BPF program is used for demonstration purposes,
where perf records a failed allocation inside the ``bpf_clone_redirect()``
helper. Due to the use of a direct write, ``bpf_try_make_head_writable()``
failed, which would then release the cloned ``skb`` again and return with an
error. ``perf`` thus records all ``kfree_skb`` events.

.. code-block:: shell-session

    # tc qdisc add dev em1 clsact
    # tc filter add dev em1 ingress bpf da obj prog.o sec main
    # tc filter show dev em1 ingress
    filter protocol all pref 49152 bpf
    filter protocol all pref 49152 bpf handle 0x1 prog.o:[main] direct-action id 1 tag 8227addf251b7543

    # cat /proc/kallsyms
    [...]
    ffffffffc00349e0 t fjes_hw_init_command_registers    [fjes]
    ffffffffc003e2e0 d __tracepoint_fjes_hw_stop_debug_err    [fjes]
    ffffffffc0036190 t fjes_hw_epbuf_tx_pkt_send    [fjes]
    ffffffffc004b000 t bpf_prog_8227addf251b7543

    # perf record -a -g -e skb:kfree_skb sleep 60
    # perf script --kallsyms=/proc/kallsyms
    [...]
    ksoftirqd/0     6 [000]  1004.578402:    skb:kfree_skb: skbaddr=0xffff9d4161f20a00 protocol=2048 location=0xffffffffc004b52c
       7fffb8745961 bpf_clone_redirect (/lib/modules/4.10.0+/build/vmlinux)
       7fffc004e52c bpf_prog_8227addf251b7543 (/lib/modules/4.10.0+/build/vmlinux)
       7fffc05b6283 cls_bpf_classify (/lib/modules/4.10.0+/build/vmlinux)
       7fffb875957a tc_classify (/lib/modules/4.10.0+/build/vmlinux)
       7fffb8729840 __netif_receive_skb_core (/lib/modules/4.10.0+/build/vmlinux)
       7fffb8729e38 __netif_receive_skb (/lib/modules/4.10.0+/build/vmlinux)
       7fffb872ae05 process_backlog (/lib/modules/4.10.0+/build/vmlinux)
       7fffb872a43e net_rx_action (/lib/modules/4.10.0+/build/vmlinux)
       7fffb886176c __do_softirq (/lib/modules/4.10.0+/build/vmlinux)
       7fffb80ac5b9 run_ksoftirqd (/lib/modules/4.10.0+/build/vmlinux)
       7fffb80ca7fa smpboot_thread_fn (/lib/modules/4.10.0+/build/vmlinux)
       7fffb80c6831 kthread (/lib/modules/4.10.0+/build/vmlinux)
       7fffb885e09c ret_from_fork (/lib/modules/4.10.0+/build/vmlinux)

The stack trace recorded by ``perf`` will then show the ``bpf_prog_8227addf251b7543()``
symbol as part of the call trace, meaning that the BPF program with the
tag ``8227addf251b7543`` was related to the ``kfree_skb`` event, and
such a program was
attached to netdevice ``em1`` on the ingress hook, as shown by tc.

Introspection
-------------

The Linux kernel provides various tracepoints around BPF and XDP which
can be used for additional introspection, for example, to trace interactions
of user space programs with the bpf system call.

Tracepoints for BPF:

.. code-block:: shell-session

    # perf list | grep bpf:
    bpf:bpf_map_create                                 [Tracepoint event]
    bpf:bpf_map_delete_elem                            [Tracepoint event]
    bpf:bpf_map_lookup_elem                            [Tracepoint event]
    bpf:bpf_map_next_key                               [Tracepoint event]
    bpf:bpf_map_update_elem                            [Tracepoint event]
    bpf:bpf_obj_get_map                                [Tracepoint event]
    bpf:bpf_obj_get_prog                               [Tracepoint event]
    bpf:bpf_obj_pin_map                                [Tracepoint event]
    bpf:bpf_obj_pin_prog                               [Tracepoint event]
    bpf:bpf_prog_get_type                              [Tracepoint event]
    bpf:bpf_prog_load                                  [Tracepoint event]
    bpf:bpf_prog_put_rcu                               [Tracepoint event]

Example usage with ``perf`` (instead of the ``sleep`` example used here,
a specific application like ``tc`` could of course be used instead):

.. code-block:: shell-session

    # perf record -a -e bpf:* sleep 10
    # perf script
    sock_example  6197 [005]   283.980322: bpf:bpf_map_create: map type=ARRAY ufd=4 key=4 val=8 max=256 flags=0
    sock_example  6197 [005]   283.980721: bpf:bpf_prog_load: prog=a5ea8fa30ea6849c type=SOCKET_FILTER ufd=5
    sock_example  6197 [005]   283.988423: bpf:bpf_prog_get_type: prog=a5ea8fa30ea6849c type=SOCKET_FILTER
    sock_example  6197 [005]   283.988443: bpf:bpf_map_lookup_elem: map type=ARRAY ufd=4 key=[06 00 00 00] val=[00 00 00 00 00 00 00 00]
    [...]
    sock_example  6197 [005]   288.990868: bpf:bpf_map_lookup_elem: map type=ARRAY ufd=4 key=[01 00 00 00] val=[14 00 00 00 00 00 00 00]
    swapper          0 [005]   289.338243: bpf:bpf_prog_put_rcu: prog=a5ea8fa30ea6849c type=SOCKET_FILTER

For the BPF programs, their individual program tag is displayed.
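When post-processing such ``perf script`` output in a small tool, the fixed
header plus key=value payload layout is straightforward to split. A minimal
sketch (the ``parse_perf_line()`` helper is illustrative; it handles simple
payloads only, so values containing spaces, such as the hex-dumped map keys
above, would need extra care):

```python
import re

# comm, pid, cpu, timestamp, tracepoint name, key=value payload
LINE = re.compile(
    r"^\s*(?P<comm>\S+)\s+(?P<pid>\d+)\s+\[(?P<cpu>\d+)\]\s+"
    r"(?P<time>[\d.]+):\s+(?P<event>[\w:]+):\s+(?P<payload>.*)$")

def parse_perf_line(line):
    """Split one `perf script` line into its event name and payload fields."""
    m = LINE.match(line)
    if m is None:
        return None
    fields = dict(kv.split("=", 1)
                  for kv in m.group("payload").split() if "=" in kv)
    return m.group("event"), fields

event, fields = parse_perf_line(
    "sock_example  6197 [005]   283.980721: bpf:bpf_prog_load: "
    "prog=a5ea8fa30ea6849c type=SOCKET_FILTER ufd=5")
print(event, fields["prog"], fields["type"])
```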
For debugging, XDP also has a tracepoint that is triggered when exceptions are raised:

.. code-block:: shell-session

    # perf list | grep xdp:
    xdp:xdp_exception                                  [Tracepoint event]

Exceptions are triggered in the following scenarios:

* The BPF program returned an invalid / unknown XDP action code.
* The BPF program returned with ``XDP_ABORTED``, indicating a non-graceful exit.
* The BPF program returned with ``XDP_TX``, but there was an error on transmit,
  for example, due to the port not being up, the transmit ring being full,
  or allocation failures.

Both tracepoint classes can also be inspected with a BPF program itself
attached to one or more tracepoints, collecting further information
in a map or punting such events to a user space collector through the
``bpf_perf_event_output()`` helper, for example.

Tracing pipe
------------

When a BPF program makes a call to ``bpf_trace_printk()``, the output is sent
to the kernel tracing pipe. Users may read from this file to consume events
that are traced to this buffer:

.. code-block:: shell-session

    # tail -f /sys/kernel/debug/tracing/trace_pipe
    ...

Miscellaneous
-------------

BPF programs and maps are memory accounted against ``RLIMIT_MEMLOCK``, similar
to ``perf``. The currently available size, in units of system pages, which may
be locked into memory can be inspected through ``ulimit -l``. The setrlimit
system call man page provides further details.

The default limit is usually insufficient to load more complex programs or
larger BPF maps, causing the BPF system call to return with an ``errno``
of ``EPERM``. In such situations, a workaround with ``ulimit -l unlimited`` or
with a sufficiently large limit could be performed. ``RLIMIT_MEMLOCK`` mainly
enforces limits for unprivileged users. Depending on the setup, setting a
higher limit for privileged users is often acceptable.
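The same adjustment can also be made programmatically before issuing bpf
system calls. A minimal sketch using Python's ``resource`` module (the
``bump_memlock_rlimit()`` helper name is illustrative); raising the soft
limit up to the hard limit requires no privileges, while raising the hard
limit itself does:

```python
import resource

def bump_memlock_rlimit():
    """Raise the RLIMIT_MEMLOCK soft limit to the current hard limit."""
    _, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    # Setting soft == hard is permitted for unprivileged processes;
    # increasing `hard` would require CAP_SYS_RESOURCE.
    resource.setrlimit(resource.RLIMIT_MEMLOCK, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_MEMLOCK)

soft, hard = bump_memlock_rlimit()
print("memlock soft == hard:", soft == hard)
```

The equivalent in C is a ``getrlimit()`` / ``setrlimit()`` pair on
``RLIMIT_MEMLOCK`` executed before the first bpf(2) call.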