
# Syscall descriptions

`syzkaller` uses declarative descriptions of syscall interfaces to manipulate
programs (sequences of syscalls). Below you can see a (hopefully self-explanatory)
excerpt from the descriptions:

```
open(file filename, flags flags[open_flags], mode flags[open_mode]) fd
read(fd fd, buf buffer[out], count len[buf])
close(fd fd)
open_mode = S_IRUSR, S_IWUSR, S_IXUSR, S_IRGRP, S_IWGRP, S_IXGRP, S_IROTH, S_IWOTH, S_IXOTH
```

The descriptions are contained in `sys/$OS/*.txt` files.
For example, see the [sys/linux/dev_snd_midi.txt](/sys/linux/dev_snd_midi.txt) file
for descriptions of the Linux MIDI interfaces.

A more formal description of the description syntax can be found [here](syscall_descriptions_syntax.md).

## Programs

The translated descriptions are then used to generate, mutate, execute, minimize, serialize
and deserialize programs. A program is a sequence of syscalls with concrete values for arguments.
Here is an example (of a textual representation) of a program:

```
r0 = open(&(0x7f0000000000)="./file0", 0x3, 0x9)
read(r0, &(0x7f0000000000), 42)
close(r0)
```

For actual manipulations, `syzkaller` uses an in-memory AST-like representation consisting of
`Call` and `Arg` values defined in [prog/prog.go](/prog/prog.go). That representation is used to
[analyze](/prog/analysis.go), [generate](/prog/rand.go), [mutate](/prog/mutation.go),
[minimize](/prog/minimization.go) and [validate](/prog/validation.go) programs, and so on.

The in-memory representation can be [transformed](/prog/encoding.go) to/from
textual form to store programs in the on-disk corpus, show them to humans, etc.
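To make the relationship between the in-memory and textual forms more concrete, here is a minimal Go sketch of the serialization direction only. The `Call`, `Arg`, `ConstArg`, `PointerArg` and `ResultArg` types below are hypothetical simplifications made up for illustration; they are not the types from `prog/prog.go`, and real serialization handles far more argument kinds.

```go
package main

import (
	"fmt"
	"strings"
)

// Arg is a hypothetical, heavily simplified stand-in for a program argument;
// it only knows how to print itself in textual form.
type Arg interface{ String() string }

// ConstArg is a plain integer argument, printed in hex.
type ConstArg struct{ Val uint64 }

func (a ConstArg) String() string { return fmt.Sprintf("%#x", a.Val) }

// PointerArg points to memory at a fixed address, optionally with inline data.
type PointerArg struct {
	Addr uint64
	Data string // empty means "no inline data"
}

func (a PointerArg) String() string {
	if a.Data == "" {
		return fmt.Sprintf("&(%#x)", a.Addr)
	}
	return fmt.Sprintf("&(%#x)=%q", a.Addr, a.Data)
}

// ResultArg refers to a resource returned by an earlier call (e.g. r0).
type ResultArg struct{ Ref string }

func (a ResultArg) String() string { return a.Ref }

// Call is one syscall with concrete arguments; Ret names the produced resource, if any.
type Call struct {
	Name string
	Args []Arg
	Ret  string
}

// Serialize renders the calls one per line in a textual form similar to the example above.
func Serialize(calls []Call) string {
	var b strings.Builder
	for _, c := range calls {
		if c.Ret != "" {
			b.WriteString(c.Ret + " = ")
		}
		args := make([]string, len(c.Args))
		for i, a := range c.Args {
			args[i] = a.String()
		}
		fmt.Fprintf(&b, "%s(%s)\n", c.Name, strings.Join(args, ", "))
	}
	return b.String()
}

func main() {
	prog := []Call{
		{Name: "open", Ret: "r0", Args: []Arg{PointerArg{Addr: 0x7f0000000000, Data: "./file0"}, ConstArg{Val: 3}, ConstArg{Val: 9}}},
		{Name: "read", Args: []Arg{ResultArg{Ref: "r0"}, PointerArg{Addr: 0x7f0000000000}, ConstArg{Val: 42}}},
		{Name: "close", Args: []Arg{ResultArg{Ref: "r0"}}},
	}
	fmt.Print(Serialize(prog))
}
```

Running it prints the three-call program shown earlier, except that this sketch always prints integers in hex, so the `read` count appears as `0x2a` rather than `42`.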

There is also another [binary representation](/prog/decodeexec.go)
of programs (called `exec`) that is much simpler and does not contain rich type information
(the conversion is irreversible). It is used for the actual execution (interpretation) of programs
by the [executor](/executor/executor.cc).

## Describing new system calls

This section describes how to extend syzkaller to allow fuzz testing of more kernel interfaces.
This is particularly useful for kernel developers who are proposing new system calls.

Currently all syscall descriptions are written manually. There is an
[open issue](https://github.com/google/syzkaller/issues/590) to provide some aid
for this process and there is some ongoing work, but there is no
fully-automated way to generate descriptions yet.
There is a helper [headerparser](headerparser_usage.md) utility that can auto-generate
some parts of descriptions from header files. Visual Studio Code has a [syz-lang extension](https://marketplace.visualstudio.com/items?itemName=AndreyArtemiev.syzlang-extension&ssr=false#overview) for syntax highlighting.

To enable fuzzing of a new kernel interface:

1. Study the interface and find out which syscalls are required to use it. Sometimes there is nothing besides the source code, but here are some things that may help:

   - Searching the Internet for the interface name and/or some unique constants.
   - Grepping the Documentation/ dir in the kernel.
   - Searching the tools/testing/ dir in the kernel.
   - Looking for large comment blocks in the source code.
   - Finding the commit that added the interface via `git blame` or `git log` and reading the commit description.
   - Reading the source code of, or tracing, libraries or applications that are known to use this interface.

2. Using the [syntax documentation](syscall_descriptions_syntax.md) and
   [existing descriptions](/sys/linux/) as an example, add a declarative
   description of this interface to the appropriate file:

    - `sys/linux/<subsystem>.txt` files hold system calls for particular kernel
      subsystems, for example [bpf.txt](/sys/linux/bpf.txt) or [socket.txt](/sys/linux/socket.txt).
    - [sys/linux/sys.txt](/sys/linux/sys.txt) holds descriptions for more general system calls.
    - An entirely new subsystem can be added as a new `sys/linux/<new>.txt` file.
    - If subsystem descriptions are split across multiple files, prefix the name of each file with the name of the subsystem (e.g. use `dev_*.txt` for descriptions of `/dev/` devices, use `socket_*.txt` for sockets, etc.).

3. After adding/changing descriptions, run:

    ``` bash
    make extract TARGETOS=linux SOURCEDIR=$KSRC
    make generate
    make
    ```

4. Run syzkaller. Make sure that the newly added interface is being reached by
   syzkaller using the [coverage](coverage.md) information page.

In the instructions above, `make extract` generates/updates the `*.const` files.
`$KSRC` should point to the _latest_ kernel checkout.\
_Note_: for Linux, the _latest_ kernel checkout generally means the
[mainline](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/) tree.\
However, in some cases we add descriptions for interfaces that are not in the mainline tree yet,
so if `make extract` complains about missing header files or constants undefined on all architectures,
try to use the latest [linux-next](https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/log/)
tree (or, if it happens to be broken at the moment, a slightly older linux-next tree).\
_Note_: `make extract` overwrites `.config` in `$KSRC` and cleans the tree with `mrproper`.\
_Note_: `*.const` files are checked in together with the `*.txt` changes in the same commit.

Then `make generate` updates generated code and `make` rebuilds binaries.\
Note: `make generate` does not require any kernel sources, native compilers, etc.,
and is pure text processing.\
Note: `make generate` also updates the SYZ_REVISION in `executor/defs.h`, which
is required for the machine check while running syz-manager. Keep this in mind
especially if you are rebasing your own changes to syscall descriptions.

Note: `make extract` extracts constants for all architectures, which requires
installed cross-compilers. If you get errors about missing compilers/libraries,
try `sudo make install_prerequisites` or install equivalent packages for your distro.\
Note: `sudo make install_prerequisites` may succeed even if some packages fail to
install; running `sudo apt-get update && sudo apt-get upgrade` first might be required
to get everything installed.

If you want to fuzz only the new subsystem that you described locally, you may
find the `enable_syscalls` configuration parameter useful to specifically target
the new system calls. All system calls in the `enable_syscalls` list
will be enabled if their requirements are met (i.e., if they are supported
on the target machine and any other system calls that need to run in
order to provide inputs for them are also enabled). A single entry can also
enable multiple system calls, for example: `"ioctl"` enables all the described
ioctl syscalls that have their requirements met, `"ioctl$UDMABUF_CREATE"` enables
only that particular ioctl call, and the wildcard `"write$UHID_*"` enables all write
system calls whose names start with `write$UHID_`.

When updating existing syzkaller descriptions, note that programs already in the
corpus are kept there unless the descriptions of a particular syscall change
drastically or you manually clear them out (for example, by removing the
`corpus.db` file).

<div id="tips"/>

## Description tips and FAQ

<div id="names"/>

### Syscall, struct, field, flags names

Stick with existing kernel names for things; don't invent new names if possible.

Following established naming conventions provides the following benefits:
(1) consistency and familiarity of names used across kernel interfaces,
which also enables searching kernel sources for related names; and
(2) static checking of descriptions (e.g. missed flags or mistyped fields)
with [syz-check](/tools/syz-check/check.go).

For example, if there is an existing enum `v4l2_buf_type` in the kernel headers,
use this name for flags in descriptions as well. The same goes for structs, unions,
fields, etc. For syscall and struct variants, append the variant name after the `$` sign,
for example, `fcntl$F_GET_RW_HINT`, `ioctl$FIOCLEX`, `setsockopt$SO_TIMESTAMP`.
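Because the variant name follows the `$` sign, the underlying kernel syscall can always be recovered by cutting at the first `$`. A tiny Go sketch of the convention (the `splitVariant` helper is hypothetical and only illustrates the naming scheme):

```go
package main

import (
	"fmt"
	"strings"
)

// splitVariant splits a description name such as "ioctl$FIOCLEX" into the
// kernel syscall name ("ioctl") and the variant suffix ("FIOCLEX").
// Names without '$' have an empty variant.
func splitVariant(name string) (base, variant string) {
	base, variant, _ = strings.Cut(name, "$")
	return base, variant
}

func main() {
	for _, name := range []string{"fcntl$F_GET_RW_HINT", "ioctl$FIOCLEX", "setsockopt$SO_TIMESTAMP", "close"} {
		base, variant := splitVariant(name)
		fmt.Printf("%-26s -> syscall %q, variant %q\n", name, base, variant)
	}
}
```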

<div id="ordering"/>

### Resources for syscall ordering

Resources and resource directions (`in`, `out`, `inout`) impose implicit ordering
constraints on the involved syscalls.

If a syscall accepts a resource of a particular type (e.g. has `fd_cdrom` as an input),
then it will generally be placed after a syscall that has this resource as an output,
so that the resource value can be passed between syscalls. For example:

```
r0 = openat$cdrom(...)
ioctl$CDROMPAUSE(r0, 0x123)
close(r0)
```

Syscall arguments are always `in`, return values are `out`, and pointer indirections
have an explicit direction specified as a `ptr` type attribute. It is also possible to specify
the direction attribute individually for struct fields to account for more complex
producer/consumer scenarios with structs that include both input and output resources.
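The ordering constraint can be expressed as a simple check: walking a program front to back, every resource a call consumes must already have been produced by an earlier call. The Go sketch below uses a hypothetical `ProgCall` summary type rather than syzkaller's real representation, and it ignores that resources may also take special values (e.g. `-1`) without any producing call.

```go
package main

import "fmt"

// ProgCall is a hypothetical summary of one call: the resource it produces
// (if any) and the resources it consumes as inputs.
type ProgCall struct {
	Name     string
	Produces string
	Consumes []string
}

// checkOrder returns an error for the first call that uses a resource
// before any earlier call has produced it.
func checkOrder(prog []ProgCall) error {
	produced := map[string]bool{}
	for i, c := range prog {
		for _, r := range c.Consumes {
			if !produced[r] {
				return fmt.Errorf("call %d (%s) uses %s before it is produced", i, c.Name, r)
			}
		}
		if c.Produces != "" {
			produced[c.Produces] = true
		}
	}
	return nil
}

func main() {
	good := []ProgCall{
		{Name: "openat$cdrom", Produces: "r0"},
		{Name: "ioctl$CDROMPAUSE", Consumes: []string{"r0"}},
		{Name: "close", Consumes: []string{"r0"}},
	}
	bad := []ProgCall{good[1], good[0], good[2]}
	fmt.Println(checkOrder(good)) // <nil>
	fmt.Println(checkOrder(bad))  // call 0 (ioctl$CDROMPAUSE) uses r0 before it is produced
}
```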

<div id="values"/>

### Use of unexpected/undeclared values

When specifying integer/string flags or integer fields, stick with the official expected values only.

Bugs are commonly triggered by unexpected inputs. With that in mind, it can be tempting to introduce
some unexpected values into descriptions (e.g. `-1` or `INT_MAX`). This is discouraged for several reasons.
First, this is a cross-cutting aspect: these special unexpected values are applicable to just about
any flags and integer fields. Manually specifying them thousands of times is neither scalable nor
maintainable. Second, it's hard for the fuzzer to come up with correct complex syscall sequences,
and the descriptions are meant to help with this. Coming up with unexpected integer values is easy,
and the fuzzer does not need help here. Overall, the idea is to improve the generic fuzzer logic
to handle these cases better, which will help all descriptions, rather than over-specializing each
individual integer separately. The fuzzer already has several tricks to deal with this, e.g. interception
of comparison operand values and a list of typical magic values.

Note: some flag values may be undocumented only as an oversight. These values should be added to descriptions.

<div id="flags"/>

### Flags/enums

The `flags` type is used for all of:

 - sets of mutually exclusive values, where only one of them should be chosen (like a C enum);
 - sets of bit flags, where multiple values can be combined with bitwise OR (like mmap flags);
 - any combination of the above.

The fuzzer has logic to distinguish enums and bit flags, and generates values
accordingly. So the general guideline is just to enumerate the meaningful values
in `flags` without adding any "special" values to "help" the current fuzzer logic.
When/if the fuzzer logic changes/improves, these manual additions may become
unnecessary or, worse, interfere with the fuzzer's ability to generate good values.

<div id="order"/>

### Declaration order

`syzlang` does not require declaring entities before use (as C/C++ does); entities can refer to entities
declared later (as in Go). It's recommended to declare things in the order of importance, so that the reader
sees the most important things first and then proceeds to finer and finer implementation details. For example,
system calls usually should go before the flag declarations used in these system calls. Note: this order is usually
the exact opposite of how things are declared in C, where the least important things go first.
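Forward references are possible because declarations can be resolved in two passes: first collect every declared name, then check every reference. A minimal Go sketch with a hypothetical `Decl` type (not the real `pkg/compiler` data structures):

```go
package main

import "fmt"

// Decl is a hypothetical simplified declaration: its name and the names it refers to.
type Decl struct {
	Name string
	Uses []string
}

// resolve checks references in two passes, so a declaration may refer to
// one that appears later in the file.
func resolve(decls []Decl) error {
	known := make(map[string]bool, len(decls))
	for _, d := range decls { // pass 1: collect every declared name
		known[d.Name] = true
	}
	for _, d := range decls { // pass 2: check every reference
		for _, u := range d.Uses {
			if !known[u] {
				return fmt.Errorf("%s: undefined reference to %s", d.Name, u)
			}
		}
	}
	return nil
}

func main() {
	decls := []Decl{
		{Name: "open", Uses: []string{"open_flags", "open_mode"}}, // syscall first...
		{Name: "open_flags"}, // ...flags declared later
		{Name: "open_mode"},
	}
	fmt.Println(resolve(decls))     // <nil>
	fmt.Println(resolve(decls[:2])) // open: undefined reference to open_mode
}
```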

## Description compilation internals

The process of compiling the textual syscall descriptions into the machine-usable
form used by `syzkaller` to actually generate programs consists of two steps.

The first step is extraction of values of symbolic constants from kernel sources using
the [syz-extract](/sys/syz-extract) utility. `syz-extract` generates a small C program that
includes kernel headers referenced by `include` directives, defines macros as specified
by `define` directives, and prints values of symbolic constants.
Results are stored in `.const` files, one per description file, holding the values for all architectures.
For example, [sys/linux/dev_ptmx.txt](/sys/linux/dev_ptmx.txt) is translated into
[sys/linux/dev_ptmx.txt.const](/sys/linux/dev_ptmx.txt.const).

The second step is translation of descriptions into Go code using
the [syz-sysgen](/sys/syz-sysgen) utility (the actual compiler code lives in
[pkg/ast](/pkg/ast/) and [pkg/compiler](/pkg/compiler/)).
This step uses syscall descriptions and the const files generated during the first step
and produces instantiations of the `Syscall` and `Type` types defined in [prog/types.go](/prog/types.go).
You can see an example of the compiler output for Linux/AMD64 in `sys/linux/gen/amd64.go`.
This step also generates some minimal syscall metadata for C++ code in `executor/syscalls.h`.

## Non-mainline subsystems

`make extract` extracts constants for all `*.txt` files and for all supported architectures.
This may not work for subsystems that are not present in the mainline kernel, or if you have
problems with native kernel compilers, etc. In such cases the `syz-extract` utility
used by `make extract` can be run manually for a single file/arch as:

```
make bin/syz-extract
bin/syz-extract -os linux -arch $ARCH -sourcedir $KSRC -builddir $LINUXBLD <new>.txt
make generate
make
```

`$ARCH` is one of `amd64`, `386`, `arm64`, `arm`, `ppc64le`, `mips64le`.
If the subsystem is supported on several architectures, then run `syz-extract` for each arch.
`$KSRC` should point to a kernel source checkout that is configured for the
corresponding arch (i.e. you need to run `make ARCH=arch someconfig && make ARCH=arch` there first;
remember to add `CROSS_COMPILE=arm-linux-gnueabi-/aarch64-linux-gnu-/powerpc64le-linux-gnu-` if needed).
If the kernel was built into a separate directory with `make O=output_dir`,
then also set `$LINUXBLD` to the location of that build directory. Remember to put `.config`
into `output_dir`; this is also helpful if you'd like to work on different arches at the same time.

<div id="testing"/>

### Testing of descriptions

Descriptions themselves may contain bugs. After running `syz-manager` with the new descriptions,
it's always useful to check the kernel code coverage report available in the `syz-manager` web UI.
The report allows one to assess whether everything one expects to be covered is in fact covered,
and if not, where the fuzzer gets stuck. However, this is a useful but quite indirect assessment
of the descriptions' correctness. The fuzzer may get around some bugs in the descriptions by diverging
from what the descriptions say, but such bugs make it considerably harder for the fuzzer to progress.

Tests stored in `sys/$OS/test/*` provide more direct testing of the descriptions. Each test is just
a program with checked syscall return values. The syntax of the programs is briefly described [here](program_syntax.md).
You can also look at the [existing examples](/sys/linux/test) and at the program [deserialization code](/prog/encoding.go).
The `AUTO` keyword can be used as a value for consts and pointers; for pointers it leads to
a reasonable sequential allocation of memory addresses.

It's always good to add a test at least for "the main successful scenario" for the subsystem.
It will ensure that the descriptions are actually correct and that it's possible for the fuzzer
to come up with the successful scenario. See the [io_uring test](/sys/linux/test/io_uring) as a good example.

The tests can be run with the `syz-runtest` utility as:
```
make runtest && bin/syz-runtest -config manager.config
```
`syz-runtest` boots multiple VMs and runs these tests in different execution modes inside of the VMs.

However, a full `syz-runtest` run takes time, so while developing a test, it's more convenient to run it
using the `syz-execprog` utility. To run the test, copy `syz-execprog`, `syz-executor` and the test
into a manually booted VM and then run the following command inside of the VM:
```
syz-execprog -debug -threaded=0 mytest
```
It will show results of all executed syscalls. It's also handy for manual debugging of pseudo-syscall code:
if you add some temporary `debug` calls to the pseudo-syscall, `syz-execprog -debug` will show their output.

The test syntax can be checked by running:
```
go test -run=TestParsing ./pkg/runtest
```