# Syscall description language

aka `syzlang` (`[siːzˈlæŋg]`)

Pseudo-formal grammar of a syscall description:

```
syscallname "(" [arg ["," arg]*] ")" [type] ["(" attribute* ")"]
arg = argname type
argname = identifier
type = typename [ "[" type-options "]" ]
typename = "const" | "intN" | "intptr" | "flags" | "array" | "ptr" |
	   "string" | "filename" | "glob" | "len" |
	   "bytesize" | "bytesizeN" | "bitsize" | "vma" | "proc" |
	   "compressed_image"
type-options = [type-opt ["," type-opt]*]
```

Common type-options include:

```
"opt" - the argument is optional (like mmap fd argument, or accept peer argument)
```

The rest of the type-options are type-specific:

```
"const": integer constant, type-options:
	value, underlying type (one of "intN", "intptr")
"intN"/"intptr": an integer without a particular meaning, type-options:
	either an optional range of values (e.g. "5:10", or "100:200")
	or a reference to flags description (see below),
	or a single value
	optionally followed by an alignment parameter if using a range
"flags": a set of values, type-options:
	reference to flags description (see below), underlying int type (e.g. "int32")
"array": a variable/fixed-length array, type-options:
	type of elements, optional size (fixed "5", or ranged "5:10", boundaries inclusive)
"ptr"/"ptr64": a pointer to an object, type-options:
	direction (in/out/inout); type of the object
	ptr64 has size of 8 bytes regardless of target pointer size
"string": a zero-terminated memory buffer (no pointer indirection implied), type-options:
	either a string value in quotes for constant strings (e.g. "foo" or `deadbeef` for hex literal),
	or a reference to string flags (special value `filename` produces file names),
	optionally followed by a buffer size (string values will be padded with \x00 to that size)
"stringnoz": a non-zero-terminated memory buffer (no pointer indirection implied), type-options:
	either a string value in quotes for constant strings (e.g. "foo" or `deadbeef` for hex literal),
	or a reference to string flags
"glob": glob pattern to match on the target files, type-options:
	a pattern string in quotes (syntax: https://golang.org/pkg/path/filepath/#Match)
	(e.g. "/sys/" or "/sys/**/*"),
	or a combination of include and exclude globs (e.g. "/sys/**/*:-/sys/power/state")
"fmt": a string representation of an integer (not zero-terminated), type-options:
	format (one of "dec", "hex", "oct") and the value (a resource, int, flags or proc)
	the resulting data is always fixed-size (formatted as "%020llu", "0x%016llx" or "%023llo", respectively)
"len": length of another field (for arrays it is the number of elements), type-options:
	argname of the object
"bytesize": similar to "len", but always denotes the size in bytes, type-options:
	argname of the object
"bitsize": similar to "len", but always denotes the size in bits, type-options:
	argname of the object
"offsetof": offset of the field from the beginning of the parent struct, type-options:
	field
"vma"/"vma64": a pointer to a set of pages (used as input for mmap/munmap/mremap/madvise), type-options:
	optional number of pages (e.g. vma[7]), or a range of pages (e.g. vma[2-4])
	vma64 has size of 8 bytes regardless of target pointer size
"proc": per-process int (see description below), type-options:
	value range start, how many values per process, underlying type
"compressed_image": zlib-compressed disk image
	syscalls accepting compressed images must be marked with the `no_generate`
	and `no_minimize` call attributes. If the content of the decompressed image
	can be checked by a `fsck`-like command, use the `fsck` syscall attribute
"text": machine code of the specified type, type-options:
	text type (x86_real, x86_16, x86_32, x86_64, arm64)
"void": type with static size 0
	mostly useful inside of templates and varlen unions, can't be a syscall argument
```

`const`, `len` and `flags` also take a trailing underlying type type-option when used in structs/unions/pointers.
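
The fixed-width encodings listed for `fmt` can be reproduced with Go's formatting verbs. This is a minimal sketch, assuming Go's `%d`, `%x` and `%o` verbs on a `uint64` behave like C's `%llu`, `%llx` and `%llo`; `formatFmt` is a hypothetical helper, not part of syzkaller.

```go
package main

import "fmt"

// formatFmt renders v as described for the "fmt" type: fixed-width,
// zero-padded and not zero-terminated. Go's %d, %x and %o verbs stand in
// for C's %llu, %llx and %llo.
func formatFmt(format string, v uint64) (string, error) {
	switch format {
	case "dec":
		return fmt.Sprintf("%020d", v), nil // always 20 characters for a uint64
	case "hex":
		return fmt.Sprintf("0x%016x", v), nil // always 18 characters
	case "oct":
		return fmt.Sprintf("%023o", v), nil // always 23 characters
	}
	return "", fmt.Errorf("unknown format %q", format)
}

func main() {
	for _, f := range []string{"dec", "hex", "oct"} {
		s, err := formatFmt(f, 42)
		fmt.Printf("%s %q len=%d err=%v\n", f, s, len(s), err)
	}
}
```

Because the width covers the largest 64-bit value in each base, every value of the same format occupies the same number of bytes.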

Flags are described as:

```
flagname = const ["," const]*
```

or for string flags as:

```
flagname = "\"" literal "\"" ["," "\"" literal "\""]*
```

Call attributes are:

```
"disabled": the call will not be used in fuzzing; useful to temporarily disable some calls
	or prohibit particular argument combinations.
"timeout[N]": additional execution timeout (in ms) for the call on top of some default value
"prog_timeout[N]": additional execution timeout (in ms) for the whole program if it contains this call;
	if a program contains several such calls, the max value is used.
"ignore_return": ignore return value of this syscall in fallback feedback; needs to be used for calls
	that don't return fixed error codes but rather something else (e.g. the current time).
"breaks_returns": ignore return values of all subsequent calls in the program in fallback feedback (can't be trusted).
"no_generate": do not try to generate this syscall, i.e. use only seed descriptions to produce it.
"no_minimize": do not modify instances of this syscall when trying to minimize a crashing program.
"no_squash": do not attempt to pass squashed arguments to this syscall.
	Without that, the fuzzer will sometimes attempt to replace complex structures with arrays of bytes,
	possibly triggering interesting mutations, but also making programs hard to reason about.
"fsck": the content of the compressed buffer argument for this syscall is a file system and the
	string argument is a fsck-like command that will be called to verify the filesystem
"remote_cover": wait longer to collect remote coverage for this call.
"kfuzz_test": the call is a kfuzztest target
```

## Ints

`int8`, `int16`, `int32` and `int64` denote an integer of the corresponding size.
`intptr` denotes a pointer-sized integer, i.e. the C `long` type.

By appending the `be` suffix (e.g. `int16be`), integers become big-endian.

It's possible to specify a range of values for an integer in the format of `int32[0:100]` or `int32[0:4096, 512]` for a 512-aligned int.

Integers can also take a reference to a flags description or a value as their first type-option.
In that case, the alignment parameter is not supported.

To denote a bitfield of size N, use `int64:N`.

It's possible to use these various kinds of ints as base types for `const`, `flags`, `len` and `proc`.

```
example_struct {
	f0	int8			# random 1-byte integer
	f1	const[0x42, int16be]	# const 2-byte big-endian integer 0x42 (bytes 0x00 0x42 in memory)
	f2	int32[0:100]		# random 4-byte integer with values from 0 to 100 inclusive
	f3	int32[1:10, 2]		# random 4-byte integer with values {1, 3, 5, 7, 9}
	f4	int64:20		# random 20-bit bitfield
	f5	int8[10]		# const 1-byte integer with value 10
	f6	int32[flagname]		# random 4-byte integer from the set of values referenced by flagname
}
```

## Structs

Structs are described as:

```
structname "{" "\n"
	(fieldname type ("(" fieldattribute* ")")? (if[expression])? "\n")+
"}" ("[" attribute* "]")?
```

Fields can have attributes specified in parentheses after the field, independent
of their type. The `in/out/inout` attributes specify per-field direction, for example:

```
foo {
	field0	const[1, int32]	(in)
	field1	int32		(inout)
	field2	fd		(out)
}
```

You may specify conditions that determine whether a field will be included:

```
foo {
	field0	int32
	field1	int32 (if[value[field0] == 0x1])
}
```

See [the corresponding section](syscall_descriptions_syntax.md#conditional-fields)
for more details.

The `out_overlay` attribute allows a struct to have separate input and output layouts.
Fields before the `out_overlay` field are input, fields starting from `out_overlay` are output.
Input and output fields overlap in memory (both start from the beginning of the struct in memory).
For example:

```
foo {
	in0	const[1, int32]
	in1	flags[bar, int8]
	in2	ptr[in, string]
	out0	fd	(out_overlay)
	out1	int32
}
```

Structs can have attributes specified in square brackets after the struct.
Attributes are:

- `packed`: the struct has no padding between fields and has alignment 1; this is similar to GNU C `__attribute__((packed))`; struct alignment can be overridden with the `align` attribute
- `align[N]`: the struct has alignment N and is padded up to a multiple of `N`; contents of the padding are unspecified (though frequently zeros); similar to GNU C `__attribute__((aligned(N)))`
- `size[N]`: the struct is padded up to the specified size `N`; contents of the padding are unspecified (though frequently zeros)

## Unions

Unions are described as:

```
unionname "[" "\n"
	(fieldname type (if[expression])? "\n")+
"]" ("[" attribute* "]")?
```

During fuzzing, syzkaller randomly picks one of the union options.

You may also specify conditions that determine whether the corresponding
option may or may not be selected, depending on values of other fields. See
[the corresponding section](syscall_descriptions_syntax.md#conditional-fields)
for more details.

Unions can have attributes specified in square brackets after the union.
Attributes are:

- `varlen`: union size is the size of the particular chosen option (not statically known); without this attribute, unions are statically sized as the maximum of all options (similar to C unions)
- `size[N]`: the union is padded up to the specified size `N`; contents of the padding are unspecified (though frequently zeros)
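
The two sizing rules reduce to "size of the chosen option" versus "maximum over all options". This is a minimal sketch with a hypothetical `unionSize` helper, assuming option sizes are already known and ignoring alignment padding; the options stand for an `int32`, an `array[int64, 5]` and a `void`.

```go
package main

import "fmt"

// unionSize returns the size in bytes of a union whose options have the given
// sizes. With varlen, only the chosen option counts; otherwise the union is as
// large as its largest option. sizeAttr models size[N] (0 means not set) and
// is assumed to be at least the computed size.
func unionSize(optionSizes []uint64, chosen int, varlen bool, sizeAttr uint64) uint64 {
	var size uint64
	if varlen {
		size = optionSizes[chosen]
	} else {
		for _, s := range optionSizes {
			if s > size {
				size = s
			}
		}
	}
	if sizeAttr > size {
		size = sizeAttr
	}
	return size
}

func main() {
	opts := []uint64{4, 40, 0}                  // int32, array[int64, 5], void
	fmt.Println(unionSize(opts, 0, false, 0))  // 40: static size, like a C union
	fmt.Println(unionSize(opts, 0, true, 0))   // 4: only the chosen int32
	fmt.Println(unionSize(opts, 2, true, 0))   // 0: the void option
	fmt.Println(unionSize(opts, 0, false, 64)) // 64: padded by size[64]
}
```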

## Resources

Resources represent values that need to be passed from the output of one syscall to the input of another syscall. For example, the `close` syscall requires an input value (fd) previously produced by the `open` or `pipe` syscall. To achieve this, `fd` is declared as a resource. This is a way of modelling dependencies between syscalls: defining one syscall as the producer of a resource and another syscall as its consumer imposes a loose ordering between them. Resources are described as:

```
"resource" identifier "[" underlying_type "]" [ ":" const ("," const)* ]
```

`underlying_type` is either one of `int8`, `int16`, `int32`, `int64`, `intptr` or another resource (which models inheritance, for example, a socket is a subtype of fd). The optional set of constants represents resource special values, for example, `0xffffffffffffffff` (-1) for "no fd", or `AT_FDCWD` for "the current dir". Special values are used once in a while as resource values. If no special values are specified, the special value `0` is used. Resources can then be used as types, for example:

```
resource fd[int32]: 0xffffffffffffffff, AT_FDCWD, 1000000
resource sock[fd]
resource sock_unix[sock]

socket(...) sock
accept(fd sock, ...) sock
listen(fd sock, backlog int32)
```
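
The inheritance chain declared above (`sock_unix` derives from `sock`, which derives from `fd`) can be modelled as a parent map. In this minimal sketch, a value of a more specific resource is accepted wherever a more general one is expected, so a `sock` can be passed to a consumer of `fd` but not the other way round. The `parents` map and `compatible` function are hypothetical illustrations of the strict subtype relation only; syzkaller's own matching rules may be more permissive during mutation.

```go
package main

import "fmt"

// parents maps each resource to the resource it is declared on, mirroring
// "resource sock[fd]" and "resource sock_unix[sock]". fd is declared on
// int32, which is not a resource, so it has no entry.
var parents = map[string]string{
	"sock":      "fd",
	"sock_unix": "sock",
}

// compatible reports whether a value of resource kind have may be used where
// kind want is expected: have must be want itself or one of its descendants.
func compatible(have, want string) bool {
	for kind := have; kind != ""; kind = parents[kind] {
		if kind == want {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(compatible("sock_unix", "fd")) // true: a unix socket is an fd
	fmt.Println(compatible("sock", "sock"))    // true
	fmt.Println(compatible("fd", "sock"))      // false: not every fd is a socket
}
```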

Resources don't necessarily have to be returned by a syscall. They can be used like any other data type. For example:

```
resource my_resource[int32]

request_producer(..., arg ptr[out, my_resource])
request_consumer(..., arg ptr[inout, test_struct])

test_struct {
	...
	attr	my_resource
}
```

For more complex producer/consumer scenarios, field attributes can be utilized.
For example:

```
resource my_resource_1[int32]
resource my_resource_2[int32]

request_produce1_consume2(..., arg ptr[inout, test_struct])

test_struct {
	...
	field0	my_resource_1	(out)
	field1	my_resource_2	(in)
}
```

Each resource type must be "produced" (used as an output) by at least one syscall
(outside of unions and optional pointers) and "consumed" (used as an input)
by at least one syscall.

## Type Aliases

Complex types that are often repeated can be given short type aliases using the
following syntax:

```
type identifier underlying_type
```

For example:

```
type signalno int32[0:65]
type net_port proc[20000, 4, int16be]
```

Then, the type alias can be used instead of the underlying type in any context.
The underlying type needs to be described as if it were a struct field, that is,
with the base type if it's required. However, the type alias can be used as a syscall
argument as well. Underlying types are currently restricted to integer types,
`ptr`, `ptr64`, `const`, `flags` and `proc` types.

There are some builtin type aliases:

```
type bool8	int8[0:1]
type bool16	int16[0:1]
type bool32	int32[0:1]
type bool64	int64[0:1]
type boolptr	intptr[0:1]

type fileoff[BASE] BASE

type filename string[filename]

type buffer[DIR] ptr[DIR, array[int8]]
```

## Type Templates

Type templates can be declared as follows:

```
type buffer[DIR] ptr[DIR, array[int8]]
type fileoff[BASE] BASE
type nlattr[TYPE, PAYLOAD] {
	nla_len		len[parent, int16]
	nla_type	const[TYPE, int16]
	payload		PAYLOAD
} [align[4]]
```

and later used as follows:

```
syscall(a buffer[in], b fileoff[int64], c ptr[in, nlattr[FOO, int32]])
```

There is a builtin type template `optional` defined as:

```
type optional[T] [
	val	T
	void	void
] [varlen]
```

## Length

You can specify the length of a particular field in a struct or of a named argument
using the `len`, `bytesize` and `bitsize` types, for example:

```
write(fd fd, buf ptr[in, array[int8]], count len[buf])

sock_fprog {
	len	len[filter, int16]
	filter	ptr[in, array[sock_filter]]
}
```

If `len`'s argument is a pointer, then the length of the pointee is used.

To denote the length of a field in N-byte words, use `bytesizeN`; possible values
for N are 1, 2, 4 and 8.
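
For an array field, the length kinds differ only in the unit they count. This is a minimal sketch based on `sock_fprog` above, assuming `struct sock_filter` is 8 bytes (as in the Linux UAPI headers); `lengths` is a hypothetical helper that covers arrays only and deliberately uses sizes that divide evenly by N.

```go
package main

import "fmt"

// lengths computes what each length kind would hold for an array of count
// elements, each elemSize bytes long.
func lengths(count, elemSize uint64) map[string]uint64 {
	bytes := count * elemSize
	return map[string]uint64{
		"len":       count,     // number of elements (for arrays)
		"bytesize":  bytes,     // size in bytes (same as bytesize1)
		"bitsize":   bytes * 8, // size in bits
		"bytesize2": bytes / 2, // size in 2-byte words
		"bytesize4": bytes / 4, // size in 4-byte words
		"bytesize8": bytes / 8, // size in 8-byte words
	}
}

func main() {
	l := lengths(3, 8) // sock_fprog.filter pointing to 3 sock_filter entries
	fmt.Println(l["len"], l["bytesize"], l["bitsize"], l["bytesize2"], l["bytesize4"], l["bytesize8"])
	// 3 24 192 12 6 3
}
```

So `len[filter, int16]` in `sock_fprog` would hold the element count (3), not the byte size (24).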

To denote the length of the parent struct, you can use `len[parent, int8]`.
To denote the length of a higher-level parent when structs are embedded into
one another, you can specify the type name of the particular parent:

```
s1 {
    f0      len[s2, int32]  # length of s2
}

s2 {
    f0      s1
    f1      array[int32]
    f2      len[parent, int32]
}
```

The `len` argument can also be a path expression, which allows more complex
addressing. Path expressions are similar to C field references, but also allow
referencing parent and sibling elements. A special reference `syscall` used
at the beginning of the path allows referring directly to the syscall arguments.
For example:

```
s1 {
	a	ptr[in, s2]
	b	ptr[in, s3]
	c	array[int8]
}

s2 {
	d	array[int8]
}

s3 {
# This refers to the array c in the parent s1.
	e	len[s1:c, int32]
# This refers to the array d in the sibling s2.
	f	len[s1:a:d, int32]
# This refers to the array j in the child s4.
	g	len[i:j, int32]
# This refers to syscall argument l.
	h	len[syscall:l, int32]
	i	ptr[in, s4]
}

s4 {
	j	array[int8]
}

foo(k ptr[in, s1], l ptr[in, array[int8]])
```

## Proc

The `proc` type can be used to denote per-process integers.
The idea is to have a separate range of values for each executor, so they don't interfere.

The simplest example is a port number.
The `proc[20000, 4, int16be]` type means that we want to generate an `int16be`
integer starting from `20000` and assign `4` values for each process.
As a result, executor number `n` will get values in the `[20000 + n * 4, 20000 + (n + 1) * 4)` range.
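
The range formula can be checked for the first few executors. `procRange` is a hypothetical helper that only computes the half-open bounds from the formula above; it does not model how a particular value inside the range is chosen.

```go
package main

import "fmt"

// procRange returns the half-open range [lo, hi) that executor n receives
// for a proc[start, perProc, ...] type.
func procRange(start, perProc, n uint64) (lo, hi uint64) {
	return start + n*perProc, start + (n+1)*perProc
}

func main() {
	for n := uint64(0); n < 3; n++ {
		lo, hi := procRange(20000, 4, n)
		fmt.Printf("executor %d: [%d, %d)\n", n, lo, hi)
	}
	// executor 0: [20000, 20004)
	// executor 1: [20004, 20008)
	// executor 2: [20008, 20012)
}
```

Adjacent executors get adjacent, non-overlapping ranges, which is what keeps, for example, port numbers of parallel executors from colliding.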

## Integer Constants

Integer constants can be specified as decimal literals, as `0x`-prefixed
hex literals, as `'`-surrounded char literals, or as symbolic constants
extracted from kernel headers or defined by `define` directives. For example:

```
foo(a const[10], b const[-10])
foo(a const[0xabcd])
foo(a int8['a':'z'])
foo(a const[PATH_MAX])
foo(a int32[PATH_MAX])
foo(a ptr[in, array[int8, MY_PATH_MAX]])
define MY_PATH_MAX	PATH_MAX + 2
```

## Conditional fields

### In structures

In syzlang, it's possible to specify a condition for every struct field that
determines whether the field should be included or omitted:

```
header_fields {
  magic       const[0xabcd, int16]
  haveInteger int8
} [packed]

packet {
  header  header_fields
  integer int64  (if[value[header:haveInteger] == 0x1])
  body    array[int8]
} [packed]

some_call(a ptr[in, packet])
```

In this example, the `packet` structure will include the field `integer` only
if `header.haveInteger == 1`. In memory, `packet` will have the following
layout:

| header.magic = 0xabcd | header.haveInteger = 0x1 | integer | body |
| --------------------- | ------------------------ | ------- | ---- |

That corresponds to e.g. the following program:

```
some_call(&AUTO={{AUTO, 0x1}, @value=0xabcd, []})
```

If `header.haveInteger` is not `1`, syzkaller will just pretend that the field
`integer` does not exist.

```
some_call(&AUTO={{AUTO, 0x0}, @void, []})
```

| header.magic = 0xabcd | header.haveInteger = 0x0 | body |
| --------------------- | ------------------------ | ---- |

Every conditional field is assumed to be of variable length, and so is the struct
that contains it.

When a variable-length field appears in the middle of a structure, the structure
must be marked with `[packed]`.

Conditions on bitfields are prohibited:

```
struct {
  f0 int
  f1 int:3 (if[value[f0] == 0x1])  # It will not compile.
}
```

But you may reference bitfields in your conditions:

```
struct {
  f0 int:1
  f1 int:7
  f2 int   (if[value[f0] == value[f1]])
} [packed]
```

### In unions

Let's consider the following example.

```
struct {
  type int
  body alternatives
}

alternatives [
  int     int64 (if[value[struct:type] == 0x1])
  arr     array[int64, 5] (if[value[struct:type] == 0x2])
  default int32
] [varlen]

some_call(a ptr[in, struct])
```

In this case, the union option will be selected depending on the value of the
`type` field. For example, if `type` is `0x1`, then it can be either `int` or
`default`:

```
some_call(&AUTO={0x1, @int=0x123})
some_call(&AUTO={0x1, @default=0x123})
```

If `type` is `0x2`, it can be either `arr` or `default`.

If `type` is neither `0x1` nor `0x2`, syzkaller may only select `default`:

```
some_call(&AUTO={0x0, @default=0xabcd})
```

To ensure that a union can always be constructed, the last union field **must not
have a condition**.

Thus, the following definition would fail to compile:

```
alternatives [
  int int64 (if[value[struct:type] == 0x1])
  arr array[int64, 5] (if[value[struct:type] == 0x1])
] [varlen]
```

During prog mutation and generation, syzkaller will select a random union field
whose condition is satisfied.

### Expression syntax

Currently, only the `==`, `!=`, `&` and `||` operators are supported. However, the
functionality was designed in such a way that adding more operators is easy.
Feel free to file a GitHub issue or email us if you need more.

Expressions are evaluated as `int64` values. If the final result of an
expression is not 0, the condition is considered satisfied.
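
The evaluation rule can be illustrated with a tiny interpreter over an explicit expression tree. This is a minimal sketch: the `expr` type and `eval` function are hypothetical, the tree is built by hand (so operator precedence is not modelled), and comparisons and `||` are assumed to yield 1 or 0.

```go
package main

import "fmt"

// expr is a hypothetical expression tree. A leaf is either a value[...]
// reference (field set) or a constant (val); inner nodes apply op.
type expr struct {
	op          string // "", "==", "!=", "&" or "||"
	left, right *expr
	field       string
	val         int64
}

// eval computes e as an int64; any non-zero result means "satisfied".
func eval(e *expr, fields map[string]int64) int64 {
	if e.op == "" {
		if e.field != "" {
			return fields[e.field]
		}
		return e.val
	}
	l, r := eval(e.left, fields), eval(e.right, fields)
	b2i := func(b bool) int64 {
		if b {
			return 1
		}
		return 0
	}
	switch e.op {
	case "==":
		return b2i(l == r)
	case "!=":
		return b2i(l != r)
	case "&":
		return l & r
	case "||":
		return b2i(l != 0 || r != 0)
	}
	panic("unsupported operator " + e.op)
}

func main() {
	ref := func(f string) *expr { return &expr{field: f} }
	num := func(v int64) *expr { return &expr{val: v} }
	bin := func(op string, l, r *expr) *expr { return &expr{op: op, left: l, right: r} }

	// value[f3:f0] == 0x1 || value[f3:f0] == 0x2
	either := bin("||", bin("==", ref("f3:f0"), num(1)), bin("==", ref("f3:f0"), num(2)))
	fmt.Println(eval(either, map[string]int64{"f3:f0": 2})) // 1: satisfied
	fmt.Println(eval(either, map[string]int64{"f3:f0": 3})) // 0: not satisfied

	// (value[f0] & 0x3) == 0x1, with the grouping made explicit in the tree
	masked := bin("==", bin("&", ref("f0"), num(0x3)), num(0x1))
	fmt.Println(eval(masked, map[string]int64{"f0": 5})) // 1
}
```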

If you want to reference a field's value, you can do it via
`value[path:to:field]`, which is similar to the `len[]` argument.

```
sub_struct {
  f0 int
  # Reference a field in a parent struct.
  f1 int (if[value[struct:f2]]) # Same as if[value[struct:f2] != 0].
}

struct {
  f2 int
  f3 sub_struct
  f4 int (if[value[f2] == 0x2]) # Reference a sibling field.
  f5 int (if[value[f3:f0] == 0x1]) # Reference a nested field.
  f6 int (if[value[f3:f0] == 0x1 || value[f3:f0] == 0x2]) # Reference a nested field which equals either 0x1 or 0x2.
} [packed]

call(a ptr[in, struct])
```

The referenced field must be of integer type and there must be no
conditional fields in the path to it. For example, the following
description will not compile:

```
struct {
  f0 int
  f1 int (if[value[f0] == 0x1])
  f2 int (if[value[f1] == 0x1])
}
```

You may also reference constants in expressions:

```
struct {
  f0 int
  f1 int
  f2 int (if[value[f0] & SOME_CONST == OTHER_CONST])
}
```

## Meta

Description files can also contain `meta` directives that specify meta-information for the whole file.

```
meta noextract
```

Tells `make extract` not to extract constants for this file.
However, `syz-extract` can still be invoked manually on this file.

```
meta arches["arch1", "arch2"]
```

Restricts this file to the given set of architectures.
`make extract` and `make generate` will not use it on other architectures.

## Misc

Description files can also contain `include` directives that refer to Linux kernel header files,
`incdir` directives that refer to custom Linux kernel header directories,
and `define` directives that define symbolic constant values.

The syzkaller executor defines some [pseudo system calls](./pseudo_syscalls.md)
that can be used like any other syscall in a description file. These pseudo
system calls expand to literal C code and can perform user-defined
custom actions. You can find some examples in
[executor/common_linux.h](../executor/common_linux.h).

Also see [tips](syscall_descriptions.md#tips) on writing good descriptions.