# Syscall description language
aka `syzlang` (`[siːzˈlæŋg]`)

Pseudo-formal grammar of a syscall description:

```
syscallname "(" [arg ["," arg]*] ")" [type] ["(" attribute* ")"]
arg = argname type
argname = identifier
type = typename [ "[" type-options "]" ]
typename = "const" | "intN" | "intptr" | "flags" | "array" | "ptr" |
	   "string" | "strconst" | "filename" | "glob" | "len" |
	   "bytesize" | "bytesizeN" | "bitsize" | "vma" | "proc" |
	   "compressed_image"
type-options = [type-opt ["," type-opt]*]
```

Common type-options include:

```
"opt" - the argument is optional (like mmap fd argument, or accept peer argument)
```

The rest of the type-options are type-specific:

```
"const": integer constant, type-options:
	value, underlying type (one of "intN", "intptr")
"intN"/"intptr": an integer without a particular meaning, type-options:
	either an optional range of values (e.g. "5:10", or "100:200")
	or a reference to flags description (see below),
	or a single value
	optionally followed by an alignment parameter if using a range
"flags": a set of values, type-options:
	reference to flags description (see below), underlying int type (e.g. "int32")
"array": a variable/fixed-length array, type-options:
	type of elements, optional size (fixed "5", or ranged "5:10", boundaries inclusive)
"ptr"/"ptr64": a pointer to an object, type-options:
	direction (in/out/inout); type of the object
	ptr64 has a size of 8 bytes regardless of target pointer size
"string": a zero-terminated memory buffer (no pointer indirection implied), type-options:
	either a string value in quotes for constant strings (e.g. "foo" or `deadbeef` for hex literal),
	or a reference to string flags (special value `filename` produces file names),
	optionally followed by a buffer size (string values will be padded with \x00 to that size)
"stringnoz": a non-zero-terminated memory buffer (no pointer indirection implied), type-options:
	either a string value in quotes for constant strings (e.g. "foo" or `deadbeef` for hex literal),
	or a reference to string flags
"glob": glob pattern to match on the target files, type-options:
	a pattern string in quotes (syntax: https://golang.org/pkg/path/filepath/#Match)
	(e.g. "/sys/" or "/sys/**/*"),
	or a combination of include and exclude globs (e.g. "/sys/**/*:-/sys/power/state")
"fmt": a string representation of an integer (not zero-terminated), type-options:
	format (one of "dec", "hex", "oct") and the value (a resource, int, flags, const or proc)
	the resulting data is always fixed-size (formatted as "%020llu", "0x%016llx" or "%023llo", respectively)
"len": length of another field (for arrays, it is the number of elements), type-options:
	argname of the object
"bytesize": similar to "len", but always denotes the size in bytes, type-options:
	argname of the object
"bitsize": similar to "len", but always denotes the size in bits, type-options:
	argname of the object
"offsetof": offset of the field from the beginning of the parent struct, type-options:
	field
"vma"/"vma64": a pointer to a set of pages (used as input for mmap/munmap/mremap/madvise), type-options:
	optional number of pages (e.g. vma[7]), or a range of pages (e.g. vma[2-4])
	vma64 has a size of 8 bytes regardless of target pointer size
"proc": per-process int (see description below), type-options:
	value range start, how many values per process, underlying type
"compressed_image": zlib-compressed disk image
	syscalls accepting compressed images must be marked with `no_generate`
	and `no_minimize` call attributes.
"text": machine code of the specified type, type-options:
	text type (x86_real, x86_16, x86_32, x86_64, arm64)
"void": type with static size 0
	mostly useful inside of templates and varlen unions, can't be a syscall argument
```
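
The fixed-width layouts listed for `fmt` can be reproduced with Go's standard `fmt` package. This is a minimal sketch, not syzkaller's code: `formatFmt` is a hypothetical helper, and it assumes the value is treated as an unsigned 64-bit integer, which is what the `%llu`/`%llx`/`%llo` specifiers imply.

```go
package main

import "fmt"

// formatFmt renders v like the documented "fmt" formats:
// "%020llu" (dec), "0x%016llx" (hex) and "%023llo" (oct).
// Hypothetical helper for illustration only.
func formatFmt(format string, v uint64) (string, error) {
	switch format {
	case "dec":
		return fmt.Sprintf("%020d", v), nil
	case "hex":
		return fmt.Sprintf("0x%016x", v), nil
	case "oct":
		return fmt.Sprintf("%023o", v), nil
	default:
		return "", fmt.Errorf("unknown format %q", format)
	}
}

func main() {
	for _, f := range []string{"dec", "hex", "oct"} {
		s, err := formatFmt(f, 42)
		if err != nil {
			panic(err)
		}
		// Each format always yields the same number of bytes, whatever the value.
		fmt.Printf("%s: %s (%d bytes)\n", f, s, len(s))
	}
}
```

Because even the largest 64-bit value fits in these widths (20 decimal, 16 hex, 22 octal digits), the field size never depends on the value.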

`const`, `len` and `flags` also take a trailing underlying-type type-option when used in structs/unions/pointers.

Flags are described as:

```
flagname = const ["," const]*
```

or, for string flags, as:

```
flagname = "\"" literal "\"" ["," "\"" literal "\""]*
```

Call attributes are:

```
"disabled": the call will not be used in fuzzing; useful to temporarily disable some calls
	or prohibit particular argument combinations.
"timeout[N]": additional execution timeout (in ms) for the call on top of some default value
"prog_timeout[N]": additional execution timeout (in ms) for the whole program if it contains this call;
	if a program contains several such calls, the max value is used.
"ignore_return": ignore return value of this syscall in fallback feedback; needs to be used for calls
	that don't return fixed error codes but rather something else (e.g. the current time).
"breaks_returns": ignore return values of all subsequent calls in the program in fallback feedback (can't be trusted).
"no_generate": do not try to generate this syscall, i.e. use only seed descriptions to produce it.
"no_minimize": do not modify instances of this syscall when trying to minimize a crashing program.
```

## Ints

`int8`, `int16`, `int32` and `int64` denote an integer of the corresponding size.
`intptr` denotes a pointer-sized integer, i.e. C `long` type.

By appending the `be` suffix (e.g. `int16be`), integers become big-endian.

It's possible to specify a range of values for an integer in the format of `int32[0:100]` or `int32[0:4096, 512]` for a 512-aligned int.

Integers can also take a reference to a flags description or a value as their first type-option.
In that case, the alignment parameter is not supported.

To denote a bitfield of size N use `int64:N`.

It's possible to use these various kinds of ints as base types for `const`, `flags`, `len` and `proc`.

```
example_struct {
	f0	int8			# random 1-byte integer
	f1	const[0x42, int16be]	# const 2-byte integer with value 0x4200 (big-endian 0x42)
	f2	int32[0:100]		# random 4-byte integer with values from 0 to 100 inclusive
	f3	int32[1:10, 2]		# random 4-byte integer with values {1, 3, 5, 7, 9}
	f4	int64:20		# random 20-bit bitfield
	f5	int8[10]		# const 1-byte integer with value 10
	f6	int32[flagname]		# random 4-byte integer from the set of values referenced by flagname
}
```

## Structs

Structs are described as:

```
structname "{" "\n"
	(fieldname type ("(" fieldattribute* ")")? (if[expression])? "\n")+
"}" ("[" attribute* "]")?
```

Fields can have attributes specified in parentheses after the field, independent
of their type. The `in/out/inout` attributes specify per-field direction, for example:

```
foo {
	field0	const[1, int32]	(in)
	field1	int32		(inout)
	field2	fd		(out)
}
```

You may specify conditions that determine whether a field will be included:

```
foo {
	field0	int32
	field1	int32 (if[value[field0] == 0x1])
}
```

See [the corresponding section](syscall_descriptions_syntax.md#conditional-fields)
for more details.

The `out_overlay` attribute allows having separate input and output layouts for the struct.
Fields before the `out_overlay` field are input, fields starting from `out_overlay` are output.
Input and output fields overlap in memory (both start from the beginning of the struct in memory).
For example:

```
foo {
	in0	const[1, int32]
	in1	flags[bar, int8]
	in2	ptr[in, string]
	out0	fd	(out_overlay)
	out1	int32
}
```

Structs can have attributes specified in square brackets after the struct.
Attributes are:

- `packed`: the struct has no padding between fields and has alignment 1; this is similar to GNU C `__attribute__((packed))`; struct alignment can be overridden with the `align` attribute
- `align[N]`: the struct has alignment N and is padded up to a multiple of `N`; contents of the padding are unspecified (though frequently zeros); similar to GNU C `__attribute__((aligned(N)))`
- `size[N]`: the struct is padded up to the specified size `N`; contents of the padding are unspecified (though frequently zeros)
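
To make the three attributes concrete, the following sketch computes the size and alignment of a struct whose fields are plain integers. It is an illustration under stated assumptions, not syzkaller's layout code: it assumes each integer's natural alignment equals its size (true for common 64-bit ABIs but not universal, e.g. `int64` on 32-bit x86), and the `layout` helper and its parameters are hypothetical.

```go
package main

import "fmt"

func roundUp(x, a int) int { return (x + a - 1) / a * a }

// layout returns the size and alignment of a struct of integer fields with the
// given byte sizes. packed, alignN and sizeN model [packed], [align[N]] and
// [size[N]]; pass 0 for alignN or sizeN when that attribute is absent.
func layout(fields []int, packed bool, alignN, sizeN int) (size, align int) {
	align = 1
	off := 0
	for _, f := range fields {
		a := f // assumed natural alignment of an f-byte integer
		if packed {
			a = 1 // packed: no padding between fields
		}
		off = roundUp(off, a) + f
		if a > align {
			align = a
		}
	}
	if alignN != 0 {
		align = alignN // align[N] overrides the computed alignment
	}
	size = roundUp(off, align) // tail padding up to a multiple of the alignment
	if sizeN > size {
		size = sizeN // size[N] pads the struct up to N bytes
	}
	return size, align
}

func main() {
	fields := []int{1, 4, 2}                  // int8, int32, int16
	fmt.Println(layout(fields, false, 0, 0))  // 12 4
	fmt.Println(layout(fields, true, 0, 0))   // 7 1
	fmt.Println(layout(fields, true, 8, 0))   // 8 8
	fmt.Println(layout(fields, false, 0, 16)) // 16 4
}
```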

## Unions

Unions are described as:

```
unionname "[" "\n"
	(fieldname type (if[expression])? "\n")+
"]" ("[" attribute* "]")?
```

During fuzzing, syzkaller randomly picks one of the union options.

You may also specify conditions that determine whether the corresponding
option may or may not be selected, depending on values of other fields. See
[the corresponding section](syscall_descriptions_syntax.md#conditional-fields)
for more details.

Unions can have attributes specified in square brackets after the union.
Attributes are:

- `varlen`: union size is the size of the particular chosen option (not statically known); without this attribute, unions are statically sized as the maximum of all options (similar to C unions)
- `size[N]`: the union is padded up to the specified size `N`; contents of the padding are unspecified (though frequently zeros)
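
The difference between statically sized and `varlen` unions can be summarized as a size function over the byte sizes of the options. This minimal sketch deliberately ignores alignment padding and uses a hypothetical `unionSize` helper; it is not syzkaller's code.

```go
package main

import "fmt"

// unionSize returns the size of a union whose options have the given byte
// sizes. Without varlen, the size is the maximum over all options (like a C
// union); with varlen, it is the size of the option actually chosen.
// Alignment padding is ignored in this sketch.
func unionSize(options []int, chosen int, varlen bool) int {
	if varlen {
		return options[chosen]
	}
	size := 0
	for _, o := range options {
		if o > size {
			size = o
		}
	}
	return size
}

func main() {
	opts := []int{4, 8, 2}                 // e.g. int32, int64, int16
	fmt.Println(unionSize(opts, 2, false)) // 8: static size, whichever option is chosen
	fmt.Println(unionSize(opts, 2, true))  // 2: varlen, only the chosen option
}
```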

## Resources

Resources represent values that need to be passed from the output of one syscall to the input of another syscall. For example, the `close` syscall requires an input value (fd) previously returned by the `open` or `pipe` syscall. To achieve this, `fd` is declared as a resource. This is a way of modelling dependencies between syscalls: declaring one syscall as the producer of a resource and another syscall as its consumer defines a loose ordering between them. Resources are described as:

```
"resource" identifier "[" underlying_type "]" [ ":" const ("," const)* ]
```

`underlying_type` is either one of `int8`, `int16`, `int32`, `int64`, `intptr` or another resource (which models inheritance, for example, a socket is a subtype of fd). The optional set of constants represents resource special values, for example, `0xffffffffffffffff` (-1) for "no fd", or `AT_FDCWD` for "the current dir". Special values are used once in a while as resource values. If no special values are specified, the special value `0` is used. Resources can then be used as types, for example:

```
resource fd[int32]: 0xffffffffffffffff, AT_FDCWD, 1000000
resource sock[fd]
resource sock_unix[sock]

socket(...) sock
accept(fd sock, ...) sock
listen(fd sock, backlog int32)
```
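
One way to read the inheritance declared above is as a chain of kinds from the root resource down to the subtype: a value of a more specific resource can be passed wherever a less specific one is expected, but not the other way around. The sketch below models only that strict rule; it is a hypothetical illustration (the `kinds` table and `compatible` helper are not syzkaller APIs), and the real fuzzer may apply looser rules.

```go
package main

import "fmt"

// kinds maps each resource from the example above to its inheritance chain,
// root first: sock[fd] and sock_unix[sock].
var kinds = map[string][]string{
	"fd":        {"fd"},
	"sock":      {"fd", "sock"},
	"sock_unix": {"fd", "sock", "sock_unix"},
}

// compatible reports whether a value of resource src may be used where dst
// is expected under strict subtyping: dst's chain must be a prefix of src's.
func compatible(dst, src string) bool {
	d, s := kinds[dst], kinds[src]
	if len(d) == 0 || len(d) > len(s) {
		return false
	}
	for i := range d {
		if d[i] != s[i] {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(compatible("fd", "sock"))        // true: a socket can be passed as an fd
	fmt.Println(compatible("sock", "fd"))        // false: an arbitrary fd is not a socket
	fmt.Println(compatible("sock", "sock_unix")) // true
}
```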

Resources don't necessarily have to be returned by a syscall. They can be used as any other data type. For example:

```
resource my_resource[int32]

request_producer(..., arg ptr[out, my_resource])
request_consumer(..., arg ptr[inout, test_struct])

test_struct {
	...
	attr	my_resource
}
```

For more complex producer/consumer scenarios, field attributes can be utilized.
For example:

```
resource my_resource_1[int32]
resource my_resource_2[int32]

request_produce1_consume2(..., arg ptr[inout, test_struct])

test_struct {
	...
	field0	my_resource_1	(out)
	field1	my_resource_2	(in)
}
```

Each resource type must be "produced" (used as an output) by at least one syscall
(outside of unions and optional pointers) and "consumed" (used as an input)
by at least one syscall.

## Type Aliases

Complex types that are often repeated can be given short type aliases using the
following syntax:

```
type identifier underlying_type
```

For example:

```
type signalno int32[0:65]
type net_port proc[20000, 4, int16be]
```

Then, a type alias can be used instead of the underlying type in any context.
The underlying type needs to be described as if it's a struct field, that is,
with the base type if it's required. However, a type alias can be used as a syscall
argument as well. Underlying types are currently restricted to integer types,
`ptr`, `ptr64`, `const`, `flags` and `proc` types.

There are some builtin type aliases:
```
type bool8	int8[0:1]
type bool16	int16[0:1]
type bool32	int32[0:1]
type bool64	int64[0:1]
type boolptr	intptr[0:1]

type fileoff[BASE] BASE

type filename string[filename]

type buffer[DIR] ptr[DIR, array[int8]]
```

## Type Templates

Type templates can be declared as follows:
```
type buffer[DIR] ptr[DIR, array[int8]]
type fileoff[BASE] BASE
type nlattr[TYPE, PAYLOAD] {
	nla_len		len[parent, int16]
	nla_type	const[TYPE, int16]
	payload		PAYLOAD
} [align[4]]
```

and later used as follows:
```
syscall(a buffer[in], b fileoff[int64], c ptr[in, nlattr[FOO, int32]])
```

There is a builtin type template `optional` defined as:
```
type optional[T] [
	val	T
	void	void
] [varlen]
```

## Length

You can specify the length of a particular struct field or a named argument by
using the `len`, `bytesize` and `bitsize` types, for example:

```
write(fd fd, buf ptr[in, array[int8]], count len[buf])

sock_fprog {
	len	len[filter, int16]
	filter	ptr[in, array[sock_filter]]
}
```

If `len`'s argument is a pointer, then the length of the pointee is used.

To denote the length of a field in N-byte words, use `bytesizeN`; possible values
for N are 1, 2, 4 and 8.

To denote the length of the parent struct, you can use `len[parent, int8]`.
To denote the length of a higher-level parent when structs are embedded into
one another, you can specify the type name of the particular parent:

```
s1 {
    f0      len[s2, int32]  # length of s2
}

s2 {
    f0      s1
    f1      array[int32]
    f2      len[parent, int32]
}
```

The `len` argument can also be a path expression, which allows more complex
addressing. Path expressions are similar to C field references, but also allow
referencing parent and sibling elements. A special reference `syscall` used
at the beginning of the path allows referring directly to the syscall arguments.
For example:

```
s1 {
	a	ptr[in, s2]
	b	ptr[in, s3]
	c	array[int8]
}

s2 {
	d	array[int8]
}

s3 {
# This refers to the array c in the parent s1.
	e	len[s1:c, int32]
# This refers to the array d in the sibling s2.
	f	len[s1:a:d, int32]
# This refers to the array j in the child s4.
	g	len[i:j, int32]
# This refers to syscall argument l.
	h	len[syscall:l, int32]
	i	ptr[in, s4]
}

s4 {
	j	array[int8]
}

foo(k ptr[in, s1], l ptr[in, array[int8]])
```

## Proc

The `proc` type can be used to denote per-process integers.
The idea is to have a separate range of values for each executor, so they don't interfere.

The simplest example is a port number.
The `proc[20000, 4, int16be]` type means that we want to generate an `int16be`
integer starting from `20000` and assign `4` values for each process.
As a result, executor number `n` will get values in the `[20000 + n * 4, 20000 + (n + 1) * 4)` range.
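
The same arithmetic as a function: a minimal sketch with a hypothetical `procRange` helper that returns the half-open range of values assigned to executor `n`.

```go
package main

import "fmt"

// procRange returns the half-open range [lo, hi) that executor n gets for
// proc[start, perProc, ...].
func procRange(start, perProc, n uint64) (lo, hi uint64) {
	return start + n*perProc, start + (n+1)*perProc
}

func main() {
	for n := uint64(0); n < 3; n++ {
		lo, hi := procRange(20000, 4, n)
		fmt.Printf("executor %d: [%d, %d)\n", n, lo, hi)
	}
	// executor 0: [20000, 20004)
	// executor 1: [20004, 20008)
	// executor 2: [20008, 20012)
}
```

Each range ends exactly where the next executor's range begins, so the ranges are disjoint and no value is shared between executors.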

## Integer Constants

Integer constants can be specified as decimal literals, as `0x`-prefixed
hex literals, as `'`-surrounded char literals, or as symbolic constants
extracted from kernel headers or defined by `define` directives. For example:

```
foo(a const[10], b const[-10])
foo(a const[0xabcd])
foo(a int8['a':'z'])
foo(a const[PATH_MAX])
foo(a int32[PATH_MAX])
foo(a ptr[in, array[int8, MY_PATH_MAX]])
define MY_PATH_MAX	PATH_MAX + 2
```

## Conditional fields

### In structures

In syzlang, it's possible to specify a condition for every struct field that
determines whether the field should be included or omitted:

```
header_fields {
  magic       const[0xabcd, int16]
  haveInteger int8
} [packed]

packet {
  header  header_fields
  integer int64  (if[value[header:haveInteger] == 0x1])
  body    array[int8]
} [packed]

some_call(a ptr[in, packet])
```

In this example, the `packet` structure will include the field `integer` only
if `header.haveInteger == 1`. In memory, `packet` will have the following
layout:

| header_fields.magic = 0xabcd | header_fields.haveInteger = 0x1 | integer | body |
| - | - | - | - |

That corresponds to e.g. the following program:
```
some_call(&AUTO={{AUTO, 0x1}, @value=0xabcd, []})
```

If `header.haveInteger` is not `1`, syzkaller will just pretend that the field
`integer` does not exist.
```
some_call(&AUTO={{AUTO, 0x0}, @void, []})
```

| header_fields.magic = 0xabcd | header_fields.haveInteger = 0x0 | body |
| - | - | - |

Every conditional field is assumed to be of variable length, and so is the struct
to which this field belongs.

When a variable-length field appears in the middle of a structure, the structure
must be marked with `[packed]`.

Conditions on bitfields are prohibited:
```
struct {
  f0 int
  f1 int:3 (if[value[f0] == 0x1])  # It will not compile.
}
```

But you may reference bitfields in your conditions:
```
struct {
  f0 int:1
  f1 int:7
  f2 int   (if[value[f0] == value[f1]])
} [packed]
```

### In unions

Let's consider the following example.

```
struct {
  type int
  body alternatives
}

alternatives [
  int     int64 (if[value[struct:type] == 0x1])
  arr     array[int64, 5] (if[value[struct:type] == 0x2])
  default int32
] [varlen]

some_call(a ptr[in, struct])
```

In this case, the union option will be selected depending on the value of the
`type` field. For example, if `type` is `0x1`, then it can be either `int` or
`default`:
```
some_call(&AUTO={0x1, @int=0x123})
some_call(&AUTO={0x1, @default=0x123})
```

If `type` is `0x2`, it can be either `arr` or `default`.

If `type` is neither `0x1` nor `0x2`, syzkaller may only select `default`:
```
some_call(&AUTO={0x0, @default=0xabcd})
```

To ensure that a union can always be constructed, the last union field **must always
have no condition**.

Thus, the following definition would fail to compile:

```
alternatives [
  int int64 (if[value[struct:type] == 0x1])
  arr array[int64, 5] (if[value[struct:type] == 0x1])
] [varlen]
```

During prog mutation and generation syzkaller will select a random union field
whose condition is satisfied.

### Expression syntax

Currently, only `==`, `!=` and `&` operators are supported. However, the
functionality was designed in such a way that adding more operators is easy.
Feel free to file a GitHub issue or write us an email in case it's needed.

Expressions are evaluated as `int64` values. If the final result of an
expression is not 0, it's assumed to be satisfied.

If you want to reference a field's value, you can do it via
`value[path:to:field]`, which is similar to the `len[]` argument.

```
sub_struct {
  f0 int
  # Reference a field in a parent struct.
  f1 int (if[value[struct:f2]]) # Same as if[value[struct:f2] != 0].
}

struct {
  f2 int
  f3 sub_struct
  f4 int (if[value[f2] == 0x2]) # Reference a sibling field.
  f5 int (if[value[f3:f0] == 0x1]) # Reference a nested field.
} [packed]

call(a ptr[in, struct])
```

The referenced field must be of integer type and there must be no
conditional fields in the path to it. For example, the following
description will not compile:

```
struct {
  f0 int
  f1 int (if[value[f0] == 0x1])
  f2 int (if[value[f1] == 0x1])
}
```

You may also reference constants in expressions:
```
struct {
  f0 int
  f1 int
  f2 int (if[value[f0] & SOME_CONST == OTHER_CONST])
}
```
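
The evaluation rules above (`int64` arithmetic, where a non-zero result means the condition holds) can be modeled with a small expression tree. This sketch is an illustration rather than syzkaller's evaluator: `SOME_CONST` and `OTHER_CONST` are replaced by hypothetical values `0x3` and `0x1`, and grouping is spelled out explicitly in the tree, so the sketch says nothing about how syzlang parses operator precedence.

```go
package main

import "fmt"

// expr is a node of a condition expression, evaluated against field values.
type expr interface {
	eval(fields map[string]int64) int64
}

type value string   // value[path:to:field], looked up by path
type constant int64 // integer literal or named constant

type binaryOp struct {
	op   string // "==", "!=" or "&"
	l, r expr
}

func (v value) eval(f map[string]int64) int64 { return f[string(v)] }

func (c constant) eval(map[string]int64) int64 { return int64(c) }

func (b binaryOp) eval(f map[string]int64) int64 {
	l, r := b.l.eval(f), b.r.eval(f)
	switch b.op {
	case "==":
		return boolToInt(l == r)
	case "!=":
		return boolToInt(l != r)
	case "&":
		return l & r
	}
	panic("unsupported operator " + b.op)
}

func boolToInt(b bool) int64 {
	if b {
		return 1
	}
	return 0
}

// satisfied applies the rule that any non-zero result means the condition holds.
func satisfied(e expr, fields map[string]int64) bool { return e.eval(fields) != 0 }

func main() {
	// (value[f0] & 0x3) == 0x1, with 0x3 and 0x1 standing in for the constants.
	cond := binaryOp{"==", binaryOp{"&", value("f0"), constant(0x3)}, constant(0x1)}
	fmt.Println(satisfied(cond, map[string]int64{"f0": 5})) // true: 5 & 3 == 1
	fmt.Println(satisfied(cond, map[string]int64{"f0": 6})) // false: 6 & 3 == 2
	// A bare value[...] holds whenever the field is non-zero.
	fmt.Println(satisfied(value("f2"), map[string]int64{"f2": 0})) // false
}
```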

## Meta

Description files can also contain `meta` directives that specify meta-information for the whole file.

```
meta noextract
```
Tells `make extract` not to extract constants for this file.
However, `syz-extract` can still be invoked manually on this file.

```
meta arches["arch1", "arch2"]
```
Restricts this file only to the given set of architectures.
`make extract` and `make generate` will not use it on other architectures.

## Misc

Description files can also contain `include` directives that refer to Linux kernel header files,
`incdir` directives that refer to custom Linux kernel header directories
and `define` directives that define symbolic constant values.

The syzkaller executor defines some [pseudo system calls](./pseudo_syscalls.md)
that can be used like any other syscall in a description file. These pseudo
system calls expand to literal C code and can perform user-defined
custom actions. You can find some examples in
[executor/common_linux.h](../executor/common_linux.h).

Also see [tips](syscall_descriptions.md#tips) on writing good descriptions.