# Syscall description language

aka `syzlang` (`[siːzˈlæŋg]`)

Pseudo-formal grammar of syscall description:

```
syscallname "(" [arg ["," arg]*] ")" [type] ["(" attribute* ")"]
arg = argname type
argname = identifier
type = typename [ "[" type-options "]" ]
typename = "const" | "intN" | "intptr" | "flags" | "array" | "ptr" |
	   "string" | "filename" | "glob" | "len" |
	   "bytesize" | "bytesizeN" | "bitsize" | "vma" | "proc" |
	   "compressed_image"
type-options = [type-opt ["," type-opt]]
```

Common type-options include:

```
"opt" - the argument is optional (like mmap fd argument, or accept peer argument)
```

The rest of the type-options are type-specific:

```
"const": integer constant, type-options:
	value, underlying type (one of "intN", "intptr")
"intN"/"intptr": an integer without a particular meaning, type-options:
	either an optional range of values (e.g. "5:10", or "100:200")
	or a reference to flags description (see below),
	or a single value
	optionally followed by an alignment parameter if using a range
"flags": a set of values, type-options:
	reference to flags description (see below), underlying int type (e.g. "int32")
"array": a variable/fixed-length array, type-options:
	type of elements, optional size (fixed "5", or ranged "5:10", boundaries inclusive)
"ptr"/"ptr64": a pointer to an object, type-options:
	direction (in/out/inout); type of the object
	ptr64 has size of 8 bytes regardless of target pointer size
"string": a zero-terminated memory buffer (no pointer indirection implied), type-options:
	either a string value in quotes for constant strings (e.g. "foo" or `deadbeef` for hex literal),
	or a reference to string flags (special value `filename` produces file names),
	optionally followed by a buffer size (string values will be padded with \x00 to that size)
"stringnoz": a non-zero-terminated memory buffer (no pointer indirection implied), type-options:
	either a string value in quotes for constant strings (e.g. "foo" or `deadbeef` for hex literal),
	or a reference to string flags,
"glob": glob pattern to match on the target files, type-options:
	a pattern string in quotes (syntax: https://golang.org/pkg/path/filepath/#Match)
	(e.g. "/sys/" or "/sys/**/*"),
	or include exclude glob too (e.g. "/sys/**/*:-/sys/power/state")
"fmt": a string representation of an integer (not zero-terminated), type-options:
	format (one of "dec", "hex", "oct") and the value (a resource, int, flags or proc)
	the resulting data is always fixed-size (formatted as "%020llu", "0x%016llx" or "%023llo", respectively)
"len": length of another field (for array it is number of elements), type-options:
	argname of the object
"bytesize": similar to "len", but always denotes the size in bytes, type-options:
	argname of the object
"bitsize": similar to "len", but always denotes the size in bits, type-options:
	argname of the object
"offsetof": offset of the field from the beginning of the parent struct, type-options:
	field
"vma"/"vma64": a pointer to a set of pages (used as input for mmap/munmap/mremap/madvise), type-options:
	optional number of pages (e.g. vma[7]), or a range of pages (e.g. vma[2-4])
	vma64 has size of 8 bytes regardless of target pointer size
"proc": per process int (see description below), type-options:
	value range start, how many values per process, underlying type
"compressed_image": zlib-compressed disk image
	syscalls accepting compressed images must be marked with `no_generate`
	and `no_minimize` call attributes. if the content of the decompressed image
	can be checked by a `fsck`-like command, use the `fsck` syscall attribute
"text": machine code of the specified type, type-options:
	text type (x86_real, x86_16, x86_32, x86_64, arm64)
"void": type with static size 0
	mostly useful inside of templates and varlen unions, can't be syscall argument
```

`flags` and `len` also take a trailing underlying type type-option when used in structs/unions/pointers.

Flags are described as:

```
flagname = const ["," const]*
```

or for string flags as:

```
flagname = "\"" literal "\"" ["," "\"" literal "\""]*
```

Call attributes are:

```
"disabled": the call will not be used in fuzzing; useful to temporarily disable some calls
	or prohibit particular argument combinations.
"timeout[N]": additional execution timeout (in ms) for the call on top of some default value
"prog_timeout[N]": additional execution timeout (in ms) for the whole program if it contains this call;
	if a program contains several such calls, the max value is used.
"ignore_return": ignore return value of this syscall in fallback feedback; needs to be used for calls
	that don't return fixed error codes but rather something else (e.g. the current time).
"breaks_returns": ignore return values of all subsequent calls in the program in fallback feedback (can't be trusted).
"no_generate": do not try to generate this syscall, i.e. use only seed descriptions to produce it.
"no_minimize": do not modify instances of this syscall when trying to minimize a crashing program.
"no_squash": do not attempt to pass squashed arguments to this syscall.
	Without that, the fuzzer will sometimes attempt to replace complex structures with arrays of bytes,
	possibly triggering interesting mutations, but also making programs hard to reason about.
"fsck": the content of the compressed buffer argument for this syscall is a file system and the
	string argument is a fsck-like command that will be called to verify the filesystem
"remote_cover": wait longer to collect remote coverage for this call.
"kfuzz_test": the call is a kfuzztest target
```

## Ints

`int8`, `int16`, `int32` and `int64` denote an integer of the corresponding size.
`intptr` denotes a pointer-sized integer, i.e. C `long` type.

By appending `be` suffix (e.g. `int16be`) integers become big-endian.

It's possible to specify a range of values for an integer in the format of `int32[0:100]` or `int32[0:4096, 512]` for a 512-aligned int.

Integers can also take a reference to flags description or a value as their first type-option.
In that case, the alignment parameter is not supported.

To denote a bitfield of size N use `int64:N`.

It's possible to use these various kinds of ints as base types for `const`, `flags`, `len` and `proc`.

```
example_struct {
	f0	int8			# random 1-byte integer
	f1	const[0x42, int16be]	# const 2-byte integer with value 0x4200 (big-endian 0x42)
	f2	int32[0:100]		# random 4-byte integer with values from 0 to 100 inclusive
	f3	int32[1:10, 2]		# random 4-byte integer with values {1, 3, 5, 7, 9}
	f4	int64:20		# random 20-bit bitfield
	f5	int8[10]		# const 1-byte integer with value 10
	f6	int32[flagname]		# random 4-byte integer from the set of values referenced by flagname
}
```

## Structs

Structs are described as:

```
structname "{" "\n"
	(fieldname type ("(" fieldattribute* ")")? (if[expression])? "\n")+
"}" ("[" attribute* "]")?
```

Fields can have attributes specified in parentheses after the field, independent
of their type.
`in/out/inout` attributes specify per-field direction, for example:

```
foo {
	field0	const[1, int32]	(in)
	field1	int32		(inout)
	field2	fd		(out)
}
```

You may specify conditions that determine whether a field will be included:

```
foo {
	field0	int32
	field1	int32 (if[value[field0] == 0x1])
}
```

See [the corresponding section](syscall_descriptions_syntax.md#conditional-fields)
for more details.

The `out_overlay` attribute allows a struct to have separate input and output layouts.
Fields before the `out_overlay` field are input, fields starting from `out_overlay` are output.
Input and output fields overlap in memory (both start from the beginning of the struct in memory).
For example:

```
foo {
	in0	const[1, int32]
	in1	flags[bar, int8]
	in2	ptr[in, string]
	out0	fd	(out_overlay)
	out1	int32
}
```

Structs can have attributes specified in square brackets after the struct.
Attributes are:

- `packed`: the struct does not have paddings between fields and has alignment 1; this is similar to GNU C `__attribute__((packed))`; struct alignment can be overridden with `align` attribute
- `align[N]`: the struct has alignment `N` and is padded up to a multiple of `N`; contents of the padding are unspecified (though, frequently are zeros); similar to GNU C `__attribute__((aligned(N)))`
- `size[N]`: the struct is padded up to the specified size `N`; contents of the padding are unspecified (though, frequently are zeros)

## Unions

Unions are described as:

```
unionname "[" "\n"
	(fieldname type (if[expression])? "\n")+
"]" ("[" attribute* "]")?
```

During fuzzing, syzkaller randomly picks one of the union options.

You may also specify conditions that determine whether the corresponding
option may or may not be selected, depending on values of other fields.
See
[the corresponding section](syscall_descriptions_syntax.md#conditional-fields)
for more details.

Unions can have attributes specified in square brackets after the union.
Attributes are:

- `varlen`: union size is the size of the particular chosen option (not statically known); without this attribute unions are statically sized as maximum of all options (similar to C unions)
- `size[N]`: the union is padded up to the specified size `N`; contents of the padding are unspecified (though, frequently are zeros)

## Resources

Resources represent values that need to be passed from output of one syscall to input of another syscall. For example, `close` syscall requires an input value (fd) previously returned by `open` or `pipe` syscall. To achieve this, `fd` is declared as a resource. This is a way of modelling dependencies between syscalls, as defining a syscall as the producer of a resource and another syscall as the consumer defines a loose sense of ordering between them. Resources are described as:

```
"resource" identifier "[" underlying_type "]" [ ":" const ("," const)* ]
```

`underlying_type` is either one of `int8`, `int16`, `int32`, `int64`, `intptr` or another resource (which models inheritance, for example, a socket is a subtype of fd). The optional set of constants represents resource special values, for example, `0xffffffffffffffff` (-1) for "no fd", or `AT_FDCWD` for "the current dir". Special values are used once in a while as resource values. If no special values are specified, the special value `0` is used. Resources can then be used as types, for example:

```
resource fd[int32]: 0xffffffffffffffff, AT_FDCWD, 1000000
resource sock[fd]
resource sock_unix[sock]

socket(...) sock
accept(fd sock, ...) sock
listen(fd sock, backlog int32)
```

Resources don't necessarily have to be returned by a syscall. They can be used as any other data type.
For example:

```
resource my_resource[int32]

request_producer(..., arg ptr[out, my_resource])
request_consumer(..., arg ptr[inout, test_struct])

test_struct {
	...
	attr	my_resource
}
```

For more complex producer/consumer scenarios, field attributes can be utilized.
For example:

```
resource my_resource_1[int32]
resource my_resource_2[int32]

request_produce1_consume2(..., arg ptr[inout, test_struct])

test_struct {
	...
	field0	my_resource_1	(out)
	field1	my_resource_2	(in)
}
```

Each resource type must be "produced" (used as an output) by at least one syscall
(outside of unions and optional pointers) and "consumed" (used as an input)
by at least one syscall.

## Type Aliases

Complex types that are often repeated can be given short type aliases using the
following syntax:

```
type identifier underlying_type
```

For example:

```
type signalno int32[0:65]
type net_port proc[20000, 4, int16be]
```

A type alias can then be used instead of the underlying type in any context.
The underlying type needs to be described as if it were a struct field, that is,
with the base type if it's required. However, a type alias can be used as a syscall
argument as well. Underlying types are currently restricted to integer types,
`ptr`, `ptr64`, `const`, `flags` and `proc` types.

There are some builtin type aliases:

```
type bool8	int8[0:1]
type bool16	int16[0:1]
type bool32	int32[0:1]
type bool64	int64[0:1]
type boolptr	intptr[0:1]

type fileoff[BASE] BASE

type filename string[filename]

type buffer[DIR] ptr[DIR, array[int8]]
```

## Type Templates

Type templates can be declared as follows:

```
type buffer[DIR] ptr[DIR, array[int8]]
type fileoff[BASE] BASE
type nlattr[TYPE, PAYLOAD] {
	nla_len		len[parent, int16]
	nla_type	const[TYPE, int16]
	payload		PAYLOAD
} [align[4]]
```

and later used as follows:

```
syscall(a buffer[in], b fileoff[int64], c ptr[in, nlattr[FOO, int32]])
```

There is a builtin type template `optional` defined as:

```
type optional[T] [
	val	T
	void	void
] [varlen]
```

## Length

You can specify the length of a particular field in a struct or of a named argument by
using `len`, `bytesize` and `bitsize` types, for example:

```
write(fd fd, buf ptr[in, array[int8]], count len[buf])

sock_fprog {
	len	len[filter, int16]
	filter	ptr[in, array[sock_filter]]
}
```

If `len`'s argument is a pointer, then the length of the pointee argument is used.

To denote the length of a field in N-byte words use `bytesizeN`, possible values
for N are 1, 2, 4 and 8.

To denote the length of the parent struct, you can use `len[parent, int8]`.
To denote the length of the higher level parent when structs are embedded into
one another, you can specify the type name of the particular parent:

```
s1 {
	f0	len[s2]  # length of s2
}

s2 {
	f0	s1
	f1	array[int32]
	f2	len[parent, int32]
}
```

`len` argument can also be a path expression which allows more complex
addressing.
Path expressions are similar to C field references, but also allow
referencing parent and sibling elements. A special reference `syscall` used
at the beginning of the path refers directly to the syscall arguments.
For example:

```
s1 {
	a	ptr[in, s2]
	b	ptr[in, s3]
	c	array[int8]
}

s2 {
	d	array[int8]
}

s3 {
# This refers to the array c in the parent s1.
	e	len[s1:c, int32]
# This refers to the array d in the sibling s2.
	f	len[s1:a:d, int32]
# This refers to the array j in the child s4.
	g	len[i:j, int32]
# This refers to syscall argument l.
	h	len[syscall:l, int32]
	i	ptr[in, s4]
}

s4 {
	j	array[int8]
}

foo(k ptr[in, s1], l ptr[in, array[int8]])
```

## Proc

The `proc` type can be used to denote per process integers.
The idea is to have a separate range of values for each executor, so they don't interfere.

The simplest example is a port number.
The `proc[20000, 4, int16be]` type means that we want to generate an `int16be`
integer starting from `20000` and assign `4` values for each process.
As a result the executor number `n` will get values in the `[20000 + n * 4, 20000 + (n + 1) * 4)` range.

## Integer Constants

Integer constants can be specified as decimal literals, as `0x`-prefixed
hex literals, as `'`-surrounded char literals, or as symbolic constants
extracted from kernel headers or defined by `define` directives.
For example:

```
foo(a const[10], b const[-10])
foo(a const[0xabcd])
foo(a int8['a':'z'])
foo(a const[PATH_MAX])
foo(a int32[PATH_MAX])
foo(a ptr[in, array[int8, MY_PATH_MAX]])
define MY_PATH_MAX	PATH_MAX + 2
```

## Conditional fields

### In structures

In syzlang, it's possible to specify a condition for every struct field that
determines whether the field should be included or omitted:

```
header_fields {
	magic		const[0xabcd, int16]
	haveInteger	int8
} [packed]

packet {
	header	header_fields
	integer	int64	(if[value[header:haveInteger] == 0x1])
	body	array[int8]
} [packed]

some_call(a ptr[in, packet])
```

In this example, the `packet` structure will include the field `integer` only
if `header.haveInteger == 1`. In memory, `packet` will have the following
layout:

| header.magic = 0xabcd | header.haveInteger = 0x1 | integer | body |
| --------------------- | ------------------------ | ------- | ---- |

That corresponds to e.g. the following program:

```
some_call(&AUTO={{AUTO, 0x1}, @value=0xabcd, []})
```

If `header.haveInteger` is not `1`, syzkaller will just pretend that the field
`integer` does not exist.

```
some_call(&AUTO={{AUTO, 0x0}, @void, []})
```

| header.magic = 0xabcd | header.haveInteger = 0x0 | body |
| --------------------- | ------------------------ | ---- |

Every conditional field is assumed to be of variable length and so is the struct
to which this field belongs.

When a variable length field appears in the middle of a structure, the structure
must be marked with `[packed]`.

Conditions on bitfields are prohibited:

```
struct {
	f0	int
	f1	int:3	(if[value[f0] == 0x1])  # It will not compile.
}
```

But you may reference bitfields in your conditions:

```
struct {
	f0	int:1
	f1	int:7
	f2	int	(if[value[f0] == value[f1]])
} [packed]
```

### In unions

Let's consider the following example.

```
struct {
	type	int
	body	alternatives
}

alternatives [
	int	int64	(if[value[struct:type] == 0x1])
	arr	array[int64, 5]	(if[value[struct:type] == 0x2])
	default	int32
] [varlen]

some_call(a ptr[in, struct])
```

In this case, the union option will be selected depending on the value of the
`type` field. For example, if `type` is `0x1`, then it can be either `int` or
`default`:

```
some_call(&AUTO={0x1, @int=0x123})
some_call(&AUTO={0x1, @default=0x123})
```

If `type` is `0x2`, it can be either `arr` or `default`.

If `type` is neither `0x1` nor `0x2`, syzkaller may only select `default`:

```
some_call(&AUTO={0x0, @default=0xabcd})
```

To ensure that a union can always be constructed, the last union field **must always
have no condition**.

Thus, the following definition would fail to compile:

```
alternatives [
	int	int64	(if[value[struct:type] == 0x1])
	arr	array[int64, 5]	(if[value[struct:type] == 0x1])
] [varlen]
```

During prog mutation and generation syzkaller will select a random union field
whose condition is satisfied.

### Expression syntax

Currently, only `==`, `!=`, `&` and `||` operators are supported. However, the
functionality was designed in such a way that adding more operators is easy.
Feel free to file a GitHub issue or write us an email in case it's needed.

Expressions are evaluated as `int64` values. If the final result of an
expression is not 0, it's assumed to be satisfied.
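
The evaluation rule above can be illustrated with a small Python sketch. It is not syzkaller's implementation: the tuple-based expression tree, the helper names (`to_int64`, `evaluate`, `is_satisfied`) and the choice that comparisons and `||` yield `1` or `0` are all assumptions made for illustration. The tree is built by hand, so the sketch says nothing about how syzlang parses expressions or which operator precedence it applies.

```python
# Illustrative sketch only; all names and the tree encoding are hypothetical.
INT64_MASK = (1 << 64) - 1


def to_int64(x):
    """Wrap a Python int to a signed 64-bit value."""
    x &= INT64_MASK
    return x - (1 << 64) if x >= (1 << 63) else x


def evaluate(expr, fields):
    """Evaluate a condition tree against a mapping of field paths to values.

    expr is an int literal, ("value", "path:to:field"), or (op, lhs, rhs)
    with op being one of "==", "!=", "&", "||".
    """
    if isinstance(expr, int):
        return to_int64(expr)
    if expr[0] == "value":
        return to_int64(fields[expr[1]])
    op, lhs, rhs = expr
    a, b = evaluate(lhs, fields), evaluate(rhs, fields)
    if op == "==":
        return int(a == b)  # assumption: comparisons yield 1 or 0
    if op == "!=":
        return int(a != b)
    if op == "&":
        return to_int64(a & b)
    if op == "||":
        return int(a != 0 or b != 0)
    raise ValueError(f"unsupported operator: {op}")


def is_satisfied(expr, fields):
    """A condition holds when its final int64 result is not 0."""
    return evaluate(expr, fields) != 0
```

For instance, a condition written as `if[value[f0] == 0x1 || value[f0] == 0x2]` would correspond to the tree `("||", ("==", ("value", "f0"), 1), ("==", ("value", "f0"), 2))`, and a bare `if[value[f0]]` to `("value", "f0")`, which holds whenever `f0` is non-zero.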

If you want to reference a field's value, you can do it via
`value[path:to:field]`, which is similar to the `len[]` argument.

```
sub_struct {
	f0	int
	# Reference a field in a parent struct.
	f1	int	(if[value[struct:f2]]) # Same as if[value[struct:f2] != 0].
}

struct {
	f2	int
	f3	sub_struct
	f4	int	(if[value[f2] == 0x2]) # Reference a sibling field.
	f5	int	(if[value[f3:f0] == 0x1]) # Reference a nested field.
	f6	int	(if[value[f3:f0] == 0x1 || value[f3:f0] == 0x2]) # Reference a nested field that equals either 0x1 or 0x2.
} [packed]

call(a ptr[in, struct])
```

The referenced field must be of integer type and there must be no
conditional fields in the path to it. For example, the following
descriptions will not compile:

```
struct {
	f0	int
	f1	int	(if[value[f0] == 0x1])
	f2	int	(if[value[f1] == 0x1])
}
```

You may also reference constants in expressions:

```
struct {
	f0	int
	f1	int
	f2	int	(if[value[f0] & SOME_CONST == OTHER_CONST])
}
```

## Meta

Description files can also contain `meta` directives that specify meta-information for the whole file.

```
meta noextract
```

Tells `make extract` not to extract constants for this file.
However, `syz-extract` can still be invoked manually on this file.

```
meta arches["arch1", "arch2"]
```

Restricts this file to the given set of architectures.
`make extract` and `make generate` will not use it on other architectures.

## Misc

Description files also contain `include` directives that refer to Linux kernel header files,
`incdir` directives that refer to custom Linux kernel header directories
and `define` directives that define symbolic constant values.

The syzkaller executor defines some [pseudo system calls](./pseudo_syscalls.md)
that can be used as any other syscall in a description file. These pseudo
system calls expand to literal C code and can perform user-defined
custom actions. You can find some examples in
[executor/common_linux.h](../executor/common_linux.h).

Also see [tips](syscall_descriptions.md#tips) on writing good descriptions.