# Syscall description language
aka `syzlang` (`[siːzˈlæŋg]`)

Pseudo-formal grammar of syscall description:

```
syscallname "(" [arg ["," arg]*] ")" [type] ["(" attribute* ")"]
arg = argname type
argname = identifier
type = typename [ "[" type-options "]" ]
typename = "const" | "intN" | "intptr" | "flags" | "array" | "ptr" |
           "string" | "strconst" | "filename" | "glob" | "len" |
           "bytesize" | "bytesizeN" | "bitsize" | "vma" | "proc" |
           "compressed_image"
type-options = [type-opt ["," type-opt]]
```

Common type-options include:

```
"opt" - the argument is optional (like mmap fd argument, or accept peer argument)
```

The rest of the type-options are type-specific:

```
"const": integer constant, type-options:
    value, underlying type (one of "intN", "intptr")
"intN"/"intptr": an integer without a particular meaning, type-options:
    either an optional range of values (e.g. "5:10", or "100:200")
    or a reference to flags description (see below),
    or a single value
    optionally followed by an alignment parameter if using a range
"flags": a set of values, type-options:
    reference to flags description (see below), underlying int type (e.g. "int32")
"array": a variable/fixed-length array, type-options:
    type of elements, optional size (fixed "5", or ranged "5:10", boundaries inclusive)
"ptr"/"ptr64": a pointer to an object, type-options:
    direction (in/out/inout); type of the object
    ptr64 has size of 8 bytes regardless of target pointer size
"string": a zero-terminated memory buffer (no pointer indirection implied), type-options:
    either a string value in quotes for constant strings (e.g. "foo" or `deadbeef` for hex literal),
    or a reference to string flags (special value `filename` produces file names),
    optionally followed by a buffer size (string values will be padded with \x00 to that size)
"stringnoz": a non-zero-terminated memory buffer (no pointer indirection implied), type-options:
    either a string value in quotes for constant strings (e.g. "foo" or `deadbeef` for hex literal),
    or a reference to string flags
"glob": glob pattern to match on the target files, type-options:
    a pattern string in quotes (syntax: https://golang.org/pkg/path/filepath/#Match)
    (e.g. "/sys/" or "/sys/**/*"),
    optionally combined with an exclude glob (e.g. "/sys/**/*:-/sys/power/state")
"fmt": a string representation of an integer (not zero-terminated), type-options:
    format (one of "dec", "hex", "oct") and the value (a resource, int, flags, const or proc)
    the resulting data is always fixed-size (formatted as "%020llu", "0x%016llx" or "%023llo", respectively)
"len": length of another field (for array it is number of elements), type-options:
    argname of the object
"bytesize": similar to "len", but always denotes the size in bytes, type-options:
    argname of the object
"bitsize": similar to "len", but always denotes the size in bits, type-options:
    argname of the object
"offsetof": offset of the field from the beginning of the parent struct, type-options:
    field
"vma"/"vma64": a pointer to a set of pages (used as input for mmap/munmap/mremap/madvise), type-options:
    optional number of pages (e.g. vma[7]), or a range of pages (e.g. vma[2-4])
    vma64 has size of 8 bytes regardless of target pointer size
"proc": per-process int (see description below), type-options:
    value range start, how many values per process, underlying type
"compressed_image": zlib-compressed disk image
    syscalls accepting compressed images must be marked with `no_generate`
    and `no_minimize` call attributes.
"text": machine code of the specified type, type-options:
    text type (x86_real, x86_16, x86_32, x86_64, arm64)
"void": type with static size 0
    mostly useful inside of templates and varlen unions, can't be syscall argument
```

`flags` and `len` also take a trailing underlying-type type-option when used in structs, unions, or pointers.

Flags are described as:

```
flagname = const ["," const]*
```

or for string flags as:

```
flagname = "\"" literal "\"" ["," "\"" literal "\""]*
```

Call attributes are:

```
"disabled": the call will not be used in fuzzing; useful to temporarily disable some calls
    or prohibit particular argument combinations.
"timeout[N]": additional execution timeout (in ms) for the call on top of some default value
"prog_timeout[N]": additional execution timeout (in ms) for the whole program if it contains this call;
    if a program contains several such calls, the max value is used.
"ignore_return": ignore return value of this syscall in fallback feedback; needs to be used for calls
    that don't return fixed error codes but rather something else (e.g. the current time).
"breaks_returns": ignore return values of all subsequent calls in the program in fallback feedback (can't be trusted).
"no_generate": do not try to generate this syscall, i.e. use only seed descriptions to produce it.
"no_minimize": do not modify instances of this syscall when trying to minimize a crashing program.
```

## Ints

`int8`, `int16`, `int32` and `int64` denote an integer of the corresponding size.
`intptr` denotes a pointer-sized integer, i.e. C `long` type.

By appending the `be` suffix (e.g. `int16be`), integers become big-endian.

It's possible to specify a range of values for an integer in the format of `int32[0:100]` or `int32[0:4096, 512]` for a 512-aligned int.
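
To make the range and alignment arithmetic concrete, here is a small Python sketch that enumerates the values such a field may take. It is only an illustration, not syzkaller code: the helper name `aligned_range_values` is hypothetical, and it assumes that the alignment step is counted from the start of the range, which is what the `int32[1:10, 2]` example in this document (values 1, 3, 5, 7, 9) suggests.

```python
def aligned_range_values(lo, hi, align=1):
    """Enumerate the values of a hypothetical int[lo:hi, align] field.

    Assumption: both bounds are inclusive and the alignment step is
    counted from the start of the range.
    """
    return list(range(lo, hi + 1, align))


# int32[1:10, 2] from the example struct in this document.
print(aligned_range_values(1, 10, 2))            # [1, 3, 5, 7, 9]
# int32[0:4096, 512]: nine values, 0, 512, ..., 4096.
print(len(aligned_range_values(0, 4096, 512)))   # 9
```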

Integers can also take a reference to a flags description or a value as their first type-option.
In that case, the alignment parameter is not supported.

To denote a bitfield of size N use `int64:N`.

It's possible to use these various kinds of ints as base types for `const`, `flags`, `len` and `proc`.

```
example_struct {
    f0    int8                   # random 1-byte integer
    f1    const[0x42, int16be]   # const 2-byte integer with value 0x4200 (big-endian 0x42)
    f2    int32[0:100]           # random 4-byte integer with values from 0 to 100 inclusive
    f3    int32[1:10, 2]         # random 4-byte integer with values {1, 3, 5, 7, 9}
    f4    int64:20               # random 20-bit bitfield
    f5    int8[10]               # const 1-byte integer with value 10
    f6    int32[flagname]        # random 4-byte integer from the set of values referenced by flagname
}
```

## Structs

Structs are described as:

```
structname "{" "\n"
    (fieldname type ("(" fieldattribute* ")")? (if[expression])? "\n")+
"}" ("[" attribute* "]")?
```

Fields can have attributes specified in parentheses after the field, independent
of their type. The `in`/`out`/`inout` attributes specify per-field direction, for example:

```
foo {
    field0    const[1, int32]    (in)
    field1    int32              (inout)
    field2    fd                 (out)
}
```

You may specify conditions that determine whether a field will be included:

```
foo {
    field0    int32
    field1    int32 (if[value[field0] == 0x1])
}
```

See [the corresponding section](syscall_descriptions_syntax.md#conditional-fields)
for more details.

The `out_overlay` attribute allows a struct to have separate input and output layouts.
Fields before the `out_overlay` field are input, fields starting from `out_overlay` are output.
Input and output fields overlap in memory (both start from the beginning of the struct in memory).
For example:

```
foo {
    in0     const[1, int32]
    in1     flags[bar, int8]
    in2     ptr[in, string]
    out0    fd (out_overlay)
    out1    int32
}
```

Structs can have attributes specified in square brackets after the struct.
Attributes are:

- `packed`: the struct does not have paddings between fields and has alignment 1; this is similar to GNU C `__attribute__((packed))`; struct alignment can be overridden with the `align` attribute
- `align[N]`: the struct has alignment N and is padded up to a multiple of `N`; contents of the padding are unspecified (though, frequently are zeros); similar to GNU C `__attribute__((aligned(N)))`
- `size[N]`: the struct is padded up to the specified size `N`; contents of the padding are unspecified (though, frequently are zeros)

## Unions

Unions are described as:

```
unionname "[" "\n"
    (fieldname type (if[expression])? "\n")+
"]" ("[" attribute* "]")?
```

During fuzzing, syzkaller randomly picks one of the union options.

You may also specify conditions that determine whether the corresponding
option may or may not be selected, depending on values of other fields. See
[the corresponding section](syscall_descriptions_syntax.md#conditional-fields)
for more details.

Unions can have attributes specified in square brackets after the union.
Attributes are:

- `varlen`: union size is the size of the particular chosen option (not statically known); without this attribute unions are statically sized as maximum of all options (similar to C unions)
- `size[N]`: the union is padded up to the specified size `N`; contents of the padding are unspecified (though, frequently are zeros)

## Resources

Resources represent values that need to be passed from output of one syscall to input of another syscall. For example, `close` syscall requires an input value (fd) previously returned by `open` or `pipe` syscall.
To achieve this, `fd` is declared as a resource. This is a way of modelling dependencies between syscalls, as defining a syscall as the producer of a resource and another syscall as the consumer defines a loose sense of ordering between them. Resources are described as:

```
"resource" identifier "[" underlying_type "]" [ ":" const ("," const)* ]
```

`underlying_type` is either one of `int8`, `int16`, `int32`, `int64`, `intptr` or another resource (which models inheritance, for example, a socket is a subtype of fd). The optional set of constants represents resource special values, for example, `0xffffffffffffffff` (-1) for "no fd", or `AT_FDCWD` for "the current dir". Special values are used once in a while as resource values. If no special values are specified, a special value of `0` is used. Resources can then be used as types, for example:

```
resource fd[int32]: 0xffffffffffffffff, AT_FDCWD, 1000000
resource sock[fd]
resource sock_unix[sock]

socket(...) sock
accept(fd sock, ...) sock
listen(fd sock, backlog int32)
```

Resources don't necessarily have to be returned by a syscall. They can be used as any other data type. For example:

```
resource my_resource[int32]

request_producer(..., arg ptr[out, my_resource])
request_consumer(..., arg ptr[inout, test_struct])

test_struct {
    ...
    attr    my_resource
}
```

For more complex producer/consumer scenarios, field attributes can be utilized.
For example:

```
resource my_resource_1[int32]
resource my_resource_2[int32]

request_produce1_consume2(..., arg ptr[inout, test_struct])

test_struct {
    ...
    field0    my_resource_1    (out)
    field1    my_resource_2    (in)
}
```

Each resource type must be "produced" (used as an output) by at least one syscall
(outside of unions and optional pointers) and "consumed" (used as an input)
by at least one syscall.
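
The inheritance relation between resources can be pictured with a short Python sketch. This is a hypothetical model written for illustration only; it is not how syzkaller stores resources. The names `RESOURCES`, `ancestry` and `can_pass` are made up, and the sketch assumes the usual subtype reading of the text above: a more specific resource (such as `sock_unix`) may be passed wherever one of its ancestors (such as `fd`) is expected, but not the other way round.

```python
# Hypothetical table: each resource maps to its underlying type,
# mirroring the fd / sock / sock_unix declarations above.
RESOURCES = {
    "fd": "int32",
    "sock": "fd",
    "sock_unix": "sock",
}


def ancestry(name):
    """Return the chain from `name` up to its root resource."""
    chain = []
    while name in RESOURCES:
        chain.append(name)
        name = RESOURCES[name]
    return chain


def can_pass(have, want):
    """True if a value of resource `have` fits an argument of type `want`
    (assumption: subtypes are accepted where an ancestor is expected)."""
    return want in ancestry(have)


print(ancestry("sock_unix"))        # ['sock_unix', 'sock', 'fd']
print(can_pass("sock_unix", "fd"))  # True
print(can_pass("fd", "sock"))       # False
```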

## Type Aliases

Complex types that are often repeated can be given short type aliases using the
following syntax:

```
type identifier underlying_type
```

For example:

```
type signalno int32[0:65]
type net_port proc[20000, 4, int16be]
```

A type alias can then be used instead of the underlying type in any context.
The underlying type needs to be described as if it were a struct field, that is,
with the base type if it's required. However, a type alias can be used as a syscall
argument as well. Underlying types are currently restricted to integer types,
`ptr`, `ptr64`, `const`, `flags` and `proc` types.

There are some builtin type aliases:
```
type bool8      int8[0:1]
type bool16     int16[0:1]
type bool32     int32[0:1]
type bool64     int64[0:1]
type boolptr    intptr[0:1]

type fileoff[BASE] BASE

type filename string[filename]

type buffer[DIR] ptr[DIR, array[int8]]
```

## Type Templates

Type templates can be declared as follows:
```
type buffer[DIR] ptr[DIR, array[int8]]
type fileoff[BASE] BASE
type nlattr[TYPE, PAYLOAD] {
    nla_len     len[parent, int16]
    nla_type    const[TYPE, int16]
    payload     PAYLOAD
} [align[4]]
```

and later used as follows:
```
syscall(a buffer[in], b fileoff[int64], c ptr[in, nlattr[FOO, int32]])
```

There is a builtin type template `optional` defined as:
```
type optional[T] [
    val     T
    void    void
] [varlen]
```

## Length

You can specify the length of a particular field in a struct or of a named argument by
using `len`, `bytesize` and `bitsize` types, for example:

```
write(fd fd, buf ptr[in, array[int8]], count len[buf])

sock_fprog {
    len       len[filter, int16]
    filter    ptr[in, array[sock_filter]]
}
```

If `len`'s argument is a pointer, then the length of the pointee argument is used.
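
The difference between the three length types is just a matter of units, as the following Python sketch shows. It is an illustration under stated assumptions, not syzkaller code: the helper `length_fields` is hypothetical, and the referenced field is assumed to be an array whose elements all have the same fixed size in bytes.

```python
def length_fields(num_elems, elem_size_bytes):
    """Values that len / bytesize / bitsize would hold for an array of
    `num_elems` fixed-size elements (hypothetical helper)."""
    size_in_bytes = num_elems * elem_size_bytes
    return {
        "len": num_elems,               # number of elements
        "bytesize": size_in_bytes,      # size in bytes
        "bitsize": size_in_bytes * 8,   # size in bits
    }


# An array of three 4-byte elements.
print(length_fields(3, 4))  # {'len': 3, 'bytesize': 12, 'bitsize': 96}
```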

To denote the length of a field in N-byte words use `bytesizeN`, possible values
for N are 1, 2, 4 and 8.

To denote the length of the parent struct, you can use `len[parent, int8]`.
To denote the length of the higher level parent when structs are embedded into
one another, you can specify the type name of the particular parent:

```
s1 {
    f0    len[s2]    # length of s2
}

s2 {
    f0    s1
    f1    array[int32]
    f2    len[parent, int32]
}
```

The `len` argument can also be a path expression, which allows more complex
addressing. Path expressions are similar to C field references, but also allow
referencing parent and sibling elements. A special reference `syscall` used
at the beginning of the path refers directly to the syscall arguments.
For example:

```
s1 {
    a    ptr[in, s2]
    b    ptr[in, s3]
    c    array[int8]
}

s2 {
    d    array[int8]
}

s3 {
# This refers to the array c in the parent s1.
    e    len[s1:c, int32]
# This refers to the array d in the sibling s2.
    f    len[s1:a:d, int32]
# This refers to the array j in the child s4.
    g    len[i:j, int32]
# This refers to syscall argument l.
    h    len[syscall:l, int32]
    i    ptr[in, s4]
}

s4 {
    j    array[int8]
}

foo(k ptr[in, s1], l ptr[in, array[int8]])
```

## Proc

The `proc` type can be used to denote per-process integers.
The idea is to have a separate range of values for each executor, so they don't interfere.

The simplest example is a port number.
The `proc[20000, 4, int16be]` type means that we want to generate an `int16be`
integer starting from `20000` and assign `4` values for each process.
As a result the executor number `n` will get values in the `[20000 + n * 4, 20000 + (n + 1) * 4)` range.
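
The range formula above can be checked with a few lines of Python. The helper `proc_values` is a hypothetical name used only for this illustration; it simply evaluates the half-open interval `[start + n * per_proc, start + (n + 1) * per_proc)` for executor number `n`.

```python
def proc_values(start, per_proc, n):
    """Values available to executor `n` for a proc[start, per_proc, ...] type."""
    lo = start + n * per_proc
    hi = start + (n + 1) * per_proc  # exclusive upper bound
    return list(range(lo, hi))


# proc[20000, 4, int16be]
print(proc_values(20000, 4, 0))  # [20000, 20001, 20002, 20003]
print(proc_values(20000, 4, 2))  # [20008, 20009, 20010, 20011]
```

Because each interval starts where the previous one ends, the value sets of different executors never overlap.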

## Integer Constants

Integer constants can be specified as decimal literals, as `0x`-prefixed
hex literals, as `'`-surrounded char literals, or as symbolic constants
extracted from kernel headers or defined by `define` directives. For example:

```
foo(a const[10], b const[-10])
foo(a const[0xabcd])
foo(a int8['a':'z'])
foo(a const[PATH_MAX])
foo(a int32[PATH_MAX])
foo(a ptr[in, array[int8, MY_PATH_MAX]])
define MY_PATH_MAX    PATH_MAX + 2
```

## Conditional fields

### In structures

In syzlang, it's possible to specify a condition for every struct field that
determines whether the field should be included or omitted:

```
header_fields {
    magic          const[0xabcd, int16]
    haveInteger    int8
} [packed]

packet {
    header     header_fields
    integer    int64 (if[value[header:haveInteger] == 0x1])
    body       array[int8]
} [packed]

some_call(a ptr[in, packet])
```

In this example, the `packet` structure will include the field `integer` only
if `header.haveInteger == 1`. In memory, `packet` will have the following
layout:

| header_fields.magic = 0xabcd | header_fields.haveInteger = 0x1 | integer | body |
| - | - | - | - |


That corresponds to e.g. the following program:
```
some_call(&AUTO={{AUTO, 0x1}, @value=0xabcd, []})
```

If `header.haveInteger` is not `1`, syzkaller will just pretend that the field
`integer` does not exist.
```
some_call(&AUTO={{AUTO, 0x0}, @void, []})
```

| header_fields.magic = 0xabcd | header_fields.haveInteger = 0x0 | body |
| - | - | - |

Every conditional field is assumed to be of variable length and so is the struct
to which this field belongs.
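
The two memory layouts shown above can be reproduced with Python's standard `struct` module. This is a rough sketch rather than what syzkaller actually emits: the function `pack_packet` is hypothetical, and little-endian byte order is an assumption (the description uses plain `int16`/`int64`, whose byte order depends on the target).

```python
import struct


def pack_packet(have_integer, integer=0, body=b""):
    """Serialize the example `packet` struct with no padding (packed).

    Assumption: little-endian target. The `integer` field is emitted
    only when haveInteger == 0x1, mirroring the `if[...]` condition.
    """
    out = struct.pack("<HB", 0xABCD, have_integer)  # magic (int16) + haveInteger (int8)
    if have_integer == 0x1:
        out += struct.pack("<Q", integer)           # conditional int64 field
    return out + bytes(body)


print(len(pack_packet(0x1, 0xABCD)))  # 11 bytes: 3-byte header + 8-byte integer
print(len(pack_packet(0x0)))          # 3 bytes: the header only
```

The total size changes with the value of `haveInteger`, which is why the enclosing struct is treated as variable-length.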

When a variable length field appears in the middle of a structure, the structure
must be marked with `[packed]`.

Conditions on bitfields are prohibited:
```
struct {
    f0    int
    f1    int:3 (if[value[f0] == 0x1])    # It will not compile.
}
```

But you may reference bitfields in your conditions:
```
struct {
    f0    int:1
    f1    int:7
    f2    int (if[value[f0] == value[f1]])
} [packed]
```

### In unions

Let's consider the following example.

```
struct {
    type    int
    body    alternatives
}

alternatives [
    int        int64 (if[value[struct:type] == 0x1])
    arr        array[int64, 5] (if[value[struct:type] == 0x2])
    default    int32
] [varlen]

some_call(a ptr[in, struct])
```

In this case, the union option will be selected depending on the value of the
`type` field. For example, if `type` is `0x1`, then it can be either `int` or
`default`:
```
some_call(&AUTO={0x1, @int=0x123})
some_call(&AUTO={0x1, @default=0x123})
```

If `type` is `0x2`, it can be either `arr` or `default`.

If `type` is neither `0x1` nor `0x2`, syzkaller may only select `default`:
```
some_call(&AUTO={0x0, @default=0xabcd})
```

To ensure that a union can always be constructed, the last union field **must always
have no condition**.

Thus, the following definition would fail to compile:

```
alternatives [
    int    int64 (if[value[struct:type] == 0x1])
    arr    array[int64, 5] (if[value[struct:type] == 0x1])
] [varlen]
```

During prog mutation and generation syzkaller will select a random union field
whose condition is satisfied.


### Expression syntax

Currently, only `==`, `!=` and `&` operators are supported. However, the
functionality was designed in such a way that adding more operators is easy.
Feel free to file a GitHub issue or write us an email in case it's needed.

Expressions are evaluated as `int64` values. If the final result of an
expression is not 0, it's assumed to be satisfied.

If you want to reference a field's value, you can do it via
`value[path:to:field]`, which is similar to the `len[]` argument.

```
sub_struct {
    f0    int
    # Reference a field in a parent struct.
    f1    int (if[value[struct:f2]])    # Same as if[value[struct:f2] != 0].
}

struct {
    f2    int
    f3    sub_struct
    f4    int (if[value[f2] == 0x2])       # Reference a sibling field.
    f5    int (if[value[f3:f0] == 0x1])    # Reference a nested field.
} [packed]

call(a ptr[in, struct])
```

The referenced field must be of integer type and there must be no
conditional fields in the path to it. For example, the following
descriptions will not compile.

```
struct {
    f0    int
    f1    int (if[value[f0] == 0x1])
    f2    int (if[value[f1] == 0x1])
}
```

You may also reference constants in expressions:
```
struct {
    f0    int
    f1    int
    f2    int (if[value[f0] & SOME_CONST == OTHER_CONST])
}
```

## Meta

Description files can also contain `meta` directives that specify meta-information for the whole file.

```
meta noextract
```
Tells `make extract` to not extract constants for this file.
However, `syz-extract` can still be invoked manually on this file.

```
meta arches["arch1", "arch2"]
```
Restricts this file only to the given set of architectures.
`make extract` and `make generate` will not use it on other architectures.

## Misc

Description files also contain `include` directives that refer to Linux kernel header files,
`incdir` directives that refer to custom Linux kernel header directories
and `define` directives that define symbolic constant values.

The syzkaller executor defines some [pseudo system calls](./pseudo_syscalls.md)
that can be used as any other syscall in a description file. These pseudo
system calls expand to literal C code and can perform user-defined
custom actions. You can find some examples in
[executor/common_linux.h](../executor/common_linux.h).

Also see [tips](syscall_descriptions.md#tips) on writing good descriptions.