# Architecture

At a high level, `cxgo` works like a regular C compiler. It operates at the "translation unit" (TU) level, meaning that
it considers a single file at a time, with all included files concatenated.

Like a regular C compiler, it runs a preprocessor and then parses the output to produce a C AST.
This part of the work is done by [cc](https://modernc.org/cc/v3), a C compiler frontend written in Go.

The C AST produced by `cc` is then converted by `cxgo` into a Go equivalent. `cxgo` uses a custom AST so that it can
represent both C concepts and Go concepts at the same time. Most decisions are made when the translator
reaches a specific AST node.

Although `cc` type-checks the AST, `cxgo` does a separate type-check pass, this time adhering to Go rules. This allows
us to add missing casts, convert to/from `unsafe.Pointer`, etc. The AST might be slightly changed at this stage, because
`cxgo` may need to insert helper calls to our C runtime, materialize function literals for expressions unsupported in Go, etc.

When the type check is done, a postprocessing step runs on the resulting AST. This step makes structural adjustments
to the AST: for example, it may adapt the `main` function to the Go standard, add implicit returns and fix `goto`.

After postprocessing completes, `cxgo` emits Go declarations for that C file, and the process repeats for the next TU.

With this in mind, a few details are missing from this explanation:

- How are include files found?
- How is the mapping to the Go stdlib done?
- How are common C patterns rewritten to Go?
- How is Go `string` used?
- What about slices?

The following sections provide more details on those. For even more details, please refer to [C quirks](quirks.md).

## Include files lookup

`cxgo` looks up include files similarly to a regular C compiler, except that the include paths are set in the config
instead of the environment.

To make things easier, `cxgo` automatically adds `./include` and `./includes` to the list of lookup paths. This is helpful
if you need to override a specific header present on the host system.

Another interesting trick that `cxgo` uses is a virtual include filesystem. Thanks to FS hooks in `cc`, `cxgo` emulates
a directory at `/_cxgo_overrides`, which contains customized C stdlib headers bundled into `cxgo`.

It serves two purposes: it provides a zero-config experience for common use cases, and it allows `cxgo` to implement
the C stdlib differently and adapt it to the needs of Go.

## Stdlib mapping

`cxgo` implements a C stdlib mapping mechanism based on the VFS discussed above. Each virtual header file may also declare
a set of identifier-level overrides for types, functions and variables.

For example, we can take C `exit` from `stdlib.h` and define an override for this identifier to have the Go name `os.Exit`.

This sounds trivial, but it is a very powerful mechanism when combined with the VFS.

For example, C has no notion of struct methods, right? But with control over the stdlib header content we can easily
emulate methods! Consider this example for `FILE` and a `close` method:

```c
typedef struct {
    // ...
    // Function pointer field that an override on FILE resolves to a method.
    int (*Close)(void);
} FILE;

// Expands close(f) into a call through the Close field.
#define close(f) ((FILE*)(f))->Close()
```

Every time the C preprocessor sees `close`, it is replaced with a call to a function pointer field on the argument.
At the same time, the override set on `FILE` resolves `Close` to a method, but omits it from the struct definition.
So in the end the type checker sees a perfectly valid indirection to a `FILE` struct field, leading to a function call.
And in Go, the expression is converted to a method call that is implemented by our C runtime.
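
As a rough illustration of the Go side of the `close` example above, the sketch below shows a type with no `Close` field but a `Close` method, and the method call that the translated expression boils down to. `File`, its `closed` field and the returned status codes are hypothetical; the real runtime type and signature in `cxgo` may differ.

```go
package main

import "fmt"

// File is a hypothetical stand-in for the runtime type that C `FILE` maps to.
// There is no Close field here: the override resolves `Close` to a method.
type File struct {
	closed bool
}

// Close plays the role of the runtime-provided method. It returns a C-style
// status code: 0 on success, -1 if the file was already closed.
func (f *File) Close() int32 {
	if f.closed {
		return -1
	}
	f.closed = true
	return 0
}

func main() {
	f := &File{}
	// C source:              close(f)
	// After macro expansion: ((FILE*)f)->Close()
	// Go result:             a plain method call.
	fmt.Println(f.Close())
	fmt.Println(f.Close())
}
```

The C type checker only ever sees a field access followed by a call, while the Go side sees an ordinary method, so neither language needs special syntax for the mapping.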

An interesting consequence of having full control of the stdlib, combined with C's implicit type casts, is that it's possible
to define a different function signature in the virtual stdlib header and let `cxgo` adapt the types in the best possible way.

## Rewriting C patterns to Go

There are different kinds of rewrite rules in `cxgo`.

In some cases, the stdlib override may define a Go function (e.g. `make`) and expose it to C under a different name (`_cxgo_go_make`).
Then, the AST translator has a hook that intercepts all function call AST nodes. If a call matches a well-known pattern
(`calloc(n, sizeof(T))`), the hook rewrites the AST node into a call to the Go function with different arguments.

The second kind of rewrite is done at the statement level. Most statements are allowed to produce more than one resulting
statement. When a pattern is recognized (e.g. `x = a ? 1 : 0`), the translator can emit multiple nodes that are semantically
equivalent but preferred in Go. Of course, for each such case there is an ugly fallback.

## Support for Go string

All string literals are converted directly to Go string literals. However, C expects zero-terminated string literals.
Again, because `cxgo` fully controls its own C stdlib headers, we can easily define a custom `_cxgo_go_string`
type and use it where `const char*` is expected. The `cxgo` type checker will then use helper functions to convert to/from
zero-terminated strings automatically.

Of course, this approach is not ideal in terms of performance, but the `cxgo` [goals](../CONTRIBUTING.md#project-goals-and-principles)
don't include performance as a guiding principle. We'd rather help users read the code and let them rewrite the bottlenecks
in more idiomatic Go, instead of producing fast but unreadable code.

## Support for slices

Although `cxgo` could in theory detect slice-like variables automatically, it doesn't implement this heuristic yet.

Instead, it allows the user to [mark](config.md#identstype) specific struct fields, function arguments and variables as Go slices.

We admit that this goes against our principle of "less human intervention", and a fix is planned for the future.

For now, though, `cxgo` checks a list of user-defined overrides and adjusts all usages of the variable to use
slice-related features. Of course, marking one variable as a slice may force the user to mark dependent
variables as slices as well.
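
To make the difference concrete, here is a hand-written sketch (not actual `cxgo` output) that contrasts pointer-style code with the shape it can take once an argument is treated as a slice. The functions `sumPtr` and `sumSlice` are invented for illustration, and `unsafe.Add` requires Go 1.17 or newer.

```go
package main

import (
	"fmt"
	"unsafe"
)

// sumPtr keeps `arr` as a C-style pointer, so every element access
// is explicit pointer arithmetic. (Hypothetical example.)
func sumPtr(arr *int32, n int) int32 {
	var total int32
	for i := 0; i < n; i++ {
		p := (*int32)(unsafe.Add(unsafe.Pointer(arr), uintptr(i)*unsafe.Sizeof(*arr)))
		total += *p
	}
	return total
}

// sumSlice shows the same loop once `arr` is treated as a Go slice:
// indexing and length come from the slice itself. (Hypothetical example.)
func sumSlice(arr []int32) int32 {
	var total int32
	for _, v := range arr {
		total += v
	}
	return total
}

func main() {
	data := []int32{1, 2, 3, 4}
	fmt.Println(sumPtr(&data[0], len(data)))
	// The caller must hold a slice to call sumSlice, which is why marking
	// one variable tends to propagate to the variables that feed it.
	fmt.Println(sumSlice(data))
}
```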