github.com/gotranspile/cxgo@v0.3.7/docs/architecture.md

# Architecture

At a high level, `cxgo` works like a regular C compiler. It operates at the "translation unit" (TU) level, meaning that
it considers a single file at a time, with all included files concatenated into it.

Like a regular C compiler, it runs a preprocessor and then parses the output to generate a C AST.
This part of the work is done by [cc](https://modernc.org/cc/v3), a C compiler frontend written in Go.

The C AST produced by `cc` is then converted by `cxgo` into a Go equivalent. `cxgo` uses a custom AST to be able to
represent both C concepts and Go concepts at the same time. Most of the decisions are made when the translator
reaches a specific AST node.

Although `cc` type-checks the AST, `cxgo` does a separate type-check pass, adhering to Go rules this time. This allows
us to add missing casts, convert to/from `unsafe.Pointer`, etc. The AST might change slightly at this stage, because
`cxgo` may need to insert helper calls to our C runtime, materialize function literals for expressions unsupported in Go, etc.

When the type check is done, a postprocessing step runs on the resulting AST. This step makes structural adjustments
to the AST: for example, it may adapt the `main` function to the Go standard, add implicit returns and fix `goto`.

After postprocessing completes, `cxgo` emits Go declarations for that C file, and the process repeats for the next TU.
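The stages described above can be pictured as a fixed per-TU pipeline. This is only an illustrative sketch: the `unit` type and the stage labels are hypothetical and do not correspond to actual `cxgo` or `cc` APIs.

```go
package main

import "fmt"

// unit is a hypothetical stand-in for one translation unit; it only
// records which stages have processed it.
type unit struct {
	file   string
	stages []string
}

// stageNames lists the steps in the order the text describes them.
// The names are illustrative labels, not real cxgo functions.
var stageNames = []string{
	"preprocess",  // expand #include and macros (done by cc)
	"parse",       // build the C AST (done by cc)
	"convert",     // C AST -> cxgo's mixed C/Go AST
	"typecheck",   // second type check, following Go rules
	"postprocess", // structural fixes: main, implicit returns, goto
	"emit",        // print Go declarations for this file
}

// translate runs every stage on each file, one TU at a time.
func translate(files []string) []unit {
	units := make([]unit, 0, len(files))
	for _, f := range files {
		u := unit{file: f}
		for _, name := range stageNames {
			u.stages = append(u.stages, name)
		}
		units = append(units, u)
	}
	return units
}

func main() {
	for _, u := range translate([]string{"a.c", "b.c"}) {
		fmt.Println(u.file, u.stages)
	}
}
```

The point of the sketch is the shape of the loop: each file goes through all stages before the next file is started.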

With this in mind, a few details are still missing from this explanation:

- How are include files found?
- How is the mapping to the Go stdlib done?
- How are common C patterns rewritten to Go?
- How is Go `string` used?
- What about slices?

The following sections provide more details on those. For even more details, please refer to [C quirks](quirks.md).

## Include files lookup

`cxgo` looks up include files much like a regular C compiler, except that the include paths are set in the config
instead of the environment.

To make things easier, `cxgo` automatically adds `./include` and `./includes` to the list of lookup paths. This is helpful
if you need to override a specific header present on the host system.

Another interesting trick that `cxgo` uses is a virtual include filesystem. Thanks to FS hooks in `cc`, `cxgo` emulates
a directory at `/_cxgo_overrides`, which contains customized C stdlib headers bundled into `cxgo`.

It serves two purposes: it provides a zero-config experience for common use cases, and it allows `cxgo` to implement the C stdlib
differently and adapt it to the needs of Go.
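A lookup over real and virtual directories can be sketched with Go's `io/fs` interfaces. This is a minimal sketch, not the way `cxgo` wires into `cc`: the `root` type, `resolve`, the `/usr/include` host path and, in particular, the search order are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"io/fs"
	"testing/fstest"
)

// root pairs a lookup path with the filesystem that backs it.
type root struct {
	path string
	fsys fs.FS
}

// demoRoots builds in-memory stand-ins for the lookup paths.
// The search order used here is an assumption made for this sketch.
func demoRoots() []root {
	project := fstest.MapFS{
		"mylib.h": &fstest.MapFile{Data: []byte("/* project header */\n")},
	}
	bundled := fstest.MapFS{
		"stdlib.h": &fstest.MapFile{Data: []byte("/* bundled override */\n")},
	}
	host := fstest.MapFS{
		"stdlib.h": &fstest.MapFile{Data: []byte("/* host header */\n")},
		"zlib.h":   &fstest.MapFile{Data: []byte("/* host header */\n")},
	}
	return []root{
		{path: "./include", fsys: project},
		{path: "/_cxgo_overrides", fsys: bundled},
		{path: "/usr/include", fsys: host},
	}
}

// resolve returns the location of the first root that contains name.
func resolve(roots []root, name string) (string, bool) {
	for _, r := range roots {
		if _, err := fs.Stat(r.fsys, name); err == nil {
			return r.path + "/" + name, true
		}
	}
	return "", false
}

func main() {
	for _, name := range []string{"mylib.h", "stdlib.h", "zlib.h", "missing.h"} {
		path, ok := resolve(demoRoots(), name)
		fmt.Println(name, path, ok)
	}
}
```

With this ordering, the bundled `stdlib.h` shadows the host copy, while headers that exist only on the host are still found.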

## Stdlib mapping

`cxgo` implements a C stdlib mapping mechanism based on the virtual filesystem (VFS) discussed above. Each virtual header file may also declare
a set of identifier-level overrides for types, functions and variables.

For example, we can take C `exit` from `stdlib.h` and define an override that gives this identifier the Go name `os.Exit`.
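Conceptually, an identifier-level override is a lookup from a C name to a Go name. The sketch below assumes a plain map; only the `exit` entry comes from the text, while the `overrides` table and the `goName` helper are hypothetical.

```go
package main

import "fmt"

// overrides maps C identifiers to the Go names that replace them.
// Only the exit entry comes from the text; the table layout is hypothetical.
var overrides = map[string]string{
	"exit": "os.Exit",
}

// goName returns the Go name for a C identifier, or the identifier
// itself when no override is registered.
func goName(cIdent string) string {
	if name, ok := overrides[cIdent]; ok {
		return name
	}
	return cIdent
}

func main() {
	fmt.Println(goName("exit"))    // os.Exit
	fmt.Println(goName("my_func")) // my_func
}
```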

This sounds trivial, but it is a very powerful mechanism when combined with the VFS.

For example, C has no notion of struct methods, right? But having control over the stdlib header content, we can easily
emulate methods! Consider this example for `FILE` and a `close` method:

```c
typedef struct {
	// ...
	// Declared as a field in C; the override resolves it to a Go method.
	int (*Close)(void);
} FILE;

// Expands every use of close(f) into a call through the Close field.
#define close(f) ((FILE*)(f))->Close()
```

Every time the C preprocessor sees `close`, it replaces it with a call to a function pointer field on the argument.
At the same time, an override is set on `FILE` to resolve `Close` to a method and omit it from the struct definition.
So, in the end, the type checker sees a perfectly valid indirection to a `FILE` struct field, leading to a function call.
And in Go, the expression is converted to a method call that is implemented by our C runtime.
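On the Go side, the result could look roughly like the following. This is a hand-written sketch, not actual `cxgo` output or the real runtime: the `FILE` type here and its `closed` field are hypothetical stand-ins.

```go
package main

import "fmt"

// FILE is a hypothetical stand-in for the runtime type backing C's FILE.
// Note that Close does not appear as a struct field here.
type FILE struct {
	closed bool
}

// Close plays the role of the method that the C `Close` field resolves to.
// It returns 0 on success and -1 on failure, mimicking C conventions.
func (f *FILE) Close() int {
	if f.closed {
		return -1
	}
	f.closed = true
	return 0
}

func main() {
	f := &FILE{}
	// C: close(f)  ->  ((FILE*)(f))->Close()  ->  Go: f.Close()
	fmt.Println(f.Close()) // 0
	fmt.Println(f.Close()) // -1
}
```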

An interesting consequence of having full control of the stdlib, combined with C's implicit type casts, is that it's possible to define a
different function signature in the virtual stdlib header and let `cxgo` adapt the types in the best possible way.

## Rewriting C patterns to Go

There are different kinds of rewrite rules in `cxgo`.

In some cases, the stdlib override may define a Go function (e.g. `make`) and expose it to C under a different name (`_cxgo_go_make`).
The AST translator then has a hook that intercepts all function call AST nodes. If a call matches a well-known pattern (`calloc(n, sizeof(T))`),
the hook rewrites the AST node into a call to the Go function with different arguments.
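The call hook can be illustrated on a toy AST. Everything below is an assumption made for the sketch: the node types, `rewriteCall`, and the simplification that the type name inside `sizeof` has already been mapped to a Go type. The real `cxgo` AST is far richer.

```go
package main

import (
	"fmt"
	"strings"
)

// Expr is a toy expression node; real AST nodes carry far more detail.
type Expr interface{ String() string }

// Ident is a bare identifier or literal.
type Ident struct{ Name string }

// Sizeof is sizeof(T) for a type name already mapped to Go.
type Sizeof struct{ Type string }

// Call is a C-style function call.
type Call struct {
	Fn   string
	Args []Expr
}

// MakeSlice is the Go expression make([]Elem, Len).
type MakeSlice struct {
	Elem string
	Len  Expr
}

func (e Ident) String() string  { return e.Name }
func (e Sizeof) String() string { return "sizeof(" + e.Type + ")" }
func (e Call) String() string {
	args := make([]string, len(e.Args))
	for i, a := range e.Args {
		args[i] = a.String()
	}
	return e.Fn + "(" + strings.Join(args, ", ") + ")"
}
func (e MakeSlice) String() string { return "make([]" + e.Elem + ", " + e.Len.String() + ")" }

// rewriteCall plays the role of the call hook: it recognizes
// calloc(n, sizeof(T)) and leaves every other call untouched.
func rewriteCall(c Call) Expr {
	if c.Fn == "calloc" && len(c.Args) == 2 {
		if sz, ok := c.Args[1].(Sizeof); ok {
			return MakeSlice{Elem: sz.Type, Len: c.Args[0]}
		}
	}
	return c
}

func main() {
	match := Call{Fn: "calloc", Args: []Expr{Ident{Name: "n"}, Sizeof{Type: "int32"}}}
	other := Call{Fn: "calloc", Args: []Expr{Ident{Name: "n"}, Ident{Name: "4"}}}
	fmt.Println(rewriteCall(match)) // make([]int32, n)
	fmt.Println(rewriteCall(other)) // calloc(n, 4)
}
```

Calls that do not match the pattern fall through unchanged, so the hook is safe to run on every call node.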

The second kind of rewrite rule is applied at the statement level. Most statements are allowed to produce more than one resulting
statement. When a pattern is recognized (e.g. `x = a ? 1 : 0`), the translator can emit multiple nodes that are semantically
equivalent but preferred in Go. Of course, for each such case there is an ugly fallback.
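For `x = a ? 1 : 0`, the two outcomes might look like this in Go. Both functions are hand-written illustrations rather than captured `cxgo` output; the fallback is assumed to use a function literal, in line with the function literals mentioned in the type-check stage.

```go
package main

import "fmt"

// preferred shows the statement-level form: one C statement becomes
// several Go statements.
func preferred(a bool) int {
	var x int
	if a {
		x = 1
	} else {
		x = 0
	}
	return x
}

// fallback keeps the ternary as a single expression by wrapping it
// in a function literal that is called immediately.
func fallback(a bool) int {
	x := func() int {
		if a {
			return 1
		}
		return 0
	}()
	return x
}

func main() {
	fmt.Println(preferred(true), fallback(true))   // 1 1
	fmt.Println(preferred(false), fallback(false)) // 0 0
}
```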

## Support for Go string

All string literals are converted directly to Go string literals. However, C expects zero-terminated string literals.
Again, because `cxgo` fully controls its own C stdlib headers, we can easily define a custom `_cxgo_go_string`
type and use it where `const char*` is expected. The `cxgo` type checker will then use helper functions to convert to/from
zero-terminated strings automatically.

Of course, this approach is not ideal in terms of performance, but `cxgo`'s [goals](../CONTRIBUTING.md#project-goals-and-principles)
don't include performance as a guiding principle. We'd rather help users read the code and let them rewrite the bottlenecks
in more idiomatic Go, instead of providing fast but unreadable code.
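The conversion helpers can be sketched as a pair of functions. The names `cString` and `goString` are hypothetical and are not the actual runtime API; the sketch also shows where the performance cost comes from, since each conversion copies the data.

```go
package main

import (
	"bytes"
	"fmt"
)

// cString returns a zero-terminated copy of s (hypothetical helper).
func cString(s string) []byte {
	b := make([]byte, len(s)+1) // the extra byte stays 0 and terminates the string
	copy(b, s)
	return b
}

// goString reads b up to the first zero byte (hypothetical helper).
// If there is no terminator, the whole buffer is used.
func goString(b []byte) string {
	if i := bytes.IndexByte(b, 0); i >= 0 {
		return string(b[:i])
	}
	return string(b)
}

func main() {
	b := cString("hi")
	fmt.Println(len(b), goString(b)) // 3 hi
}
```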

## Support for slices

Although `cxgo` could in theory detect slice-like variables automatically, it doesn't implement this heuristic yet.

Instead, it allows the user to [mark](config.md#identstype) specific struct fields, function arguments and variables as Go slices.

We admit that this goes against our principle of "less human intervention", and a fix is planned for the future.

For now, though, `cxgo` checks the list of user-defined overrides and adjusts all usages of the variable to use
slice-related features. Of course, marking one variable as a slice may force the user to mark dependent
variables as slices as well.
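To see why the marking matters, compare two hand-written Go versions of a C function `int sum(int* arr, int n)`. Neither is actual `cxgo` output; they only contrast pointer-style and slice-style code, and they assume Go 1.17 or later for `unsafe.Add`.

```go
package main

import (
	"fmt"
	"unsafe"
)

// sumPtr keeps arr as a pointer, so every element access needs
// pointer arithmetic through the unsafe package.
func sumPtr(arr *int32, n int) int32 {
	var total int32
	for i := 0; i < n; i++ {
		p := unsafe.Add(unsafe.Pointer(arr), uintptr(i)*unsafe.Sizeof(*arr))
		total += *(*int32)(p)
	}
	return total
}

// sumSlice treats arr as a slice; the length travels with the data,
// so the separate n argument is no longer needed.
func sumSlice(arr []int32) int32 {
	var total int32
	for _, v := range arr {
		total += v
	}
	return total
}

func main() {
	data := []int32{1, 2, 3}
	fmt.Println(sumPtr(&data[0], len(data))) // 6
	fmt.Println(sumSlice(data))              // 6
}
```

The call sites differ as well: a caller of `sumSlice` must itself hold a slice, which is how marking one variable can ripple out to the variables that feed it.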