github.com/ipld/go-ipld-prime@v0.21.0/schema/gen/go/HACKME_tradeoffs.md

tradeoffs and design decisions in codegen
=========================================

In creating codegen for IPLD, as with any piece of software,
there are design decisions to be made, and tradeoffs to be considered.

Some of these decisions and tradeoffs are particularly interesting
in IPLD codegen, because codegen:

- reaches significantly different resolutions and answers to the same decisions than non-codegen Node implementations do
- is able to make significantly different choices, expanding the dimensionality of the decision space, since it has more information before compile time
- can reach higher upper bounds of performance, due to that pre-compile-time foothold
- has correspondingly less flexibility in many ways, for the same reason.


values we can balance
---------------------

Let's enumerate the things we can balance (and give them some short reference codes):

- AS: assembly/binary/final-shipping size, in bytes
- BM: builder memory, in bytes, used as long as a NodeBuilder is in use
- SP: execution speed, in seconds, especially of NodeBuilder in deserialization use
- AC: allocations, in count, needed for operations (though in truth, this is just a proxy for SP, due to the outsized impact allocations have on it)
- ERG: ergonomics, as an ineffable, elusive-to-measurement sort of vibe of the thing, and how well it self-explains its use and deters erroneous application
- GLOC: generated lines of code, as a line count or in bytes, of interest because it may carry a noticeable cost in version control weight
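
As a rough illustration of how two of the measurable terms can be observed, the Go sketch below counts heap allocations per call (AC) with the standard library's `testing.AllocsPerRun` and takes a crude wall-clock timing (SP) of a toy decoder. The `decodeInts`, `measureAC`, and `measureSP` names are hypothetical stand-ins, not go-ipld-prime APIs; serious comparisons would use `go test -bench` with `-benchmem`.

```go
package main

import (
	"strconv"
	"testing"
	"time"
)

// decodeInts is a hypothetical stand-in for a deserializer driving a
// NodeBuilder; it is not part of go-ipld-prime. It parses decimal strings
// into a freshly allocated slice.
func decodeInts(in []string) []int64 {
	out := make([]int64, 0, len(in)) // the single heap allocation per call
	for _, s := range in {
		v, err := strconv.ParseInt(s, 10, 64)
		if err != nil {
			panic(err)
		}
		out = append(out, v) // fits within the preallocated capacity
	}
	return out
}

// sink keeps results reachable so the compiler cannot discard the work.
var sink []int64

// measureAC reports "AC": the average number of heap allocations per call,
// as counted by the standard library's testing.AllocsPerRun.
func measureAC(in []string) float64 {
	return testing.AllocsPerRun(100, func() { sink = decodeInts(in) })
}

// measureSP reports a crude "SP": mean wall-clock time per call over n calls.
// Noisy by design; prefer `go test -bench . -benchmem` for real comparisons.
func measureSP(in []string, n int) time.Duration {
	if n <= 0 {
		return 0
	}
	start := time.Now()
	for i := 0; i < n; i++ {
		sink = decodeInts(in)
	}
	return time.Since(start) / time.Duration(n)
}
```

BM has no direct counterpart in this sketch; the bytes-per-op figure that `-benchmem` reports can serve as a rough proxy for it.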

This list focuses on concerns that come to light when considering performant deserialization operations,
but it's fairly representative of general use as well:
traversals and serialization are generally easier situations to handle (they essentially get to skip the "BM" term);
and while different operations may weight these values by different scalars,
as we'll see in the prioritization in the next section, that turns out not to matter for our priorities.

We can also note that other code which knows it's addressing generated code can use special methods
(rather than only the generic Node interface), which means we can, in a way, mostly disregard the effect of such code on this ordering.
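
To make the "special methods" point concrete, here is a minimal sketch assuming a heavily simplified, hypothetical `Node` interface (not go-ipld-prime's real one) and a hypothetical generated-style `Person` type. Generic callers must go through `LookupByString` with a key known only at runtime and an error path; callers that know the concrete type can call a typed accessor such as the hypothetical `FieldName` directly.

```go
package main

import "fmt"

// Node is a hypothetical, heavily simplified generic node interface; it is
// NOT go-ipld-prime's datamodel.Node, which has many more methods.
type Node interface {
	LookupByString(key string) (Node, error)
	AsString() (string, error)
}

// stringNode is a hypothetical scalar (leaf) node.
type stringNode string

func (s stringNode) LookupByString(string) (Node, error) {
	return nil, fmt.Errorf("not a map")
}

func (s stringNode) AsString() (string, error) { return string(s), nil }

// Person imitates the shape generated code for a struct type might take:
// concrete typed fields behind both generic and type-specific methods.
type Person struct {
	name stringNode
}

// FieldName is a hypothetical "special method": no key string, no error
// path, and a statically typed result.
func (p *Person) FieldName() stringNode { return p.name }

// LookupByString is the generic path every Node must offer: the key is
// only known at runtime, so it must be matched and may fail.
func (p *Person) LookupByString(key string) (Node, error) {
	switch key {
	case "name":
		return p.name, nil
	default:
		return nil, fmt.Errorf("no such field: %q", key)
	}
}

func (p *Person) AsString() (string, error) {
	return "", fmt.Errorf("not a string")
}

// Compile-time checks that both types satisfy the generic interface.
var (
	_ Node = stringNode("")
	_ Node = (*Person)(nil)
)
```

The typed path also lets the compiler catch a misspelled field name, which is part of why codegen can raise the ergonomic bar as well as the speed one.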

Side note: though "AC" is *mostly* just a proxy for SP,
AC can also matter on its own, in *addition* to SP, because it increases the *variance* in SP.
(But we don't often regard this; it's a pretty fine detail, and the goal is "minimize" either way.)


prioritization of those values
------------------------------

The designs here emerge from `SP > BM > AS`.

More fully: `SP|AC > BM > ERG > AS > GLOC`.

In other words: speed is the overwhelming priority;
thereafter, we'd like to conserve memory (but will readily sell it for speed);
ergonomics takes a back seat to both of these (the intention being that we can add 'porcelain' layers separately later);
assembly size is a concern but plays fourth fiddle (if this is your dominant concern, you may not want to use codegen, or may want a different library implementation that aims at the same specs);
and generated code size is a concern, but we'll trade it away for any of the other priorities
(because it's a cost that doesn't end up affecting final users of products built with this system!).
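
One way to read this ordering is as a lexicographic comparison: the highest-priority value that differs decides, and lower tiers only break ties. The sketch below encodes that reading. The `Cost` type and `Better` function are hypothetical; ERG is modeled as an integer rank (an assumption, since ergonomics is described as ineffable), and within the shared `SP|AC` tier SP is compared first with AC as the tiebreaker (also an assumption).

```go
package main

// Cost holds hypothetical measurements for one candidate codegen design.
// Lower is better for every field. ERG is an assumed rank (1 = most
// ergonomic), since the text treats ergonomics as hard to measure.
type Cost struct {
	SP   float64 // ns per operation
	AC   float64 // allocations per operation
	BM   float64 // builder memory, bytes
	ERG  int     // ergonomics rank, lower is better
	AS   float64 // binary size, bytes
	GLOC int     // generated lines of code
}

// Better reports whether a should be preferred over b under a strict
// lexicographic reading of `SP|AC > BM > ERG > AS > GLOC`: the first
// value that differs decides. SP-then-AC ordering is an assumption.
func Better(a, b Cost) bool {
	switch {
	case a.SP != b.SP:
		return a.SP < b.SP
	case a.AC != b.AC:
		return a.AC < b.AC
	case a.BM != b.BM:
		return a.BM < b.BM
	case a.ERG != b.ERG:
		return a.ERG < b.ERG
	case a.AS != b.AS:
		return a.AS < b.AS
	default:
		return a.GLOC < b.GLOC
	}
}
```

A strict lexicographic rule ignores the caveats that follow (ratios that get wild still count as red flags), so it is at best a first approximation of how the priorities are applied.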

(Some caveats: it's still reasonable to consider it a red flag if the ratios on these get wild.
For example, if BM grows by more than 2x, that's questionable;
and at some point we could imagine saying that AS has really gotten out of hand.)

(BM also has a special condition: if it increases on recursive kinds, but not on scalars,
we regard that as roughly half price, because generally most of a tree is leaves.
(As it happens, though, this has not turned out to change any results much.))
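
The two caveats above amount to simple arithmetic. This sketch assumes BM increases are tallied separately for scalar and recursive kinds, applies the "roughly half price" weight to the recursive part, and flags any value exceeding a given ratio of a baseline. The names `bmDelta`, `weightedBM`, and `redFlag` are hypothetical, and the choice of baseline is left to the caller, since the text does not pin it down.

```go
package main

// bmDelta is a hypothetical tally of extra builder memory (bytes)
// relative to some baseline design, split by kind.
type bmDelta struct {
	Scalar    float64 // extra bytes on scalar (leaf) builders
	Recursive float64 // extra bytes on map/list/struct builders
}

// weightedBM applies the "roughly half price" rule: increases on
// recursive kinds count at 0.5, increases on scalars count in full.
func weightedBM(d bmDelta) float64 {
	return d.Scalar + 0.5*d.Recursive
}

// redFlag reports whether candidate exceeds ratio times baseline,
// e.g. ratio 2 for "BM grows by more than 2x". Exactly 2x is not flagged.
func redFlag(candidate, baseline, ratio float64) bool {
	return candidate > ratio*baseline
}
```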

"Ergonomics" remains tricky to account for.
It's true that when push comes to shove, speed and memory economy win.
But it's not at all single-dimensional; and with codegen, there are many options
which set a higher bar for all three concerns at the same time.
(In contrast, there's a stark upper limit to the ergonomic possibilities for
non-codegen, no-schema handling of data: code handling the data model has
the limits that its monomorphized approach imposes on it, and there's little
that can be done to avoid or improve upon that.)