github.com/ipld/go-ipld-prime@v0.21.0/schema/gen/go/HACKME_scalars.md


What's the deal with scalars, anyway?
=====================================

Two sorts of scalars
--------------------

There are two sorts of scalars that show up in codegen:

- 1: scalars that are just the plain kind (e.g. "string", not even named);
- 2: scalars that have named types.

Plain scalars can't have any special rules or semantics attached to them.

Named types with scalar kinds (aka a "typedef") **can** have additional rules and semantics attached to them.

Let's talk about named scalars first, because it's clearer that there's fun there.


### named scalars

Named scalars cause a type to be generated.
That type information is part of their identity (practically speaking: it affects their definition of equality).

#### named scalars are never equal even if their contents are

It stands to reason that named scalars can't be freely interchanged.

If you have a schema:

```ipldsch
type Foo string
type Bar string
```

... then you'll get codegen output code with an exported type for each:

```go
type Foo struct{ x string }
/*...*/

type Bar struct{ x string }
/*...*/
```

... and clearly, `Foo{"asdf"}` is not equal to `Bar{"asdf"}`: they don't even share a type, so Go won't compile a direct `==` between them.
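A minimal sketch of the same point once values are boxed, assuming generated types shaped like the ones above. The `sameNode` helper is hypothetical, and plain `any` stands in for `datamodel.Node`. Go's interface `==` compares the dynamic type as well as the contents, so a `Foo` and a `Bar` holding the same string compare unequal.

```go
package main

import "fmt"

// Foo and Bar mirror the shape of the generated types above
// (illustrative only; real codegen output has many more methods).
type Foo struct{ x string }
type Bar struct{ x string }

// sameNode compares two boxed values the way interface equality does:
// both the dynamic type and the contents must match.
func sameNode(a, b any) bool {
	return a == b
}

func main() {
	fmt.Println(sameNode(Foo{"asdf"}, Foo{"asdf"})) // true: same type, same contents
	fmt.Println(sameNode(Foo{"asdf"}, Bar{"asdf"})) // false: different types
}
```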

#### named scalars appear in specialized method argument types and return types

Just like any other named type, named scalars will appear in specialized methods
which are exported on codegen'd types.

For example, if you have a schema:

```ipldsch
type Foo string
type Bar string
type Foomp map {Foo:Bar}
```

... then you'll get codegen output code which includes a method on Foomp:

```go
func (x *Foomp) LookupByNode(k *Foo) (*Bar) { /*...*/ }
```

Such specialized methods are often much shorter, much more efficient to execute,
and involve much less error handling to use than their more generalized
counterparts on the `datamodel.Node` interface.
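To make the contrast concrete, here is a hedged sketch, not actual codegen output: `Node`, `Foomp`, `Lookup`, and `LookupNode` below are hypothetical stand-ins. The specialized method takes and returns concrete pointer types and has no error path; the generalized one has to accept any node, check its type, and report failures as errors.

```go
package main

import (
	"errors"
	"fmt"
)

// Node is a hypothetical, heavily trimmed stand-in for datamodel.Node.
type Node interface {
	AsString() (string, error)
}

// Foo and Bar mirror generated named-scalar types (illustrative only).
type Foo struct{ x string }
type Bar struct{ x string }

func (f *Foo) AsString() (string, error) { return f.x, nil }
func (b *Bar) AsString() (string, error) { return b.x, nil }

// Foomp sketches a generated type for `type Foomp map {Foo:Bar}`.
type Foomp struct {
	m map[Foo]*Bar
}

// Lookup is the specialized form: typed key in, typed value out, no error.
// (Returning nil for an absent key is an assumption of this sketch.)
func (x *Foomp) Lookup(k *Foo) *Bar {
	return x.m[*k]
}

// LookupNode is the generalized form: it must accept any Node, so it has to
// check the key's concrete type and report problems as errors.
func (x *Foomp) LookupNode(k Node) (Node, error) {
	fk, ok := k.(*Foo)
	if !ok {
		return nil, errors.New("key is not a Foo")
	}
	v, found := x.m[*fk]
	if !found {
		return nil, errors.New("key not found")
	}
	return v, nil
}

func main() {
	fm := &Foomp{m: map[Foo]*Bar{Foo{"k"}: &Bar{"v"}}}

	// Specialized: no error handling needed.
	fmt.Println(fm.Lookup(&Foo{"k"}).x)

	// Generalized: every call site has to handle an error.
	n, err := fm.LookupNode(&Foo{"k"})
	if err != nil {
		panic(err)
	}
	s, _ := n.AsString()
	fmt.Println(s)
}
```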

Note that when named scalars appear in the signatures of specialized methods,
they always appear as pointers.  They will never be `nil`, but there is still
a reason that pointers are used here, and it's based on performance.
(The details don't matter as a user, but: it means if those values need to be
regarded as the `datamodel.Node` interface again in the future, that boxing is
inexpensive since we already have a (heap-escaped long ago) pointer.
By contrast, copying by value in more places is likely to result in more
heap escapes and thus additional undesirable new allocation costs in the
(entirely common!) case that the values end up handled as `datamodel.Node` later.)
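To see the boxing cost concretely, here is a rough measurement sketch using the standard library's `testing.AllocsPerRun`. `Foo`, `sink`, and the two helper functions are hypothetical, and exact counts depend on the Go compiler's escape analysis and version, so treat the numbers as indicative. On typical `gc` builds, boxing an existing pointer should report zero allocations, while boxing the struct by value usually reports one.

```go
package main

import (
	"fmt"
	"testing"
)

// Foo mirrors the shape of a generated named-scalar type (illustrative only).
type Foo struct{ x string }

// sink is a package-level interface variable. Storing into it makes the
// boxed value escape, much like passing it onward as a datamodel.Node would.
var sink any

// boxPointerAllocs measures boxing an existing *Foo into an interface.
// The interface's data word just holds the pointer, so nothing is allocated.
func boxPointerAllocs() float64 {
	p := &Foo{x: "asdf"}
	return testing.AllocsPerRun(1000, func() { sink = p })
}

// boxValueAllocs measures boxing a Foo by value. The interface needs its own
// heap copy of the struct (via a runtime.convT* helper), which typically
// costs an allocation on every conversion.
func boxValueAllocs() float64 {
	v := Foo{x: "asdf"}
	return testing.AllocsPerRun(1000, func() { sink = v })
}

func main() {
	fmt.Println("allocs per boxing, pointer:", boxPointerAllocs())
	fmt.Println("allocs per boxing, value:  ", boxValueAllocs())
}
```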

#### named scalars have a specialized method which unboxes them to a native primitive unconditionally

Every named scalar type has a specialized unbox method corresponding to its kind.

For example, for a `type Foo string`, there will be a `func (f Foo) String() string` method
(in addition to the `func (f Foo) AsString() (string, error)` method,
which does the same thing but is stuck returning an error due to interface conformance, even though we know statically that the error can never occur).
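A sketch of the two unbox paths, assuming a generated `Foo` shaped like the ones above; the method bodies are illustrative, not actual codegen output.

```go
package main

import "fmt"

// Foo sketches the generated type for `type Foo string` (illustrative only).
type Foo struct{ x string }

// String is the specialized unbox method: it cannot fail, so there is no error.
func (f Foo) String() string { return f.x }

// AsString is the datamodel.Node-style method. For a string-kinded type the
// error is always nil, but the interface signature still requires it.
func (f Foo) AsString() (string, error) { return f.x, nil }

func main() {
	f := Foo{x: "asdf"}

	// Specialized path: use the result directly.
	fmt.Println(f.String())

	// Generalized path: the caller still checks an error that never occurs.
	s, err := f.AsString()
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```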

#### named scalars can have additional methods attached to them

It's possible for users of codegen to attach additional methods to the types
generated for a named scalar.

This can be done either for purely aesthetic/ergonomic purposes particular
to the user's exact product, or as part of some extended library features.
For example, we plan to support extended features like "validation" methods
via detecting when a user adds a `Validate() error` method to a generated type.
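Since validation support is only planned, the following is just one way such detection *could* work, not how codegen does it: a hypothetical `validate` helper checks at runtime whether a value has a `Validate() error` method and calls it if so. `Foo`, `Bar`, and the `validator` interface are illustrative names.

```go
package main

import (
	"errors"
	"fmt"
)

// Foo stands in for a generated named-scalar type (illustrative only).
type Foo struct{ x string }

// Validate is a user-attached method; codegen did not emit it.
func (f Foo) Validate() error {
	if f.x == "" {
		return errors.New("value of type Foo must not be empty")
	}
	return nil
}

// Bar is another generated type, with no Validate method attached.
type Bar struct{ x string }

// validator is the method set the hypothetical detection looks for.
type validator interface{ Validate() error }

// validate runs Validate when the value provides one, and accepts it otherwise.
func validate(v any) error {
	if vv, ok := v.(validator); ok {
		return vv.Validate()
	}
	return nil
}

func main() {
	fmt.Println(validate(Foo{x: ""}))   // error from Foo.Validate
	fmt.Println(validate(Foo{x: "ok"})) // <nil>
	fmt.Println(validate(Bar{x: ""}))   // <nil>: Bar has no Validate method
}
```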


### plain scalars

Plain scalars also cause a type to be generated;
one type for each kind in the Data Model is sufficient.

Plain scalars show up in codegen output packages almost exactly as if
there were a short preamble in every schema:

```ipldsch
type Int int
type Bool bool
type Float float
type String string
type Bytes bytes
```

#### note about schema syntax

There's an issue about capitalization that's somewhat unresolved in schemas:
namely, is `type Fwee struct { someField string }` allowed, or a parse error?

This syntax is questionable because it means some of the scalar kind identifier
keywords are allowed in the same place as type names,
and it's potentially confusing because when we come to interacting with the
generated output code in golang, we still have `String`-with-a-capital-S
as a type identifier.

At any rate, it seems clear that you can mentally capitalize the 's'
at any time you see this debatable syntax.

(We should resolve this issue in the specs, which are in the `datamodel.specs` repo.)

#### plain scalars appear in specialized method argument types and return types

This is the same story as for named scalars.

For example, if you have a schema:

```ipldsch
type Foomp map {String:String}
```

... then you'll get codegen output code which includes a method on Foomp:

```go
func (x *Foomp) LookupByNode(k String) (String) { /*...*/ }
```

(The exact symbols involved and whether or not they're pointers may vary.)

The type might carry less semantic information than it does when a
named scalar shows up in the same position, but we still use a generated
type (and a pointer) here for two reasons: first, and more simply,
consistency; and second, the same performance reasons that apply
to named scalars. If we need to treat this value as a `datamodel.Node` again
in the future, it's much better if we already have a heap pointer rather
than a bare primitive value (`runtime.convT*` functions are often not your
favorite thing to see in a pprof flamegraph).

(FUTURE: this is still worth review.  We might actually want to use
bare primitives in a lot of these cases, because surely, if you're about
to want to treat something as a `datamodel.Node` again, then you can use the
generalized methods conforming to `datamodel.Node` which already yield that...?
We'll get more information and impressions about this after trying to use
codegen in bulk (especially the specialized methods).)

#### plain scalars do not allow additional method attachments

While we can't *stop* developers from modifying the source code emitted by codegen,
adding a method to any of the plain scalars is intensely discouraged.
Nothing sensible or good can come of trying to attach a "Validate" method
to something like the `String` type.  Don't do it.


Code reuse for plain scalars
----------------------------

We *always* need some type that can contain a plain scalar while also
implementing all the `datamodel.Node` methods.  Even if we didn't export it
or show it in any method signatures anywhere at all, we'd *still* need it
for the internal implementation of other types, because it's important that those
types be able to return a pointer to their fields in their implementations of
the `datamodel.Node` contract (otherwise, they'd be terribly slow and alloc-heavy).
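A minimal sketch of why that holding type matters, with hypothetical names throughout (`Node` is a trimmed stand-in for `datamodel.Node`; `plainString` and `Person` are illustrative): if a struct's field is already stored as a type that implements the node interface, an accessor can return a pointer into the struct instead of boxing a fresh copy.

```go
package main

import "fmt"

// Node is a hypothetical, heavily trimmed stand-in for datamodel.Node.
type Node interface {
	AsString() (string, error)
}

// plainString is a hypothetical name for the type that holds a plain string
// scalar and implements Node.
type plainString struct{ x string }

func (s *plainString) AsString() (string, error) { return s.x, nil }

// Person sketches a generated struct type with one string field.
type Person struct {
	name plainString
}

// FieldName hands out the field as a Node by pointing into the struct:
// the string is not copied and FieldName itself allocates nothing.
func (p *Person) FieldName() Node {
	return &p.name
}

func main() {
	p := &Person{name: plainString{x: "Alice"}}
	s, err := p.FieldName().AsString()
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```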

### can we reuse another package's plain scalars?

Since there's no functional difference between the plain scalars in a schema
and the scalar implementations from another package that's untyped in the first place,
can we reuse some code from an untyped package in codegen output packages?

No.

(Or: "maybe, conditionally, and it would have a lot of caveats and make the
untyped package we try to hitch a ride on become significantly weirder, so...
it's probably not worth it".)

The reason to desire this is so that there's less (admittedly quite duplicative) code
in the package emitted by using codegen.

However, there are *many* "cons" which outweigh that single "pro":

- This would require the untyped package to export its concrete implementation types.
	- This is the *only* reason those implementation types would need to be exported, which is a concerning smell all by itself.
	- In the case where we consider using the 'basicnode' package in particular:
		- Exporting those types allows creation by casting, which exposes an API surface that's not conventional (nor necessarily even possible) for other packages, and will thus be likely to create confusion as well as create multiple ways of doing things, which will make refactors harder.
			- We don't like allowing casting for creating values in general, for reasons explored well in the go-cid refactors to use wrapper structs: if casting is possible, it's far too easy for an end-user to write shoddy code which dodges all constructors and validation logic.
		- Exporting those types allows unboxing by casting, which again exposes an API surface that's not conventional (nor necessarily even possible) for other packages, and will thus be likely to create confusion as well as create multiple ways of doing things, which will make refactors harder.
			- Since we're talking about scalars and they're essentially copy-by-value (except for bytes -- but we give up and rely on "lawful" code for those anyway, since defensive copies are completely nonviable in performance terms), this doesn't create incorrectness issues... but it's still not *good*.
			- Note that while casting to concrete types exported by the output package of codegen is considered acceptable, this is a different beast: you still can't get the raw content out without using at least one more unboxing method; and, if you're casting or doing a type switch with a type from a codegen package, it should already be instantly clear that your code is no longer general-purpose, and this will surprise no one.
		- ...And while the above two are true only because the implementation is by typedefs, and they could be fixed by using a wrapper struct... that fix would have exactly the effect of making reuse impossible anyway, since the field in that wrapper struct would need to be unexported (otherwise, immutability would then in turn trivially shatter).
		- The implementation of the scalar for link kinds can't be reused anyway (it *does* use a wrapper struct already, and needs to, because Go doesn't allow declaring methods on a type whose underlying type is an interface), adding yet more inconsistency and jagged edges to the picture.
		- The "more unnecessarily(-for-end-user-perspectives) exported symbols" code smell counts about 10x as hard for this package in particular, since it's often one of the first ones a newcomer to this library will see: there shouldn't be weird designs with elaborate and far-away justifications poking up here.
- Reusing concrete types between packages makes it more likely that uncautious users could write code that uses native equality on scalars and get away with it *sometimes*.  Since this is still incorrect and would sometimes fail in fully general code, it's better if code like this flunks out as early as possible, which results in a better ecosystem overall.
- We like it when error messages can include a type name.  It's marginally better for that to be something like "gendemo.String" ('gendemo' being consistent with whatever the rest of the package also says) than just bare "string".
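The casting concerns in the first group of bullets can be sketched in a single file. `PlainString`, `WrappedString`, and `NewWrappedString` are hypothetical names, not types from `basicnode`, and because everything here sits in one package, the package boundary that would actually hide the wrapper's field is described in comments rather than enforced by the compiler. With a typedef, a conversion both creates and unboxes values without any constructor running; with a wrapper struct and an unexported field, code in other packages has to go through the constructor and an accessor.

```go
package main

import (
	"errors"
	"fmt"
)

// PlainString is a typedef-style scalar. If another package exported a type
// like this, callers could create and unbox values with bare conversions.
type PlainString string

// WrappedString is a wrapper-struct scalar. Outside its defining package, the
// unexported field x can be neither set nor read directly.
type WrappedString struct{ x string }

// NewWrappedString would be the only way for other packages to build a
// WrappedString, so checks placed here cannot be bypassed.
func NewWrappedString(s string) (WrappedString, error) {
	if s == "" {
		return WrappedString{}, errors.New("empty string not allowed")
	}
	return WrappedString{x: s}, nil
}

// String is the explicit unboxing method for WrappedString.
func (w WrappedString) String() string { return w.x }

func main() {
	// Typedef: a conversion creates a value without any constructor running...
	p := PlainString("")
	// ...and another conversion unboxes it again.
	fmt.Printf("%q\n", string(p))

	// Wrapper struct: construction goes through NewWrappedString, so the check runs.
	if _, err := NewWrappedString(""); err != nil {
		fmt.Println("rejected:", err)
	}
	w, err := NewWrappedString("asdf")
	if err != nil {
		panic(err)
	}
	fmt.Println(w.String())
}
```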

There are also a few bits that aren't entirely known (at least, at the time of this writing):
namely, how 'any' types are going to be handled in codegen.
Probably, though, the answer is: it's just treated as 'datamodel.Node',
and the codegen package doesn't export *any* more types relating to this situation, because that's already sufficient.

Long story short?  It's better to have plain scalar types in codegen output,
even if they look somewhat duplicative,
because trying to do anything fancier either fails outright
or spawns ridiculously detailed epicycles of complexity.
Emitting the plain scalar types in codegen output
is *more consistent* in almost every way,
will generate less cognitive load for users,
and just plain *works unconditionally*.