What's the deal with scalars, anyway?
=====================================

Two sorts of scalars
--------------------

There are two sorts of scalars that show up in codegen:

- 1: scalars that are just the plain kind (e.g. "string", not even named);
- 2: scalars that have named types.

Plain scalars can't have any special rules or semantics attached to them.

Named types with scalar kinds (aka a "typedef") **can** have additional rules and semantics attached to them.

Let's talk about named scalars first, because it's clearer that there's fun there.


### named scalars

Named scalars cause a type to be generated.
That type information is part of their identity (practically speaking: affects their definition of equality).

#### named scalars are never equal even if their contents are

It stands to reason that named scalars can't be freely interchanged.

If you have a schema:

```ipldsch
type Foo string
type Bar string
```

... then you'll get codegen output code with an exported type for each:

```go
type Foo struct{ x string }
/*...*/

type Bar struct{ x string }
/*...*/
```

... and clearly, `Foo{"asdf"}` and `Bar{"asdf"}` are not equal
(Go won't even compile a direct `==` between the two types).

#### named scalars appear in specialized method argument types and return types

Just like any other named type, named scalars will appear in specialized methods
which are exported on codegen'd types.

For example, if you have a schema:

```ipldsch
type Foo string
type Bar string
type Foomp map {Foo:Bar}
```

... then you'll get codegen output code which includes a method on Foomp:

```go
func (x *Foomp) LookupByNode(k *Foo) (*Bar) { /*...*/ }
```

Such specialized methods are often much shorter, much more efficient to execute,
and involve much less error handling to use than their more generalized
counterparts on the `datamodel.Node` interface.

Note that when named scalars appear in the signatures of specialized methods,
they always appear as pointers. They will never be `nil`, but there is still
a reason that pointers are used here, and it's based on performance.
(The details don't matter as a user, but: it means if those values need to be
regarded as the `datamodel.Node` interface again in the future, that boxing is
inexpensive since we already have a (heap-escaped long ago) pointer.
By contrast, copying by value in more places is likely to result in more
heap escapes and thus additional undesirable new allocation costs in the
(entirely common!) case that the values end up handled as `datamodel.Node` later.)

#### named scalars have a specialized method which unboxes them to a native primitive unconditionally

Every named scalar type has a specialized unbox method corresponding to its kind.

For example, for a `type Foo string`, there will be a `func (f Foo) String() string` method
(in addition to the `func (f Foo) AsString() (string, error)` method,
which does the same thing but is stuck presenting an error due to interface conformance even though we know statically that it can never fail).

#### named scalars can have additional methods attached to them

It's possible for users of codegen to attach additional methods to the types
generated for a named scalar.

This can be either done for purely aesthetic/ergonomic purposes particular
to the user's exact product, or, as part of some extended library features.
For example, we plan to support extended features like "validation" methods
via detecting when a user adds a `Validate() error` method to a generated type.


### plain scalars

Plain scalars also cause a type to be generated;
one type for each kind in the Data Model is sufficient.

Plain scalars show up in codegen output packages almost exactly as if
there was a short preamble in every schema:

```ipldsch
type Int int
type Bool bool
type Float float
type String string
type Bytes bytes
```

#### note about schema syntax

There's an issue about capitalization that's somewhat unresolved in schemas:
namely, is `type Fwee struct { someField string }` allowed, or a parse error?

This syntax is questionable because it means some of the scalar kind identifier
keywords are allowed in the same place as type names,
and it's potentially confusing because when we come to interacting with the
generated output code in golang, we still have `String`-with-a-capital-S
as a type identifier.

At any rate, it seems clear that you can mentally capitalize the 's'
at any time you see this debatable syntax.

(We should resolve this issue in the specs, which are in the `datamodel.specs` repo.)

#### plain scalars appear in specialized method argument types and return types

This is the same story as for named scalars.

For example, if you have a schema:

```ipldsch
type Foomp map {String:String}
```

... then you'll get codegen output code which includes a method on Foomp:

```go
func (x *Foomp) LookupByNode(k String) (String) { /*...*/ }
```

(The exact symbols involved and whether or not they're pointers may vary.)
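
To make the shape of that specialized method a bit more concrete, here is a minimal hand-written sketch of what a plain `String` scalar and a `Foomp` map could look like. It is an approximation under assumptions, not actual codegen output: the internal `m` field, the choice of pointers in the signature, and the nil-for-absent behavior are all made up for illustration.

```go
package main

import "fmt"

// String is a hand-written stand-in for the plain scalar type that codegen
// would emit; real generated code differs in detail.
type String struct{ x string }

// String unboxes to the native primitive unconditionally.
func (s *String) String() string { return s.x }

// AsString does the same thing, but carries an error return purely for
// conformance with the general Node-style interface.
func (s *String) AsString() (string, error) { return s.x, nil }

// Foomp approximates `type Foomp map {String:String}`.
// The internal layout here is hypothetical.
type Foomp struct {
	m map[string]*String
}

// LookupByNode is the specialized lookup: typed key in, typed value out,
// and no error to handle. Absent keys yield nil here only to keep the
// sketch short; that is not a claim about real generated code.
func (x *Foomp) LookupByNode(k *String) *String {
	return x.m[k.x]
}

func main() {
	fm := &Foomp{m: map[string]*String{"hello": {x: "world"}}}
	v := fm.LookupByNode(&String{x: "hello"})
	fmt.Println(v.String()) // prints "world"
}
```

The caller gets a `*String` back without any type assertion or error check, and that same pointer could later be handed to code expecting the general interface without copying the value.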

The type might carry less semantic information than it does when a
named scalar shows up in the same position, but we still use a generated
type (and a pointer) here for two reasons: first of all, and more simply,
consistency; but secondly, for the same performance reasons as applied
to named scalars (if we need to treat this value as a `datamodel.Node` again
in the future, it's much better if we already have a heap pointer rather
than a bare primitive value (`runtime.convT*` functions are often not your
favorite thing to see in a pprof flamegraph)).

(FUTURE: this is still worth review. We might actually want to use
bare primitives in a lot of these cases, because surely, if you're about
to want to treat something as a `datamodel.Node` again, then you can use the
generalized methods conforming to `datamodel.Node` which already yield that...?
We'll get more information and impressions about this after trying to use
codegen in bulk (especially the specialized methods).)

#### plain scalars do not allow additional method attachments

While we can't *stop* developers from modifying the source code emitted by codegen,
adding a method to any of the plain scalars is intensely discouraged.
Nothing sensible or good can come of trying to attach a "Validate" method
to something like the `String` type. Don't do it.


Code reuse for plain scalars
----------------------------

We *always* need some type that can contain a plain scalar while also
implementing all the `datamodel.Node` methods. Even if we didn't export it
or show it in any method signatures anywhere at all, we'd *still* need it
for internal implementation of other types, because it's important those
types be able to return a pointer to their fields in their implementations of
the `datamodel.Node` contract (otherwise, they'd be terribly slow and alloc-heavy).
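
The "return a pointer to their fields" point can be sketched roughly as follows. Everything here is a simplified assumption: `Node` is a deliberately tiny stand-in for `datamodel.Node` (only two methods), and `Person` is a hypothetical generated struct type.

```go
package main

import "fmt"

// Node is a tiny, hypothetical stand-in for the datamodel.Node interface.
type Node interface {
	AsString() (string, error)
}

// String is a stand-in for the generated plain scalar type.
type String struct{ x string }

func (s *String) AsString() (string, error) { return s.x, nil }

// Person is a hypothetical generated struct type; its field is stored
// using the scalar type directly, not a bare Go string.
type Person struct {
	name String
}

// LookupByString hands out a pointer to the field that already lives inside
// the parent struct. Boxing a pointer into an interface does not need to
// copy the string onto the heap.
func (p *Person) LookupByString(key string) (Node, error) {
	switch key {
	case "name":
		return &p.name, nil
	default:
		return nil, fmt.Errorf("no such field: %q", key)
	}
}

func main() {
	p := &Person{name: String{x: "alice"}}
	n, _ := p.LookupByString("name")
	s, _ := n.AsString()
	// The returned node is the field itself, not a copy.
	fmt.Println(s, n.(*String) == &p.name) // prints "alice true"
}
```

If the field were a bare `string`, each lookup would instead have to wrap the value in some fresh Node-implementing box, which is the slow, alloc-heavy path the paragraph above warns about.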

### can we reuse another package's plain scalars?

Since there's no functional difference between the plain scalars in a schema
and the scalars implementation from another package that's untyped in the first place,
can we reuse some code from an untyped package in codegen output packages?

No.

(Or: "maybe, conditionally, and it would have a lot of caveats and make the
untyped package we try to hitch a ride on become significantly weirder, so...
it's probably not worth it".)

The reason to desire this is so that there's less (admittedly quite duplicative) code
in the package emitted by using codegen.

However, there are *many* "cons" which outweigh that single "pro":

- This would require the untyped package to export their concrete implementation types.
  - This is the *only* reason those implementation types would need to be exported, which is a concerning smell all by itself.
- In the case that we consider using the 'basicnode' package in particular:
  - Exporting those types allows creation by casting, which exposes an API surface that's not conventional (nor necessarily even possible) for other packages, and will thus be likely to create confusion as well as create multiple ways of doing things which will make refactors harder.
    - We don't like allowing casting for creating values in general for reasons explored well in the go-cid refactors to use wrapper structs: if casting is possible, it's far too easy for an end-user to write shoddy code which dodges all constructors and validation logic.
  - Exporting those types allows unboxing by casting, which again exposes an API surface that's not conventional (nor necessarily even possible) for other packages, and will thus be likely to create confusion as well as create multiple ways of doing things which will make refactors harder.
    - Since we're talking about scalars and they're essentially copy-by-value (except for bytes -- but we give up and rely on "lawful" code for those anyway, since defensive copies are completely nonviable in performance terms), this doesn't create incorrectness issues... but it's still not *good*.
  - Note that while casting to concrete types exported by the output package of codegen is considered acceptable, this is a different beast: you still can't get the raw content out without using at least one more unboxing method; and, if you're casting or doing a type switch with a type in a codegen package, it should already instantly be clear that your code is no longer general-purpose, and this will surprise no one.
  - ...And while the above two are true only because the implementation is by typedefs and they could be fixed by using a wrapper struct... that fix would have exactly the effect of making reuse impossible anyway, since the field in that wrapper struct would need to be unexported (otherwise, immutability would then in turn trivially shatter).
  - The implementation of the scalar for link kinds can't be reused anyway (it *does* use a wrapper struct already, and needs to; type aliases on an interface don't permit adding methods), adding yet more inconsistency and jagged edges to the picture.
  - The "more unnecessarily(-for-end-user-perspectives) exported symbols" code smell counts about 10x as hard for this package in particular, since it's often one of the first ones a newcomer to this library will see: there shouldn't be weird designs with elaborate and far away justifications poking up here.
- Reusing concrete types between packages makes it more likely that incautious users could write code that uses native equality on scalars and get away with it *sometimes*. Since this is still incorrect and would sometimes fail in fully general code, it's better if code like this flunks out as early as possible, which results in a better ecosystem overall.
- We like it when error messages can include a type name. It's marginally better for that to be something like "gendemo.String" ('gendemo' being consistent with whatever the rest of the package also says) than just bare "string".

There are also a few bits that aren't entirely known (at least, at the time of this writing):
namely, how 'any' types are going to be handled in codegen.
Probably, though, the answer is: it's just treated as 'datamodel.Node',
and the codegen package doesn't export *any* more types which regard this situation because that's already sufficient.

Long story short? It's better to have plain scalar types in codegen output,
even if they look somewhat duplicative,
because trying to do anything fancier either fails outright
or spawns ridiculously detailed epicycles of complexity.
Emitting the plain scalar types in codegen output
is *more consistent* in almost every way,
will generate less cognitive load for users,
and just plain *works unconditionally*.
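
The native-equality and type-name points from the list above can be illustrated with a small sketch. All names here are hypothetical: `untypedString` imitates an untyped implementation done as a typedef, `String` imitates a codegen'd wrapper struct handled by pointer, and `sameString` is just one possible content-based comparison, not a library function.

```go
package main

import "fmt"

// Node is a tiny, hypothetical stand-in for the datamodel.Node interface.
type Node interface {
	AsString() (string, error)
}

// untypedString imitates a scalar implemented as a typedef.
type untypedString string

func (s untypedString) AsString() (string, error) { return string(s), nil }

// String imitates a codegen'd plain scalar: a wrapper struct used by pointer.
type String struct{ x string }

func (s *String) AsString() (string, error) { return s.x, nil }

// sameString compares two nodes by content, which is the comparison that
// stays correct regardless of the concrete implementation behind them.
func sameString(a, b Node) bool {
	as, errA := a.AsString()
	bs, errB := b.AsString()
	return errA == nil && errB == nil && as == bs
}

func main() {
	var a, b Node = untypedString("x"), untypedString("x")
	var c, d Node = &String{x: "x"}, &String{x: "x"}

	fmt.Println(a == b)           // true: native equality happens to work
	fmt.Println(c == d)           // false: two distinct pointers
	fmt.Println(a == c)           // false: different concrete types
	fmt.Println(sameString(a, c)) // true: content-based comparison

	// The concrete type name is what ends up in %T-style error messages.
	fmt.Printf("%T %T\n", a, c) // main.untypedString *main.String
}
```

Native `==` on the interface values gives the "right" answer only for the typedef-style implementation, which is exactly the kind of sometimes-works behavior that is better caught early.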