# Data

**Contents**

<!-- toc -->

See also:

- [Data store](store.md)
- [Data synchronization](sync.md)

## Data Structures

As a client developer, there are two primary structures:
[`NodeEdge`](https://pkg.go.dev/github.com/simpleiot/simpleiot/data#NodeEdge)
and [`Point`](https://pkg.go.dev/github.com/simpleiot/simpleiot/data#Point). A
`Node` can be considered a collection of `Points`.

These data structures describe most data that is stored and transferred in a
Simple IoT system.

The core data structures are currently defined in the
[`data`](https://github.com/simpleiot/simpleiot/tree/master/data) directory for
Go code, and the
[`frontend/src/Api`](https://github.com/simpleiot/simpleiot/tree/master/frontend/src/Api)
directory for Elm code.

A `Point` can represent a sensor value, or a configuration parameter for the
node. With sensor values and configuration represented as `Points`, it becomes
easy to use both sensor data and configuration in rules or equations because the
mechanism to use both is the same. Additionally, if all `Point` changes are
recorded in a time series database (for instance InfluxDB), you automatically
have a record of all configuration and sensor changes for a `Node`.

Treating most data as `Points` also has another benefit in that we can easily
simulate a device -- simply provide a UI or write a program to modify any point
and we can shift from working on real data to simulating scenarios we want to
test.

Edges are used to describe the relationships between nodes as a
[directed acyclic graph](https://en.wikipedia.org/wiki/Directed_acyclic_graph).

`Nodes` can have parents or children and thus be represented in a hierarchy. To
add structure to the system, you simply add nested `Nodes`.
The `Node` hierarchy
can represent the physical structure of the system, or it could also contain
virtual `Nodes`. These virtual nodes could contain logic to process data from
sensors. Several examples of virtual nodes:

- a pump `Node` that converts motor current readings into pump events
- a node that implements moving averages, scaling, etc. on sensor data
- a node that combines data from multiple sensors
- a node that implements custom logic for a particular application
- a component in an edge device such as a cellular modem

Like Nodes, Edges also contain a Point array that further describes the
relationship between Nodes. Some examples:

- role the user plays in the node (viewer, admin, etc.)
- order of notifications when sequencing notifications through a node's users
- node is enabled/disabled -- for instance we may want to disable a Modbus IO
  node that is not currently functioning.

Being able to arrange nodes in an arbitrary hierarchy also opens up some
interesting possibilities such as creating virtual nodes that have a number of
children that are collecting data. The parent virtual nodes could have rules or
logic that operate off data from child nodes. In this case, the virtual parent
nodes might be a town or city, service provider, etc., and the child nodes are
physical edge nodes collecting data, users, etc.

### The Point `Key` field constraint

The Point data structure has a `Key` field that can be used to construct Array
and Map data structures in a node. This is a flexible idea in that it is easy to
transition from a scalar value to an array or map. However, it can also cause
problems if one client is writing key values of `""` and another client (say a
rule action) is writing key values of `"0"`. One solution is to have fancy logic
that equates `""` to `"0"` on point updates, compares, etc. Another approach is
to consider `""` an invalid key value and set the key to `"0"` for scalar values.
This
incurs a slight amount of overhead, but leads to more predictable operation and
eliminates the possibility of having two points in a node that mean the same
thing.

**The Simple IoT Store always sets the Key field to `"0"` on incoming points if
the Key field is blank.**

Clients should be written with this in mind.

### Converting Nodes to other data structures

Nodes and Points are convenient for storage and synchronization, but cumbersome
to work with in application code that uses the data, so we typically convert
them to another data structure.
[`data.Decode`](https://pkg.go.dev/github.com/simpleiot/simpleiot/data#Decode),
[`data.Encode`](https://pkg.go.dev/github.com/simpleiot/simpleiot/data#Encode),
and
[`data.MergePoints`](https://pkg.go.dev/github.com/simpleiot/simpleiot/data#MergePoints)
can be used to convert Node data structures to your own custom `struct`, much
like the Go `json` package.

### Arrays and Maps

Points can be used to represent arrays and maps. For an array, the `key` field
contains the index `"0"`, `"1"`, `"2"`, etc. For maps, the `key` field contains
the key of the map.
An example:

| Type            | Key   | Text             | Value |
| --------------- | ----- | ---------------- | ----- |
| description     | 0     | Node Description |       |
| ipAddress       | 0     | 192.168.1.10     |       |
| ipAddress       | 1     | 10.0.0.3         |       |
| diskPercentUsed | /     |                  | 43    |
| diskPercentUsed | /home |                  | 75    |
| switch          | 0     |                  | 1     |
| switch          | 1     |                  | 0     |

The above would map to the following Go type:

```go
type myNode struct {
	// Node metadata
	ID     string `node:"id"`
	Parent string `node:"parent"`
	// Scalar point (key "0")
	Description string `point:"description"`
	// Array points (keys "0", "1", ...)
	IPAddresses []string `point:"ipAddress"`
	Switches    []bool   `point:"switch"`
	// Map points (keys "/", "/home")
	DiskPercentUsed map[string]float64 `point:"diskPercentUsed"`
}
```

The
[`data.Decode()`](https://pkg.go.dev/github.com/simpleiot/simpleiot/data#Decode)
function can be used to decode an array of points into the above type. The
[`data.MergePoints()`](https://pkg.go.dev/github.com/simpleiot/simpleiot/data#MergePoints)
function can be used to update an existing struct from a new point.

#### Best practices for working with arrays

If you are going to make changes to an array in UI/Client code, and you are
storing the array in a native structure, then you also need to store a length
field so you know how long the original array was. After modifying the array,
check if the new length is less than the original -- if it is, then add
tombstone points to the end so that the deleted points get removed.

Generally it is simplest to send the entire array as a single message any time
any value in it has changed -- especially if values are going to be added or
removed. `data.Decode` will then correctly handle the array resizing.

#### Technical details of how `data.Decode` works with slices

Some consideration is needed when using `Decode` and `MergePoints` to decode
points into Go slices. Slices are never allocated / copied unless they are being
expanded.
Instead, deleted points are written to the slice as the zero value.
However, for a given `Decode` call, if points are deleted from the _end_ of the
slice, `Decode` will re-slice it to remove those values from the slice. Thus,
there is an important consideration for clients: if they wish to rely on slices
being truncated when points are deleted, points must be batched in order such
that `Decode` sees the trailing deleted points first. Put another way, `Decode`
does not care about points deleted from prior calls to `Decode`, so "holes" of
zero values may still appear at the end of a slice under certain circumstances.

Consider points with integer values `[0, 1, 2, 3, 4]`. If a tombstone is set on
the point with `Key` `3`, followed by a tombstone on the point with `Key` `4`,
the resulting slice will be `[0, 1, 2]` if these points are batched together,
but if they are sent separately (thus resulting in multiple `Decode` calls), the
resulting slice will be `[0, 1, 2, 0]`.

## Node Topology changes

Nodes can exist in multiple locations in the tree. This allows us to do things
like include a user in multiple groups.

### Add

Node additions are detected in real-time by sending the points for the new node
as well as points for the edge that adds the node to the tree.

### Copy

Node copies are similar to add, but only the edge points are sent.

### Delete

Node deletions are recorded by setting a tombstone point in the edge above the
node to true. If a node is deleted, this information needs to be recorded,
otherwise the synchronization process will simply re-create the deleted node if
it exists on another instance.

### Move

Move is just a combination of Copy and Delete.

If any real-time data is lost in any of the above operations, the catch up
synchronization will propagate any node changes.
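The topology operations above can be sketched with a toy in-memory model. This is only an illustration, not the Simple IoT implementation: the `point`, `edge`, and `tree` types and the `copyNode`, `deleteNode`, `moveNode`, and `parents` helpers are hypothetical, and the tombstone is assumed to be an edge point of type `tombstone` with a non-zero value meaning deleted.

```go
package main

import "fmt"

// point is a hypothetical, simplified stand-in for a Simple IoT point.
type point struct {
	Type  string
	Value float64
}

// edge links a child node (Down) to a parent node (Up) and carries its own
// points, mirroring the idea that edges hold points too.
type edge struct {
	Up, Down string
	Points   []point
}

// tombstoned reports whether the edge carries a tombstone point set to true
// (assumed here to be any non-zero value).
func (e edge) tombstoned() bool {
	for _, p := range e.Points {
		if p.Type == "tombstone" {
			return p.Value != 0
		}
	}
	return false
}

// tree is the list of all edges, including deleted ones.
type tree []edge

// copyNode places an existing node under an additional parent. Only an edge
// is created; the node's own points are untouched.
func (t *tree) copyNode(id, parent string) {
	*t = append(*t, edge{Up: parent, Down: id})
}

// deleteNode marks the edge between parent and id as deleted. The edge is
// kept so that the deletion itself can be synchronized to other instances.
func (t *tree) deleteNode(id, parent string) {
	for i := range *t {
		e := &(*t)[i]
		if e.Up == parent && e.Down == id {
			e.Points = []point{{Type: "tombstone", Value: 1}}
		}
	}
}

// moveNode is a copy to the new parent followed by a delete from the old one.
func (t *tree) moveNode(id, from, to string) {
	t.copyNode(id, to)
	t.deleteNode(id, from)
}

// parents lists the parents of id whose edges are not tombstoned.
func (t *tree) parents(id string) []string {
	var out []string
	for _, e := range *t {
		if e.Down == id && !e.tombstoned() {
			out = append(out, e.Up)
		}
	}
	return out
}

func main() {
	t := tree{{Up: "groupA", Down: "user1"}}

	t.copyNode("user1", "groupB")
	fmt.Println(t.parents("user1")) // [groupA groupB]

	t.moveNode("user1", "groupA", "groupC")
	fmt.Println(t.parents("user1")) // [groupB groupC]
	fmt.Println(len(t))             // 3: the deleted edge is still recorded
}
```

Keeping the tombstoned edge in the list, rather than removing it, reflects the reason given above: without a record of the deletion, synchronization with another instance would re-create the node.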

## Tracking who made changes

The `Point` type has an `Origin` field that is used to track who generated this
point. If the node that owned the point generated the point, then Origin can be
left blank -- this saves data bandwidth -- especially for sensor data which is
generated by the client managing the node. There are several reasons for the
`Origin` field:

- track who made changes for auditing and debugging purposes. If a rule or some
  process other than the owning node modifies a point, the Origin should always
  be populated. Tests that generate points should generally set the origin to
  "test".
- eliminate echoes where a client may be subscribed to a subject as well as
  publish to the same subject. With the Origin field, the client can determine
  if it was the author of a point it receives, and if so simply drop it. See
  [client documentation](client.md#message-echo) for more discussion of the echo
  topic.

## Evolvability

One important consideration in data design is whether the system can be easily
changed. With a distributed system, you may have different versions of the
software running at the same time using the same data. One version may use/store
additional information that the other does not. In this case, it is very
important that the other version does not delete this data, as could easily
happen if you decode data into a type, and then re-encode and store it.

With the Node/Point system, we don't have to worry about this issue because
Nodes are only updated by sending Points. It is not possible to delete a Node
Point. So if one version writes a Point the other is not using, it will be
transferred, stored, synchronized, etc. and simply ignored by versions that
don't use this point. This is another case where SIOT solves a hard problem that
typically requires quite a bit of care and effort.
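The following sketch illustrates why point-wise updates preserve data written by other versions. It is a toy model under stated assumptions, not the Simple IoT store: the `point` and `node` types and the `apply` method are hypothetical, and `calibrationOffset` is an invented point type standing in for data that only a newer version knows about.

```go
package main

import "fmt"

// point is a hypothetical, simplified stand-in for a Simple IoT point.
type point struct {
	Type  string
	Key   string
	Text  string
	Value float64
}

// node stores points indexed by {Type, Key}.
type node map[[2]string]point

// apply merges incoming points into the node. Points that are not mentioned
// in the update are left untouched. A blank Key is normalized to "0",
// following the Key field constraint described earlier.
func (n node) apply(pts ...point) {
	for _, p := range pts {
		if p.Key == "" {
			p.Key = "0"
		}
		n[[2]string{p.Type, p.Key}] = p
	}
}

func main() {
	n := node{}

	// A newer version writes a point type the older version does not use.
	n.apply(
		point{Type: "description", Text: "pump 1"},
		point{Type: "calibrationOffset", Value: 0.5},
	)

	// An older version updates only the point it knows about.
	n.apply(point{Type: "description", Text: "pump 1 (north)"})

	fmt.Println(len(n))                                       // 2
	fmt.Println(n[[2]string{"description", "0"}].Text)        // pump 1 (north)
	fmt.Println(n[[2]string{"calibrationOffset", "0"}].Value) // 0.5
}
```

Because the older version never sends the whole node back, it has no opportunity to drop the point it does not understand.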