github.com/simpleiot/simpleiot@v0.18.3/docs/ref/data.md

# Data

See also:

- [Data store](store.md)
- [Data synchronization](sync.md)

## Data Structures

As a client developer, you work with two primary data structures:
[`NodeEdge`](https://pkg.go.dev/github.com/simpleiot/simpleiot/data#NodeEdge)
and [`Point`](https://pkg.go.dev/github.com/simpleiot/simpleiot/data#Point). A
`Node` can be considered a collection of `Points`.

These data structures describe most data that is stored and transferred in a
Simple IoT system.

The core data structures are currently defined in the
[`data`](https://github.com/simpleiot/simpleiot/tree/master/data) directory for
Go code, and
[`frontend/src/Api`](https://github.com/simpleiot/simpleiot/tree/master/frontend/src/Api)
directory for Elm code.

A `Point` can represent a sensor value or a configuration parameter for the
node. With sensor values and configuration both represented as `Points`, it
becomes easy to use sensor data and configuration in rules or equations because
the mechanism to use both is the same. Additionally, if all `Point` changes are
recorded in a time series database (for instance InfluxDB), you automatically
have a record of all configuration and sensor changes for a `Node`.

Treating most data as `Points` has another benefit: we can easily simulate a
device. Simply provide a UI or write a program that modifies any point, and we
can shift from working on real data to simulating the scenarios we want to
test.

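As a rough illustration of why a shared representation helps, the sketch below uses a simplified stand-in for `Point` (the real type in the `data` package has more fields) and a hypothetical rule, `overLimit`, that reads a sensor value and a configured limit through the same lookup. The point types `temp` and `tempLimit` and the helper `find` are made up for this example.

```go
package main

import "fmt"

// Point is a simplified stand-in for data.Point, reduced to the fields this
// example needs. It is not the real type.
type Point struct {
	Type  string
	Key   string
	Value float64
	Text  string
}

// find returns the value of the first point matching typ and key.
func find(points []Point, typ, key string) (float64, bool) {
	for _, p := range points {
		if p.Type == typ && p.Key == key {
			return p.Value, true
		}
	}
	return 0, false
}

// overLimit is a hypothetical rule. It reads a sensor reading ("temp") and a
// configuration parameter ("tempLimit") the same way, because both are points.
func overLimit(points []Point) bool {
	temp, okTemp := find(points, "temp", "0")
	limit, okLimit := find(points, "tempLimit", "0")
	return okTemp && okLimit && temp > limit
}

func main() {
	node := []Point{
		{Type: "description", Key: "0", Text: "Freezer"},
		{Type: "tempLimit", Key: "0", Value: 5}, // configuration
		{Type: "temp", Key: "0", Value: 3.2},    // sensor reading
	}
	fmt.Println(overLimit(node)) // false

	// Simulate a fault by overwriting the sensor point.
	node[2].Value = 8.5
	fmt.Println(overLimit(node)) // true
}
```

Overwriting the `temp` point is all it takes to switch from real data to a simulated scenario, since the rule does not care where the point came from.
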
Edges are used to describe the relationships between nodes as a
[directed acyclic graph](https://en.wikipedia.org/wiki/Directed_acyclic_graph).

![dag](images/dag.svg)

`Nodes` can have parents or children and thus be represented in a hierarchy. To
add structure to the system, you simply add nested `Nodes`. The `Node` hierarchy
can represent the physical structure of the system, or it can contain virtual
`Nodes`. These virtual nodes can contain logic to process data from sensors.
Several examples of virtual nodes:

- a pump `Node` that converts motor current readings into pump events
- a node that implements moving averages, scaling, etc. on sensor data
- a node that combines data from multiple sensors
- a node that implements custom logic for a particular application
- a component in an edge device, such as a cellular modem

Like `Nodes`, `Edges` also contain a `Point` array that further describes the
relationship between `Nodes`. Some examples:

- the role the user plays in the node (viewer, admin, etc.)
- the order of notifications when sequencing notifications through a node's
  users
- whether the node is enabled or disabled; for instance, we may want to disable
  a Modbus IO node that is not currently functioning

Being able to arrange nodes in an arbitrary hierarchy also opens up some
interesting possibilities, such as creating virtual nodes that have a number of
children that are collecting data. The parent virtual nodes could have rules or
logic that operate on data from child nodes. In this case, the virtual parent
nodes might be a town or city, service provider, etc., and the child nodes are
physical edge nodes collecting data, users, etc.

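The following sketch shows one way to picture nodes, edges, and edge points together. The `Edge` type here is a simplified stand-in (the `Up` and `Down` field names, the node IDs, and the `role` point type are assumptions for illustration), and `children` is a hypothetical helper, not part of the `data` package.

```go
package main

import "fmt"

// Point is a simplified stand-in for data.Point.
type Point struct {
	Type string
	Key  string
	Text string
}

// Edge is a simplified stand-in for an edge: a directed link from a parent
// (Up) to a child (Down) that carries its own points.
type Edge struct {
	Up, Down string
	Points   []Point
}

// children returns the IDs of the nodes directly below parent.
func children(edges []Edge, parent string) []string {
	var ids []string
	for _, e := range edges {
		if e.Up == parent {
			ids = append(ids, e.Down)
		}
	}
	return ids
}

func main() {
	edges := []Edge{
		{Up: "root", Down: "groupA"},
		{Up: "root", Down: "groupB"},
		// The same user node sits under both groups, and each edge records
		// a different role for that user.
		{Up: "groupA", Down: "userJoe", Points: []Point{{Type: "role", Key: "0", Text: "admin"}}},
		{Up: "groupB", Down: "userJoe", Points: []Point{{Type: "role", Key: "0", Text: "viewer"}}},
	}
	fmt.Println(children(edges, "root"))   // [groupA groupB]
	fmt.Println(children(edges, "groupB")) // [userJoe]
}
```

Because the role lives on the edge rather than on the user node, the same user can be an admin in one group and a viewer in another.
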
### The Point `Key` field constraint

The `Point` data structure has a `Key` field that can be used to construct array
and map data structures in a node. This is a flexible idea in that it is easy to
transition from a scalar value to an array or map. However, it can also cause
problems if one client is writing a key value of `""` and another client (say a
rule action) is writing a key value of `"0"`. One solution is to have fancy
logic that equates `""` to `"0"` on point updates, compares, etc. Another
approach is to consider `""` an invalid key value and set the key to `"0"` for
scalar values. This incurs a slight amount of overhead, but leads to more
predictable operation and eliminates the possibility of having two points in a
node that mean the same thing.

**The Simple IoT Store always sets the `Key` field to `"0"` on incoming points if
the `Key` field is blank.**

Clients should be written with this in mind.

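A client that compares or caches points locally may want to apply the same rule before doing so. The helper below is a minimal sketch of that normalization using a simplified `Point` stand-in. `normalizeKeys` is a hypothetical name, not a function from the `data` package.

```go
package main

import "fmt"

// Point is a simplified stand-in for data.Point.
type Point struct {
	Type  string
	Key   string
	Value float64
}

// normalizeKeys replaces blank keys with "0" in place, mirroring the store
// behavior described above so that "" and "0" never name different points.
func normalizeKeys(points []Point) {
	for i := range points {
		if points[i].Key == "" {
			points[i].Key = "0"
		}
	}
}

func main() {
	pts := []Point{
		{Type: "temp", Value: 21.5},            // scalar written with a blank key
		{Type: "switch", Key: "1", Value: 1.0}, // array element, key left alone
	}
	normalizeKeys(pts)
	fmt.Printf("%q %q\n", pts[0].Key, pts[1].Key) // "0" "1"
}
```
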
### Converting Nodes to other data structures

Nodes and Points are convenient for storage and synchronization, but cumbersome
to work with in application code that uses the data, so we typically convert
them to another data structure.
[`data.Decode`](https://pkg.go.dev/github.com/simpleiot/simpleiot/data#Decode),
[`data.Encode`](https://pkg.go.dev/github.com/simpleiot/simpleiot/data#Encode),
and
[`data.MergePoints`](https://pkg.go.dev/github.com/simpleiot/simpleiot/data#MergePoints)
can be used to convert Node data structures to your own custom `struct`, much
like the Go `json` package.

### Arrays and Maps

Points can be used to represent arrays and maps. For an array, the `Key` field
contains the index `"0"`, `"1"`, `"2"`, etc. For maps, the `Key` field contains
the key of the map. An example:

| Type            | Key   | Text             | Value |
| --------------- | ----- | ---------------- | ----- |
| description     | 0     | Node Description |       |
| ipAddress       | 0     | 192.168.1.10     |       |
| ipAddress       | 1     | 10.0.0.3         |       |
| diskPercentUsed | /     |                  | 43    |
| diskPercentUsed | /home |                  | 75    |
| switch          | 0     |                  | 1     |
| switch          | 1     |                  | 0     |

The above would map to the following Go type:

```go
type myNode struct {
	ID     string `node:"id"`
	Parent string `node:"parent"`
	// description is a point in the table above, so it uses the point tag.
	Description string   `point:"description"`
	IpAddresses []string `point:"ipAddress"`
	Switches    []bool   `point:"switch"`
	// The diskPercentUsed keys are paths rather than indexes, so a map fits.
	DiscPercentUsed map[string]float64 `point:"diskPercentUsed"`
}
```

The
[`data.Decode()`](https://pkg.go.dev/github.com/simpleiot/simpleiot/data#Decode)
function can be used to decode an array of points into the above type. The
[`data.MergePoints()`](https://pkg.go.dev/github.com/simpleiot/simpleiot/data#MergePoints)
function can be used to update an existing struct from new points.

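To make the mapping concrete, here is a hand-written sketch that turns the example points into a struct without using the library. It is not how `data.Decode` is implemented (which works from struct tags). The `Point` stand-in, the `decode` and `setAt` helpers, and the choice of a map for `diskPercentUsed` are all assumptions for illustration.

```go
package main

import (
	"fmt"
	"strconv"
)

// Point is a simplified stand-in for data.Point.
type Point struct {
	Type  string
	Key   string
	Text  string
	Value float64
}

type myNode struct {
	Description     string
	IpAddresses     []string
	Switches        []bool
	DiscPercentUsed map[string]float64
}

// setAt stores v at index i, growing the slice with zero values if needed.
func setAt[T any](s []T, i int, v T) []T {
	for len(s) <= i {
		var zero T
		s = append(s, zero)
	}
	s[i] = v
	return s
}

// decode fills a myNode by hand from a list of points.
func decode(points []Point) (myNode, error) {
	n := myNode{DiscPercentUsed: map[string]float64{}}
	for _, p := range points {
		switch p.Type {
		case "description":
			n.Description = p.Text
		case "diskPercentUsed":
			n.DiscPercentUsed[p.Key] = p.Value // map: Key is the map key
		case "ipAddress", "switch":
			i, err := strconv.Atoi(p.Key) // array: Key is the index
			if err != nil || i < 0 {
				return n, fmt.Errorf("bad index %q for %s", p.Key, p.Type)
			}
			if p.Type == "ipAddress" {
				n.IpAddresses = setAt(n.IpAddresses, i, p.Text)
			} else {
				n.Switches = setAt(n.Switches, i, p.Value != 0)
			}
		}
	}
	return n, nil
}

func main() {
	n, err := decode([]Point{
		{Type: "description", Key: "0", Text: "Node Description"},
		{Type: "ipAddress", Key: "0", Text: "192.168.1.10"},
		{Type: "ipAddress", Key: "1", Text: "10.0.0.3"},
		{Type: "diskPercentUsed", Key: "/", Value: 43},
		{Type: "diskPercentUsed", Key: "/home", Value: 75},
		{Type: "switch", Key: "0", Value: 1},
		{Type: "switch", Key: "1", Value: 0},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(n.Description)              // Node Description
	fmt.Println(n.IpAddresses, n.Switches)  // [192.168.1.10 10.0.0.3] [true false]
	fmt.Println(n.DiscPercentUsed["/home"]) // 75
}
```
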
#### Best practices for working with arrays

If you are going to make changes to an array in UI/client code, and you are
storing the array in a native structure, then you also need to store a length
field so you know how long the original array was. After modifying the array,
check whether the new length is less than the original. If it is, add tombstone
points to the end so that the deleted points get removed.

Generally it is simplest to send the entire array as a single message any time
any value in it has changed, especially if values are going to be added or
removed. `data.Decode` will then correctly handle the array resizing.

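The bookkeeping described above might look like the following sketch. It assumes a simplified `Point` stand-in in which a non-zero `Tombstone` marks a deleted point, and `arrayPoints` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"strconv"
)

// Point is a simplified stand-in for data.Point. A non-zero Tombstone marks
// the point as deleted (assumed representation for this sketch).
type Point struct {
	Type      string
	Key       string
	Text      string
	Tombstone int
}

// arrayPoints converts values into points of the given type and appends a
// tombstone point for every index that existed in the original array
// (origLen entries) but no longer exists.
func arrayPoints(typ string, values []string, origLen int) []Point {
	var pts []Point
	for i, v := range values {
		pts = append(pts, Point{Type: typ, Key: strconv.Itoa(i), Text: v})
	}
	for i := len(values); i < origLen; i++ {
		pts = append(pts, Point{Type: typ, Key: strconv.Itoa(i), Tombstone: 1})
	}
	return pts
}

func main() {
	// The array originally held three addresses; the user removed the last two.
	pts := arrayPoints("ipAddress", []string{"192.168.1.10"}, 3)
	for _, p := range pts {
		fmt.Printf("%s[%s] text=%q tombstone=%d\n", p.Type, p.Key, p.Text, p.Tombstone)
	}
	// ipAddress[0] text="192.168.1.10" tombstone=0
	// ipAddress[1] text="" tombstone=1
	// ipAddress[2] text="" tombstone=1
}
```

Returning the live points and the tombstones in one slice matches the advice to send the whole array as a single message.
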
#### Technical details of how `data.Decode` works with slices

Some consideration is needed when using `Decode` and `MergePoints` to decode
points into Go slices. Slices are never allocated or copied unless they are
being expanded. Instead, deleted points are written to the slice as the zero
value. However, for a given `Decode` call, if points are deleted from the _end_
of the slice, `Decode` will re-slice it to remove those values from the slice.
Thus, there is an important consideration for clients: if they wish to rely on
slices being truncated when points are deleted, the points must be batched so
that `Decode` sees all of the trailing deleted points in the same call. Put
another way, `Decode` does not care about points deleted in prior calls to
`Decode`, so "holes" of zero values may still appear at the end of a slice under
certain circumstances.

Consider points with integer values `[0, 1, 2, 3, 4]`. If a tombstone is set on
the point with `Key` `3`, followed by a tombstone on the point with `Key` `4`,
the resulting slice will be `[0, 1, 2]` if these points are batched together.
If they are sent separately (thus resulting in multiple `Decode` calls), the
resulting slice will be `[0, 1, 2, 0]`.

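The difference between batched and separate tombstones can be reproduced with a small model. The code below is not the library implementation. It is a sketch that imitates the documented behavior, with a hypothetical `update` type standing in for a point addressed at a slice index and `apply` standing in for one `Decode` call.

```go
package main

import "fmt"

// update is a simplified stand-in for a point addressed at a slice index.
type update struct {
	Index     int
	Value     int
	Tombstone bool
}

// apply imitates the behavior described above for a single Decode call:
// deleted entries are zeroed in place, and only a trailing run of entries
// deleted in this same call is trimmed off.
func apply(s []int, batch []update) []int {
	deleted := map[int]bool{}
	for _, u := range batch {
		if u.Index < 0 {
			continue // ignore invalid indexes in this sketch
		}
		for len(s) <= u.Index {
			s = append(s, 0)
		}
		if u.Tombstone {
			s[u.Index] = 0
			deleted[u.Index] = true
		} else {
			s[u.Index] = u.Value
			delete(deleted, u.Index)
		}
	}
	// Trim only while the last entry was deleted in this call.
	for len(s) > 0 && deleted[len(s)-1] {
		s = s[:len(s)-1]
	}
	return s
}

func main() {
	del3 := update{Index: 3, Tombstone: true}
	del4 := update{Index: 4, Tombstone: true}

	batched := apply([]int{0, 1, 2, 3, 4}, []update{del3, del4})
	fmt.Println(batched) // [0 1 2]

	separate := apply([]int{0, 1, 2, 3, 4}, []update{del3})
	separate = apply(separate, []update{del4})
	fmt.Println(separate) // [0 1 2 0]
}
```

In the separate case, the second call knows nothing about the earlier deletion at index 3, so it stops trimming there and leaves a zero-valued hole.
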
## Node Topology changes

Nodes can exist in multiple locations in the tree. This allows us to do things
like include a user in multiple groups.

### Add

Node additions are propagated in real time by sending the points for the new
node as well as the points for the edge that adds the node to the tree.

### Copy

Node copies are similar to adds, but only the edge points are sent.

### Delete

Node deletions are recorded by setting a tombstone point in the edge above the
node to true. If a node is deleted, this information needs to be recorded,
otherwise the synchronization process will simply re-create the deleted node if
it exists on another instance.

### Move

Move is just a combination of Copy and Delete.

If any real-time data is lost in any of the above operations, the catch-up
synchronization will propagate the node changes.

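The relationship between Copy, Delete, and Move can be sketched as edge updates. Everything named here is hypothetical: `EdgeUpdate` is an illustrative message shape rather than a real type, the `tombstone` point type string is assumed, and sending an explicit tombstone of `0` on a copy is a simplification chosen for this example.

```go
package main

import "fmt"

// Point is a simplified stand-in for data.Point.
type Point struct {
	Type  string
	Value float64
}

// EdgeUpdate is a hypothetical message: the points sent for the edge that
// links Child under Parent.
type EdgeUpdate struct {
	Parent, Child string
	Points        []Point
}

// copyNode links an existing node under a new parent. Only edge points are
// sent because the node's own points already exist.
func copyNode(child, newParent string) EdgeUpdate {
	return EdgeUpdate{Parent: newParent, Child: child,
		Points: []Point{{Type: "tombstone", Value: 0}}}
}

// deleteNode marks the edge above the node as deleted.
func deleteNode(child, parent string) EdgeUpdate {
	return EdgeUpdate{Parent: parent, Child: child,
		Points: []Point{{Type: "tombstone", Value: 1}}}
}

// moveNode is a copy to the new parent followed by a delete from the old one.
func moveNode(child, oldParent, newParent string) []EdgeUpdate {
	return []EdgeUpdate{copyNode(child, newParent), deleteNode(child, oldParent)}
}

func main() {
	for _, u := range moveNode("userJoe", "groupA", "groupB") {
		fmt.Printf("%s -> %s tombstone=%v\n", u.Parent, u.Child, u.Points[0].Value)
	}
	// groupB -> userJoe tombstone=0
	// groupA -> userJoe tombstone=1
}
```
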
## Tracking who made changes

The `Point` type has an `Origin` field that is used to track who generated the
point. If the node that owns the point generated it, then `Origin` can be left
blank. This saves data bandwidth, especially for sensor data, which is
generated by the client managing the node. There are several reasons for the
`Origin` field:

- Track who made changes for auditing and debugging purposes. If a rule or some
  process other than the owning node modifies a point, `Origin` should always
  be populated. Tests that generate points should generally set the origin to
  `"test"`.
- Eliminate echoes where a client is subscribed to a subject and also publishes
  to the same subject. With the `Origin` field, the client can determine
  whether it was the author of a point it receives, and if so, simply drop it.
  See [client documentation](client.md#message-echo) for more discussion of the
  echo topic.

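Echo suppression based on `Origin` can be sketched as a simple filter. This assumes a client that is not the owner of the node and therefore always sets `Origin` to its own ID when publishing. The `dropEchoes` helper and the IDs used are hypothetical.

```go
package main

import "fmt"

// Point is a simplified stand-in for data.Point.
type Point struct {
	Type   string
	Value  float64
	Origin string
}

// dropEchoes returns the received points that were not authored by self.
func dropEchoes(received []Point, self string) []Point {
	var out []Point
	for _, p := range received {
		if p.Origin == self {
			continue // our own write coming back on the subscription
		}
		out = append(out, p)
	}
	return out
}

func main() {
	const self = "client-42" // hypothetical client ID
	received := []Point{
		{Type: "setpoint", Value: 20, Origin: self},     // our own write, echoed back
		{Type: "setpoint", Value: 22, Origin: "rule-7"}, // written by a rule
		{Type: "temp", Value: 19.5},                     // blank Origin: owning node
	}
	for _, p := range dropEchoes(received, self) {
		fmt.Printf("%s=%v origin=%q\n", p.Type, p.Value, p.Origin)
	}
	// setpoint=22 origin="rule-7"
	// temp=19.5 origin=""
}
```
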
## Evolvability

One important consideration in data design is whether the system can be easily
changed. With a distributed system, you may have different versions of the
software running at the same time using the same data. One version may use or
store additional information that the other does not. In this case, it is very
important that the other version does not delete this data, as could easily
happen if you decode data into a type, and then re-encode and store it.

With the Node/Point system, we don't have to worry about this issue because
Nodes are only updated by sending Points. It is not possible to delete a Node
Point. So if one version writes a Point the other is not using, it will be
transferred, stored, synchronized, etc., and simply ignored by versions that
don't use this point. This is another case where SIOT solves a hard problem that
typically requires quite a bit of care and effort.
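
A toy model shows why updating by points preserves data a given version does not know about. The `store` type below is a hypothetical in-memory map keyed by point type and key. It is not the Simple IoT store, and the point types are made up.

```go
package main

import "fmt"

// Point is a simplified stand-in for data.Point.
type Point struct {
	Type  string
	Key   string
	Value float64
}

// store is a toy node store keyed by point type and key.
type store map[[2]string]Point

// merge applies incoming points one by one. Points that are not mentioned
// in the update are left untouched.
func (s store) merge(points ...Point) {
	for _, p := range points {
		s[[2]string{p.Type, p.Key}] = p
	}
}

func main() {
	node := store{}

	// A newer software version writes a point the older version has never
	// heard of ("humidity").
	node.merge(
		Point{Type: "temp", Key: "0", Value: 20},
		Point{Type: "humidity", Key: "0", Value: 40},
	)

	// The older version updates only the point it knows about. It sends a
	// point instead of rewriting the whole node, so nothing is lost.
	node.merge(Point{Type: "temp", Key: "0", Value: 21})

	fmt.Println(len(node), node[[2]string{"humidity", "0"}].Value) // 2 40
}
```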