github.com/simpleiot/simpleiot@v0.18.3/docs/ref/research.md

# Research

This document contains information that has been researched during the course of
creating Simple IoT.

## Synchronization

An IoT system is inherently distributed. At a minimum, there are three
components:

1. device (Go, C, etc.)
1. cloud (Go)
1. multiple browsers (Elm, Js)

Data can be changed in any of the above locations and must be seamlessly
synchronized to other locations. Failing to consider this simple requirement
early in building the system can make for brittle and overly complex systems.

### The moving target problem

As long as the connection between instances is solid, they will stay
synchronized, as each instance will receive all points it is interested in.
Therefore, verifying synchronization by comparing Node hashes is a backup
mechanism that allows us to see what changed while disconnected. The root
hash for a downstream instance changes every time anything in that system
changes. This is very useful in that you only need to compare one value to
ensure your entire config is synchronized, but it is also a disadvantage in that
the top-level hash changes more often, so you are trying to compare two
moving targets. This is not a problem if things are changing slowly enough that
it does not matter if they are changing. However, this also limits the data
rates to which we can scale.

Some systems use a concept called Merkle clocks, where events are stored in a
Merkle DAG, existing nodes in the DAG are immutable, and new events are always
added as parents to existing events. An immutable DAG has an advantage in that
you can always work back in history, which never changes. The SIOT Node tree is
mutable by definition. Actual Budget uses a similar concept in that it
[uses a Merkle Trie](https://github.com/actualbudget/actual/discussions/257) to
represent events in time and then prunes the tree as time goes on.

We could create a separate structure to sync all events (points), but that would
require a separate structure on the server for every downstream device and seems
overly complex.

Is it critical that we see all historical data? In an IoT system, there are
essentially two sets of data -- current state/config and historical data. The
current state is most critical for most things, but historical data may be used
for some algorithms and viewed by users. The volume of data makes it impractical
to store all data in resource-constrained edge systems. However, maybe it's a
mistake to separate these two, as synchronizing all data might simplify the
system.

One way to handle the moving target problem is to store an array of previous
hashes for the device node in both instances -- perhaps for as long as the
synchronization interval. The downstream could then fetch the upstream hash
array and see if any of the entries match an entry in the downstream array. This
would help cover the case where there may be some time difference when things
get updated, but the history should be similar. If there is a hash in history
that matches, then we are probably OK.
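
The hash history idea could look something like the following Go sketch. The
`HashHistory` type, its fixed capacity, and the use of `uint32` hashes are
assumptions made for illustration; none of this is taken from the SIOT code.

```go
package main

import "fmt"

// HashHistory keeps the most recent root hashes seen for a device node.
// Hypothetical type for illustration; not part of SIOT.
type HashHistory struct {
	max    int
	hashes []uint32
}

// NewHashHistory returns a history that retains at most max hashes.
func NewHashHistory(max int) *HashHistory {
	return &HashHistory{max: max}
}

// Add records a new hash and discards the oldest one once the history is full.
func (h *HashHistory) Add(hash uint32) {
	h.hashes = append(h.hashes, hash)
	if len(h.hashes) > h.max {
		h.hashes = h.hashes[1:]
	}
}

// Hashes returns a copy of the stored hashes, oldest first.
func (h *HashHistory) Hashes() []uint32 {
	return append([]uint32(nil), h.hashes...)
}

// Matches reports whether any stored hash also appears in other.
func (h *HashHistory) Matches(other []uint32) bool {
	seen := make(map[uint32]struct{}, len(h.hashes))
	for _, v := range h.hashes {
		seen[v] = struct{}{}
	}
	for _, v := range other {
		if _, ok := seen[v]; ok {
			return true
		}
	}
	return false
}

func main() {
	downstream := NewHashHistory(3)
	upstream := NewHashHistory(3)
	for _, v := range []uint32{10, 11, 12, 13} {
		downstream.Add(v)
	}
	for _, v := range []uint32{11, 12} {
		upstream.Add(v)
	}
	// The downstream fetches the upstream array and looks for any overlap.
	fmt.Println("probably in sync:", downstream.Matches(upstream.Hashes()))
}
```

A match only shows that both sides passed through the same state recently, so
this is a heuristic rather than proof that the instances are in sync right now.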

Another approach would be to track metrics on how often the top-level hash is
updating -- if it is too often, then perhaps the system needs to be tuned.
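
As a rough sketch of such a metric, the Go code below counts root hash changes
inside a sliding time window. `ChangeRate` and the `at` helper are hypothetical
names, and the sketch assumes changes are recorded in time order.

```go
package main

import (
	"fmt"
	"time"
)

// ChangeRate counts root hash changes inside a sliding time window.
// Hypothetical type for illustration; not part of SIOT.
type ChangeRate struct {
	window  time.Duration
	changes []time.Time
}

// NewChangeRate returns a counter covering the given window.
func NewChangeRate(window time.Duration) *ChangeRate {
	return &ChangeRate{window: window}
}

// Record notes that the root hash changed at time t.
// Calls are assumed to arrive in increasing time order.
func (c *ChangeRate) Record(t time.Time) {
	c.changes = append(c.changes, t)
}

// Count drops entries at or before now-window and returns how many remain.
func (c *ChangeRate) Count(now time.Time) int {
	cutoff := now.Add(-c.window)
	i := 0
	for i < len(c.changes) && !c.changes[i].After(cutoff) {
		i++
	}
	c.changes = c.changes[i:]
	return len(c.changes)
}

// at returns a fixed reference time plus sec seconds, which keeps the
// example deterministic.
func at(sec int) time.Time {
	base := time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)
	return base.Add(time.Duration(sec) * time.Second)
}

func main() {
	rate := NewChangeRate(time.Minute)
	for _, sec := range []int{0, 10, 50, 70} {
		rate.Record(at(sec))
	}
	n := rate.Count(at(70))
	fmt.Printf("%d root hash changes in the last minute\n", n)
}
```

What counts as "too often" would depend on the synchronization interval, so any
threshold applied to this count is left to the caller.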

There could also be some type of stop-the-world lock where both systems stop
processing new nodes during the sync operation. However, if they are not in
sync, this probably won't help and definitely hurts scalability.

### Resgate

[resgate.io](https://resgate.io) is an interesting project that solves the
problem of creating a real-time API gateway where web clients are synchronized
seamlessly. This project uses NATS.io for a backbone, which makes it interesting
as NATS is core to this project.

The Resgate system is primarily concerned with synchronizing browser contents.

### CouchDB/PouchDB

Has some interesting ideas.

### Merkle Trees

- https://medium.com/@rkkautsar/synchronizing-your-hierarchical-data-with-merkle-tree-dbfe37db3ab7
- https://en.wikipedia.org/wiki/Merkle_tree
- https://jack-vanlightly.com/blog/2016/10/24/exploring-the-use-of-hash-trees-for-data-synchronization-part-1
- https://www.codementor.io/blog/merkle-trees-5h9arzd3n8
  - Version control systems like Git and Mercurial use specialized Merkle trees
    to manage versions of files and even directories. One advantage of using
    Merkle trees in version control systems is that we can simply compare
    hashes of files and directories between two commits to know if they've
    been modified or not, which is quite fast.
  - NoSQL distributed database systems like Apache Cassandra and Amazon
    DynamoDB use Merkle trees to detect inconsistencies between data replicas.
    This process of repairing the data by comparing all replicas and updating
    each one of them to the newest version is also called anti-entropy repair.
    The process is also described in
    [Cassandra's documentation](https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsRepairNodesManualRepair.html).
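
The hash comparison described in the notes above can be sketched in Go as
follows. This is a generic Merkle tree illustration, not the SIOT hashing
algorithm: the `Node` type, the choice of SHA-256, and the `Diff` function are
all assumptions. For brevity, hashes are recomputed on every call; a real
implementation would likely cache them.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
)

// Node is a hypothetical tree node; it is not the SIOT node type.
type Node struct {
	ID       string
	Value    string
	Children []*Node
}

// Hash returns a digest covering the node's own data and all descendants,
// so a change anywhere below a node changes that node's hash.
func (n *Node) Hash() [32]byte {
	h := sha256.New()
	h.Write([]byte(n.ID))
	h.Write([]byte{0})
	h.Write([]byte(n.Value))
	// Sort children by ID so child order does not affect the hash.
	kids := append([]*Node(nil), n.Children...)
	sort.Slice(kids, func(i, j int) bool { return kids[i].ID < kids[j].ID })
	for _, c := range kids {
		ch := c.Hash()
		h.Write(ch[:])
	}
	var out [32]byte
	copy(out[:], h.Sum(nil))
	return out
}

// Diff returns the sorted IDs of nodes whose own value differs or that exist
// on only one side. Subtrees with equal hashes are skipped without descending.
// Both arguments are assumed to refer to the same node ID.
func Diff(a, b *Node) []string {
	if a == nil && b == nil {
		return nil
	}
	if a == nil {
		return []string{b.ID}
	}
	if b == nil {
		return []string{a.ID}
	}
	if a.Hash() == b.Hash() {
		return nil
	}
	var out []string
	if a.Value != b.Value {
		out = append(out, a.ID)
	}
	bKids := make(map[string]*Node, len(b.Children))
	for _, c := range b.Children {
		bKids[c.ID] = c
	}
	for _, c := range a.Children {
		out = append(out, Diff(c, bKids[c.ID])...)
		delete(bKids, c.ID)
	}
	// Whatever is left exists only in b.
	for id := range bKids {
		out = append(out, id)
	}
	sort.Strings(out)
	return out
}

func main() {
	up := &Node{ID: "root", Children: []*Node{
		{ID: "temp", Value: "21.5"},
		{ID: "relay", Value: "on"},
	}}
	down := &Node{ID: "root", Children: []*Node{
		{ID: "temp", Value: "21.5"},
		{ID: "relay", Value: "off"},
	}}
	fmt.Println("changed nodes:", Diff(up, down))
}
```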

#### Scaling Merkle trees

One limitation of Merkle trees is the difficulty of updating the tree
concurrently. Some information on this:

- [how to scale blockchains](https://www.forbes.com/sites/forbestechcouncil/2018/11/27/sidechains-how-to-scale-and-improve-blockchains-safely/?sh=193537e64418)
- [Angela: A Sparse, Distributed, and Highly Concurrent Merkle Tree](https://people.eecs.berkeley.edu/~kubitron/courses/cs262a-F18/projects/reports/project1_report_ver3.pdf)

### Distributed key/value databases

- etcd
- NATS
  [key/value store](https://docs.nats.io/using-nats/developer/develop_jetstream/kv)

### Distributed Hash Tables

- https://en.wikipedia.org/wiki/Distributed_hash_table

### CRDT (Conflict-free replicated data type)

- https://en.wikipedia.org/wiki/Conflict-free_replicated_data_type
- [Yjs](https://yjs.dev/#community)
- https://blog.kevinjahns.de/are-crdts-suitable-for-shared-editing/
- https://tantaman.com/2022-10-18-lamport-sufficient-for-lww.html
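
As a small illustration of the CRDT idea, here is a Go sketch of a
last-write-wins (LWW) register ordered by a logical counter, with the writer ID
as a tie-breaker. The `Stamp` and `Register` types are hypothetical and are not
taken from any of the projects linked above.

```go
package main

import "fmt"

// Stamp orders writes by a logical counter, then by writer ID to break ties.
type Stamp struct {
	Counter uint64
	Writer  string
}

// Less reports whether s is ordered before o.
func (s Stamp) Less(o Stamp) bool {
	if s.Counter != o.Counter {
		return s.Counter < o.Counter
	}
	return s.Writer < o.Writer
}

// Register is a last-write-wins register holding a single value.
type Register struct {
	Value string
	Stamp Stamp
}

// Set performs a local write with a counter one past the highest this
// replica has seen.
func (r *Register) Set(writer, value string) {
	r.Value = value
	r.Stamp = Stamp{Counter: r.Stamp.Counter + 1, Writer: writer}
}

// Merge keeps whichever write carries the greater stamp, so replicas
// converge on the same value regardless of merge order.
func (r *Register) Merge(o Register) {
	if r.Stamp.Less(o.Stamp) {
		*r = o
	}
}

func main() {
	var device, cloud Register
	device.Set("device", "setpoint=20")
	cloud.Set("cloud", "setpoint=22")

	// Exchange state in both directions.
	d, c := device, cloud
	device.Merge(c)
	cloud.Merge(d)
	fmt.Println(device.Value == cloud.Value, device.Value)
}
```

The tie-break on writer ID is arbitrary but deterministic, which is what lets
concurrent writes resolve the same way on every replica.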

### Databases

- https://tantaman.com/2022-08-23-why-sqlite-why-now.html
  - instead of doing:
    `select comment.* from post join comment on comment.post_id = post.id where post.id = x and comment.date < cursor.date and comment.id < cursor.id order by date, id desc limit 101`
  - we do: `post.comments().last(10).after(cursor);`

### Timestamps

- [Lamport timestamp](https://en.wikipedia.org/wiki/Lamport_timestamp)
  - used by Yjs
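
For reference, a Lamport clock is small enough to sketch in a few lines of Go.
The `Clock` type and its method names are assumptions chosen for this example.

```go
package main

import (
	"fmt"
	"sync"
)

// Clock is a Lamport logical clock. The zero value is ready to use.
type Clock struct {
	mu   sync.Mutex
	time uint64
}

// Tick advances the clock for a local event or an outgoing message and
// returns the new timestamp.
func (c *Clock) Tick() uint64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.time++
	return c.time
}

// Observe merges a timestamp received from another instance: the clock jumps
// to the larger of the local and remote values, then advances by one.
func (c *Clock) Observe(remote uint64) uint64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	if remote > c.time {
		c.time = remote
	}
	c.time++
	return c.time
}

func main() {
	var device, cloud Clock
	device.Tick()
	sent := device.Tick() // timestamp attached to a message
	received := cloud.Observe(sent)
	fmt.Println("sent at", sent, "received at", received)
}
```

The receive timestamp is always greater than the send timestamp, which is the
ordering guarantee Lamport timestamps provide without synchronized wall clocks.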

## Other IoT Systems

### AWS IoT

- https://www.thingrex.com/aws_iot_thing_attributes_intro/
  - Thing properties include the following, which are analogous to SIOT node
    fields.
    - Name (Description)
    - Type (Type)
    - Attributes (Points)
    - Groups (Described by tree structure)
    - Billing Group (Can also be described by tree structure)
- https://www.thingrex.com/aws_iot_thing_type/
  - each type has a specified set of attributes -- kind of a neat idea