github.com/tickoalcantara12/micro/v3@v3.0.0-20221007104245-9d75b9bcbab9/docs/v2/design/framework/sync.md

# Sync

Sync is a synchronization mechanism for data storage.

## Overview

We need to be able to sync between different Store types and locations. We often describe
these as local, regional, global or cloud, edge, dev. Sync provides a way to quite literally
sync data between different stores, and it provides a Key-Value abstraction with built-in
data encoding for efficiency and timestamp values.

What we're fundamentally fighting over is replication versus a layered distributed architecture.
Replication is a flat design that works at one layer, but a multi-layer design is ultimately
the model for a large-scale distributed system.

## Design

Ideally we operate like a computer's memory hierarchy: cache misses walk the chain, and so do writes.

A computer model

- CPU register, L1, L2, L3, RAM, disk

Our model

- local, region, global
- memory, cache, database, blob store

Or more concretely

- memory, etcd, cockroach, s3/blob/github
- service, cache, store, blob

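The idea that a cache miss walks the chain can be sketched roughly as follows. The `Layer` and `Chain` types are hypothetical and use plain maps as stand-ins for memory, etcd and so on. Copying a found value back into the faster layers is an assumption borrowed from the CPU cache analogy, not something this document specifies.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotFound is returned when no layer in the chain holds the key.
var ErrNotFound = errors.New("not found")

// Layer is a hypothetical named store; the map stands in for a real backend.
type Layer struct {
	Name string
	Data map[string]string
}

// Chain is an ordered list of layers, fastest first.
type Chain []*Layer

// Read walks the chain until a layer holds the key. On a hit it copies the
// value into every faster layer that missed (assumed backfill behaviour) and
// returns the value together with the name of the layer that served it.
func (c Chain) Read(key string) (string, string, error) {
	for i, layer := range c {
		if val, ok := layer.Data[key]; ok {
			for _, faster := range c[:i] {
				faster.Data[key] = val
			}
			return val, layer.Name, nil
		}
	}
	return "", "", ErrNotFound
}

func newLayer(name string) *Layer {
	return &Layer{Name: name, Data: map[string]string{}}
}

func main() {
	chain := Chain{newLayer("memory"), newLayer("cache"), newLayer("database"), newLayer("blob")}
	chain[3].Data["config"] = "v1" // only the slowest layer holds the key

	val, from, _ := chain.Read("config")
	fmt.Println(val, from) // first read is served by the blob layer

	val, from, _ = chain.Read("config")
	fmt.Println(val, from) // second read is served by memory after backfill
}
```

A real implementation would also need expiry or invalidation for the faster layers, which this sketch ignores.
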
## Architecture

Ultimately what we want is to make data available everywhere without relying on data
replication, where every cache miss results in recursively walking the chain. We find that
federated models are far superior to replication alone. Again, replication operates at a
single layer and federation layers on top of it.

Walking through a real example: while using the micro runtime we have hit API rate
limits with Cloudflare, and we've found that global data storage is expensive or
unsupported by other services without using a VPN or WireGuard to support replication.

By using a federated model built entirely in micro, we can allow each layer to
operate with its respective abstraction or Store, and layer on top by simply building
the primitive for synchronisation.

Order of retrieval and storage

1. Local (memory)
2. Cache (etcd)
3. Database (cockroach)
4. Blob (github)

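The storage side of this ordering might look something like the sketch below. The `Store` interface here is a hypothetical two-method interface written for this example, not the micro store interface, and the in-memory `memStore` stands in for every backend. Stopping at the first failing layer is an assumed policy; the document only states the order.

```go
package main

import (
	"errors"
	"fmt"
)

// Store is a hypothetical minimal interface for one layer in the chain.
type Store interface {
	Name() string
	Write(key string, value []byte) error
}

// memStore is an in-memory stand-in for any backend (etcd, cockroach, ...).
type memStore struct {
	name string
	data map[string][]byte
}

func newMem(name string) *memStore {
	return &memStore{name: name, data: map[string][]byte{}}
}

func (m *memStore) Name() string { return m.name }

func (m *memStore) Write(key string, value []byte) error {
	m.data[key] = value
	return nil
}

// failStore simulates a layer that is currently unavailable.
type failStore struct{ name string }

func (f failStore) Name() string               { return f.name }
func (f failStore) Write(string, []byte) error { return errors.New("unavailable") }

// WriteThrough writes to each layer in order and stops at the first failure
// (an assumed policy). It returns the names of the layers that were written.
func WriteThrough(layers []Store, key string, value []byte) ([]string, error) {
	var written []string
	for _, l := range layers {
		if err := l.Write(key, value); err != nil {
			return written, fmt.Errorf("write to %s: %w", l.Name(), err)
		}
		written = append(written, l.Name())
	}
	return written, nil
}

func main() {
	// Same order as the list above: local, cache, database, blob.
	layers := []Store{newMem("memory"), newMem("etcd"), failStore{"cockroach"}, newMem("github")}

	written, err := WriteThrough(layers, "greeting", []byte("hello"))
	fmt.Println(written, err)
}
```

Other policies are equally plausible, for example writing to the local layer synchronously and pushing to the slower layers in the background.
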
## Source of Truth

GitHub is and will always be our source of truth for code, configuration, packages and now
potentially blob storage. By creating a central point that is not a server but in some ways
cold storage, we have a place for long-term storage of all things.

What we store in GitHub

- Code
- Configuration
- Binaries
- Packages
- Events
- Blobs
- Files

We can optionally load the entirety of our source of truth into DigitalOcean for higher throughput
at very low cost, and we may choose to provide APIs as a central point through DigitalOcean or elsewhere.

We have attempted to use Cloudflare as a distributed source of truth, but without fully immersing
ourselves in Workers this will not work. In fact, Workers pushes us further down the path of a complete
micro runtime on the edge using wasm (2022).

## References

- Microsoft Sync Framework https://en.wikipedia.org/wiki/Microsoft_Sync_Framework
- GitHub Large File Storage https://help.github.com/en/github/managing-large-files/versioning-large-files
- DigitalOcean Block Storage https://www.digitalocean.com/products/block-storage/
- BigCache https://github.com/allegro/bigcache
- GroupCache https://github.com/golang/groupcache