# Sync

Sync is a synchronization mechanism for data storage.

## Overview

We need to be able to sync between different Store types and locations. Often we describe
this as local, regional, global or cloud, edge, dev. Sync provides a way to quite literally
sync data between different stores and provides a Key-Value abstraction with built-in
data encoding for efficiency and timestamp values.

What we're fundamentally fighting is replication versus a layered distributed architecture.
Replication is a flat design which works at one layer, but multi-layer is ultimately
the model for a large-scale distributed system.

## Design

Ideally we operate like a computer. Cache misses walk the chain, and writes do as well.

A computer model

- CPU Register, L1, L2, L3, RAM, Disk

Our model

- local, region, global
- memory, cache, database, blob store

Or more concretely

- memory, etcd, cockroach, s3/blob/github
- service, cache, store, blob

## Architecture

Ultimately what we want is to replicate data without the need for data replication, where
every cache miss results in recursively walking the chain. We find that federated models
are far superior to replication alone. Again, replication operates at a single layer
and federation layers on top of it.

Walking through a real example: where we're using the micro runtime, we have reached
limits on API rates for Cloudflare, and we've found global data storage to be
expensive or unsupported by other services without using a VPN or WireGuard
to support replication.

By using a federated model built entirely in micro, we can allow each layer to
operate with its respective abstraction or Store, and layer on top by simply building
the primitive for synchronisation.


Order of retrieval and storage

1. Local (memory)
2. Cache (etcd)
3. Database (cockroach)
4. Blob (github)

## Source of Truth

GitHub is and will always be our source of truth, for code, for configuration, for packages and now
potentially for blob storage. By creating a central point that is not a server but in some ways
cold storage, we have a place for long-term storage of all things.

What we store in GitHub

- Code
- Configuration
- Binaries
- Packages
- Events
- Blobs
- Files

We can optionally load the entirety of our source of truth into DigitalOcean for higher throughput
at very low cost and may choose to provide APIs as a central point through DO or elsewhere.

We have attempted to use Cloudflare as a distributed source of truth, but without fully immersing
ourselves in Workers this will not work. In fact, Workers pushes us further down the path of a complete
micro runtime on the edge using wasm (2022).

## References

- Microsoft Sync Framework https://en.wikipedia.org/wiki/Microsoft_Sync_Framework
- GitHub Large File Storage https://help.github.com/en/github/managing-large-files/versioning-large-files
- DigitalOcean Block Storage https://www.digitalocean.com/products/block-storage/
- BigCache https://github.com/allegro/bigcache
- GroupCache https://github.com/golang/groupcache