# Sync

Sync is a synchronization mechanism for data storage.

## Overview

We need to be able to sync between different Store types and locations. We often describe
these as local, regional and global, or as cloud, edge and dev. Sync provides a way to quite
literally sync data between different stores, and it offers a key-value abstraction with
built-in data encoding for efficiency and timestamped values.

The fundamental tension is between replication and a layered distributed architecture.
Replication is a flat design that works at a single layer, but a multi-layer design is
ultimately the model for a large-scale distributed system.

## Design

Ideally we operate like a computer: cache misses walk the chain, and so do writes.

A computer model:

- CPU registers, L1, L2, L3, RAM, disk

Our model:

- local, region, global
- memory, cache, database, blob store

Or, more concretely:

- memory, etcd, cockroach, s3/blob/github
- service, cache, store, blob

## Architecture

Ultimately, what we want is the effect of replicated data without the need for data
replication, where every cache miss results in recursively walking the chain. We find that
federated models are far superior to replication alone. Again, replication operates at a
single layer, and federation layers on top of it.

Consider a real example. Where we use the micro runtime, we have hit Cloudflare's API rate
limits, and we have found that global data storage is either expensive or unsupported by
other services unless we use a VPN or WireGuard to support replication.

By using a federated model built entirely in micro, we can let each layer operate with its
own abstraction or Store, and layer on top of them simply by building the primitive for
synchronization.

Order of retrieval and storage:

1. Local (memory)
2. Cache (etcd)
3. Database (cockroach)
4. Blob (GitHub)

## Source of Truth

GitHub is, and always will be, our source of truth: for code, for configuration, for packages
and now potentially for blob storage. By creating a central point that is not a server but,
in some ways, cold storage, we have a place for the long-term storage of all things.

What we store in GitHub:

- Code
- Configuration
- Binaries
- Packages
- Events
- Blobs
- Files

We can optionally load our entire source of truth into DigitalOcean for higher throughput
at very low cost, and may choose to provide APIs as a central point through DigitalOcean or
elsewhere.

We have attempted to use Cloudflare as a distributed source of truth, but this will not work
without fully immersing ourselves in Workers. In fact, Workers push us further down the path
of a complete micro runtime on the edge using Wasm (2022).

## References

- Microsoft Sync Framework https://en.wikipedia.org/wiki/Microsoft_Sync_Framework
- GitHub Large File Storage https://help.github.com/en/github/managing-large-files/versioning-large-files
- DigitalOcean Block Storage https://www.digitalocean.com/products/block-storage/
- BigCache https://github.com/allegro/bigcache
- GroupCache https://github.com/golang/groupcache