# Datacenter HA

```dot
digraph name {
    node [ fontname="Cambria" shape=rect fontsize=12]

    subgraph cluster_dc1 {
        label = "dc1";
        cas1_1 [shape = "cylinder"]
        cas1_2 [shape = "cylinder"]
        app1_1
    }
    subgraph cluster_dc2 {
        label = "dc2";
        cas2_1 [shape = "cylinder"]
        cas2_2 [shape = "cylinder"]
        app1_2
    }
    subgraph cluster_dc3 {
        label = "dc3";
        cas3_1 [shape = "cylinder"]
        cas3_2 [shape = "cylinder"]
        app1_3
    }

    edge [dir=both style=dotted]
    app1_1 -> cas1_1
    app1_1 -> cas2_1
    app1_2 -> cas1_1
    app1_2 -> cas3_1
    app1_3 -> cas2_2
    app1_3 -> cas1_2
}
```

# App Update

- Zero Downtime
    - Clients do not get server errors (e.g. 503)
    - Latency growth MUST be minimized
- Persistent Cache: the cache survives the app update

```dot
digraph cluster {
    node [ fontname = "Cambria" fontsize = 12 shape = "rect"]

    subgraph cluster_ac {
        label = "App Container";
        cache [label="cache.prc"]
        old [label="oldApp.prc"]
        new [label="newApp.prc" style=dashed]
        cder [label="cder.prc"]
    }
    hbuilder -> hcc [style=dotted]
    hcc -> cder
    cder -> new
    cder -> old
    cache -> new [dir=none style=dotted]
    cache -> old [dir=none style=dotted]
}
```

- `cache.prc` is a separate process that shares cache memory with the app processes
- `cache.prc` has its own memory manager
  - https://github.com/couchbase/go-slab

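The update flow above can be sketched in a single Go process, under loud assumptions: `Switch` plays the routing role the diagram appears to give `cder.prc`, `Cache` stands in for `cache.prc` (here a plain in-process map rather than shared memory), and `newApp` builds a toy app version. None of these names come from the original design. The sketch needs Go 1.19 or later for `atomic.Pointer`.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
	"sync/atomic"
)

// Cache lives outside any app version, so its contents survive the
// switch from the old app to the new one (stand-in for cache.prc).
type Cache struct {
	mu   sync.Mutex
	data map[string]string
}

func NewCache() *Cache { return &Cache{data: map[string]string{}} }

func (c *Cache) Put(k, v string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.data[k] = v
}

func (c *Cache) Get(k string) string {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.data[k]
}

// Switch forwards each request to the currently active app version.
type Switch struct {
	active atomic.Pointer[http.Handler]
}

// Activate swaps the active version atomically. Requests already
// running keep the handler they loaded and finish undisturbed.
func (s *Switch) Activate(h http.Handler) { s.active.Store(&h) }

func (s *Switch) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	h := s.active.Load()
	if h == nil { // only possible before the first Activate call
		http.Error(w, "no active app", http.StatusServiceUnavailable)
		return
	}
	(*h).ServeHTTP(w, r)
}

// newApp builds a toy app version that answers from the shared cache.
func newApp(version string, c *Cache) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "%s:%s", version, c.Get("greeting"))
	})
}

// get sends one in-memory request through the switch.
func get(s *Switch) string {
	rec := httptest.NewRecorder()
	s.ServeHTTP(rec, httptest.NewRequest("GET", "/", nil))
	return rec.Body.String()
}

func main() {
	cache := NewCache()
	cache.Put("greeting", "hello")
	sw := &Switch{}
	sw.Activate(newApp("old", cache))
	fmt.Println(get(sw))
	sw.Activate(newApp("new", cache)) // the update: no request sees a gap
	fmt.Println(get(sw))
}
```

Because the swap is a single atomic store, a request sees either the old or the new version, never an error in between, and the new version starts with a warm cache.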
## App Update: Java

```dot
digraph name {
    node [ fontname = "Cambria" fontsize = 12 shape = "rect"]

    subgraph cluster_node {
        label = "node";
        cache [label="cache.prc"]
        old [label="oldApp.fatjar"]
        new [label="newApp.fatjar" style=dashed]
        cder [label="cder.prc"]
        core [label="core.jar"]
    }
    hbuilder -> hcc [style=dotted]
    hcc -> cder
    cder -> core
    core -> old
    core -> new
    cache -> core [dir=none style=dotted]
}
```

- The cache can live inside `core.jar`, but it will then be lost during a `core.jar` update

# Node/Container Failure

```dot
digraph graphname {

    graph[rankdir=BT splines=ortho]
    node [ fontname = "Cambria" shape = "rect" fontsize = 12]
    edge [dir=both arrowhead=none arrowtail=none]

    Database[shape = "cylinder"]
    PLog[shape = "cylinder"]
    WLog[shape = "cylinder"]
    WLogP[label="WLog.Partition"]
    State[shape = "cylinder"]
    StateP[label="State.Partition"]
    Partition[label="PLog.Partition"]
    Workspace
    Container [label="Main App Container" shape=box3d]

    Container -> Database[arrowtail=crow]
    Partition -> PLog [arrowtail=crow]
    Partition -> Container [arrowtail=crow]
    Workspace -> Partition [arrowtail=crow]
    PLog -> Database
    WLog -> Database
    State -> Database
    WLogP -> Workspace
    WLogP -> WLog [arrowtail=crow]
    StateP -> Workspace
    StateP -> State [arrowtail=crow]
}
```

## Distributed Request Handling

```dot
digraph name {
    node [ fontname = "Cambria" fontsize = 12 shape = "rect"]
    fd[label="Detect container failure"]
    cu[label="Mark container as `Unavailable`"]
    fd -> cu
}
```

## Partitioned Request Handling

```dot
digraph name {
    node [ fontname = "Cambria" fontsize = 12 shape = "rect"]
    fd[label="Detect container failure"]
    cu[label="Mark container as `Unavailable`"]
    en[label="Elect container for PLog.Partition"]
    iph[label="Initialize PartitionHandler"]
    fd -> cu
    cu -> en
    en -> iph
}
```

# Links

- [Cheaper, More Reliable, Simpler / Alexander Khristoforov (Odnoklassniki)](https://youtu.be/Hs2txKgnpAk?t=130)
- [Maintaining Consistency Across Data Centers (Randy Fradin, BlackRock) | Cassandra Summit 2016](https://www.slideshare.net/DataStax/maintaining-consistency-across-data-centers-randy-fradin-blackrock-cassandra-summit-2016)
  - Full title: "Maintaining Consistency Across Data Centers or: How I Learned to Stop Worrying About WAN Latency", Randy Fradin, BlackRock
  - Challenge 1: Latency. With all that latency on each operation, isn't performance terrible?
  - Actually, this wasn't such a problem:
    - 10ms+ latency per operation is acceptable for many apps
    - Minimize use of sequential operations
    - High throughput still achievable